All software has bugs from time to time – this is an unfortunate but unavoidable part of software development. The crucial questions are: what happens when software goes wrong, and how can software errors be resolved?
Here at Enable, we believe that it is important to thoroughly monitor and address any issues we find within our applications to continuously improve their performance and usability. Part of this is being able to react as quickly as possible when an error occurs within our system. Below we explain what happens when our software has an issue.
We have designed and built a three-stage process that takes an error from the moment it occurs in one of our systems through to being raised, investigated and resolved.
Azure Application Insights is a feature integrated within Microsoft’s hosting infrastructure for monitoring the performance of an application. It provides a lot of information about how our systems are working, including details about performance and system usage. It also provides information about any errors that occur in the operation of our system.
Data and telemetry from Application Insights are recorded in Microsoft’s cloud storage, where a wide range of tools can be used to investigate and review the stored information. Alerts can also be triggered whenever particular events occur within the system, with control over what those events are and what should happen when they occur.
Whenever an error is detected in any of our systems, Application Insights will trigger an alert. This alert contains a large amount of information about the error and the state of the system at the time the error occurred. This information is passed directly to our internal systems.
Cello is our bespoke ticketing service, which we use for recording all manner of different workstreams. Customers can use this to raise queries or feedback with us, and we use it internally to track topics as diverse as estimates for new phases of work, technical queries between teams, and maintenance of our office building.
Cello is able to react to the alerts raised by Application Insights, using an Azure function to extract relevant information from each error, such as when it occurred, which product it occurred in, and where in the code the problem occurred. This information is used to generate details for a ticket to be raised in a dedicated backlog. Another function compares the generated ticket to other error tickets to determine whether the error is new or should be grouped together with an existing ticket.
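As a rough illustration of these two steps, the Python sketch below extracts a few fields from an alert and then either groups the error with an existing ticket or raises a new one. The payload keys (`timestamp`, `product`, `code_location`), the `Ticket` class and the rule of grouping by product and code location are all assumptions made for this example; they are not the Application Insights alert schema or Cello's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Ticket:
    """Hypothetical error ticket; real tickets would carry far more detail."""
    product: str
    code_location: str
    first_seen: datetime
    occurrences: list = field(default_factory=list)


def extract_details(alert: dict) -> dict:
    """Pull out the fields needed to raise a ticket (assumed payload keys)."""
    return {
        "occurred_at": datetime.fromisoformat(alert["timestamp"]),
        "product": alert["product"],
        "code_location": alert["code_location"],
    }


def raise_or_group(alert: dict, backlog: dict) -> Ticket:
    """Attach the error to a matching ticket, or create a new one if none matches."""
    details = extract_details(alert)
    # Assumed grouping rule: same product and same place in the code.
    key = (details["product"], details["code_location"])
    ticket = backlog.get(key)
    if ticket is None:
        ticket = Ticket(details["product"], details["code_location"], details["occurred_at"])
        backlog[key] = ticket
    ticket.occurrences.append(details["occurred_at"])
    return ticket
```

In this sketch, calling `raise_or_group` twice with alerts from the same product and code location returns the same ticket with two recorded occurrences, while an alert from a different code location produces a second ticket in the backlog.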
Engineers’ working days are organized through another bespoke software product called Dashboard. Every day of the week, some of the Engineering team have the day set aside for handling support tickets and other time-sensitive items of work.
Once the error has been raised, it can be scheduled to be reviewed. Some urgent errors are identified automatically based on the area of code where they occurred – these will be automatically assigned to an engineer on Dashboard to review and resolve the error promptly.
For less critical errors, time is also regularly allocated for engineers to review the backlog of errors observed in the system, identifying and resolving issues raised.
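The triage rule described above could be sketched along the following lines. This is a minimal Python illustration only: the list of urgent code areas, the function names and the choice of simply picking the first engineer on support duty are hypothetical, and do not describe how Dashboard actually assigns work.

```python
from typing import Optional

# Hypothetical code areas whose errors are treated as urgent.
URGENT_AREAS = ("Billing", "Authentication")


def is_urgent(code_location: str) -> bool:
    """An error is urgent if it occurred in one of the listed code areas."""
    return code_location.startswith(URGENT_AREAS)


def triage(code_location: str, support_engineers: list) -> Optional[str]:
    """Return the engineer to assign an urgent error to, or None to leave
    the ticket in the backlog for a scheduled review."""
    if is_urgent(code_location) and support_engineers:
        return support_engineers[0]
    return None
```

Here an error located at `Billing.Invoice` would be assigned to the first engineer on support duty that day, whereas an error in any other area returns `None` and stays in the backlog for the regular review.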
Our approach to handling software errors automatically offers a number of benefits:
At Enable we understand how important it is that our software products work effectively, reliably, and accurately. By consistently monitoring our solutions, we can proactively identify and address areas where this is not the case, giving you more confidence in our software.