The Evolution of Fixing Broken Tests

Background and history of debugging

The huge growth of web applications over recent years has been accompanied by a growth in automated testing solutions. Many web applications are so complex that without automation they would be impossible to test properly. However, automated tests require constant maintenance and debugging. Indeed, often these tasks end up taking up the overwhelming majority of the QA team’s time.

The problem is that debugging test failures in automated tests is seldom easy. Typically, it requires scripting or recording the test, running it until it fails, trying to identify the failure, updating the test script and running it again. Often this process has to be repeated again and again before the test works. Then a simple change in the application might cause the test to break and the whole process has to restart.

To give a simple example of why this can be so complicated, imagine the shopping cart page for an online shop. After filling in their email address, the user can click one of two buttons. One button checks out as a guest; the other creates an account. In the test, the button that creates an account is selected. If the page is redesigned so that these buttons swap places, the test will happily proceed to the next steps. The problem is that instead of creating an account, the test now just proceeds through checkout as a guest. If the final step is to check the order status in the user account page, that step will now fail. Analysing a failure such as this is hard because the root cause happened several steps earlier. In some cases, the root cause may even have been right at the start of a flow and may be non-obvious.
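The swapped-button scenario can be reproduced in a few lines. The following is only a toy sketch: the `Button` class, the action names and the three-step flow are hypothetical stand-ins for a real page and test runner, chosen to show how the failing step and the root-cause step end up far apart.

```python
from dataclasses import dataclass


@dataclass
class Button:
    label: str
    action: str  # what the (hypothetical) shop does when the button is clicked


def run_checkout_test(buttons):
    """Run a three-step flow and return (step name, passed) pairs."""
    # Step 1: the recorded test clicks "the second button", meaning to create an account.
    action = buttons[1].action
    has_account = action == "create_account"
    return [
        ("click second button", True),        # the click itself always succeeds
        ("complete order", True),             # works for guests and account holders
        ("check order status", has_account),  # needs an account, so fails for guests
    ]


old_layout = [
    Button("Check out as guest", "guest_checkout"),
    Button("Create account", "create_account"),
]
new_layout = list(reversed(old_layout))  # the redesign swaps the two buttons

for name, layout in [("old", old_layout), ("new", new_layout)]:
    failed = [step for step, passed in run_checkout_test(layout) if not passed]
    print(name, "layout failed steps:", failed)
```

With the new layout, the only step reported as failing is the last one, even though the wrong choice was made at the first step.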

How existing solutions try to solve the problem

Over the years, a number of companies have tried to solve this problem, but with limited success. Typically, their solutions have been too narrow and are usually focused on reducing the effort needed for selector maintenance in tests. This is often done by deriving a confidence measure for all the test elements on an element-by-element basis. The idea is to try and identify the most likely element that may have triggered the failure.

For instance, many automated test solutions use machine learning (ML) to analyze the thousands of parameters for each element in the test. This allows them to measure the overall probability of each selector being correct and to rank them accordingly. The test engineer can then see which element is likely triggering the failure and can then change that in the test.
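As a rough illustration of element-by-element ranking, the sketch below scores each candidate element by the fraction of recorded attributes it still matches. This simple ratio is an assumption standing in for a trained ML model, and the attribute names (`id`, `text`, `tag`, `index`) are hypothetical.

```python
def match_score(recorded: dict, candidate: dict) -> float:
    """Fraction of the recorded attributes that the candidate still matches."""
    if not recorded:
        return 0.0
    matches = sum(1 for key, value in recorded.items() if candidate.get(key) == value)
    return matches / len(recorded)


def rank_candidates(recorded: dict, candidates: list) -> list:
    """Return candidates ordered from most to least likely to be the intended element."""
    return sorted(candidates, key=lambda c: match_score(recorded, c), reverse=True)


# Attributes captured when the test was recorded (hypothetical values).
recorded = {"id": "create-account", "text": "Create account", "tag": "button", "index": 1}

# Elements found on the redesigned page, where the two buttons have swapped.
candidates = [
    {"id": "guest-checkout", "text": "Check out as guest", "tag": "button", "index": 1},
    {"id": "create-account", "text": "Create account", "tag": "button", "index": 0},
]

best = rank_candidates(recorded, candidates)[0]
print(best["id"], match_score(recorded, best))
```

A ranking like this helps with picking the right selector, but it says nothing about the rest of the flow, which is the limitation discussed next.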

The problem is that such approaches aren’t comprehensive enough. They still require extensive scripting, exhaustive checking of selectors and repeated rounds of making changes, re-running tests and checking the results. All they are really doing is reducing the burden of selector maintenance, which is just one small part of the picture.

The Missing Piece

So, what’s missing from these solutions? The answer is obvious (though implementing it is far from easy). You need a solution that is more holistic and takes account of the entire application flow and test history. The test system should behave like a skilled human and learn how the application is meant to work in order to be able to identify the likely root cause of a test failure. It isn’t enough simply to understand all the attributes of each element or the relationship between elements. Even being able to cope with changes in element location and functionality is insufficient.

What is needed is a system that is able to truly understand what each element is actually trying to do. It should be able to use information about service calls, visibility states, metrics relating to page load times and the deeper relationship between user actions and server calls. To take our webshop example above, the system should be able to know that if you haven’t asked to create an account, then you can’t later choose to see your order status. To a human, this seems obvious, but for a computer, it’s much harder.

Functionize’s AEA™ (Adaptive Event Analysis) engine provides exactly this level of functionality and is the first solution that takes a truly holistic approach to root-cause analysis and self-healing tests. In the next section, we will see in detail how this works and how this can help test engineers become far more productive.

The Functionize Solution

The Functionize AEA™ engine is designed to allow fully automated debugging of tests. Using this technology, no QA engineer will need to spend time manually debugging unless they choose to. The system consists of three elements: root cause analysis, self-healing, and 1-click updates, each described in more detail below. These elements work in combination to provide the most advanced level of automated debugging available anywhere. They form a hierarchy of automation, from simply assisting the engineer with identifying the root cause of the failure through to offering 1-click test updating.

Root Cause Analysis

Root Cause Analysis uses a combination of a rule-based expert system and machine learning in order to identify the most likely root cause of a given failure. This advanced modeling system relies on historical test data and an understanding of common failure scenarios in order to help identify the root cause of the current test failure.

The rule-based expert system is able to identify the most common failure scenarios such as element-selection failures or comparisons that have been set too tightly and fail after small data changes. Each action in the test is checked against a set of rules, compared with related actions earlier in the same test and with similar actions in previous successful tests. The outcome of all these comparisons is passed into an intelligent scoring mechanism that uses machine learning to evaluate their relative probabilities of being the correct action.
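To make the rule-checking step more concrete, here is a minimal sketch of how each action might be checked against a small rule set and against the same action in a previous successful run. The two rules, the dictionary layout of an action and all field names are assumptions for illustration; they are not the actual AEA rules, and the ML scoring stage is left out.

```python
from typing import Callable, Optional

Action = dict
Rule = Callable[[Action, Optional[Action]], Optional[str]]


def selector_drift(action: Action, previous: Optional[Action]) -> Optional[str]:
    """Flag an action that resolved to a different element than last time."""
    if previous is not None and action.get("resolved_text") != previous.get("resolved_text"):
        return "element differs from last successful run"
    return None


def tight_comparison(action: Action, previous: Optional[Action]) -> Optional[str]:
    """Flag exact numeric comparisons, which break after small data changes."""
    if action.get("type") == "assert_equals" and isinstance(action.get("expected"), (int, float)):
        return "exact numeric comparison may break on data changes"
    return None


RULES = [selector_drift, tight_comparison]


def check_actions(current: list, previous_run: list) -> dict:
    """Map action index -> list of rule findings (only actions with findings)."""
    findings = {}
    for i, action in enumerate(current):
        previous = previous_run[i] if i < len(previous_run) else None
        hits = [msg for rule in RULES if (msg := rule(action, previous)) is not None]
        if hits:
            findings[i] = hits
    return findings


previous_run = [
    {"type": "click", "resolved_text": "Create account"},
    {"type": "assert_equals", "expected": 500},
]
current_run = [
    {"type": "click", "resolved_text": "Check out as guest"},
    {"type": "assert_equals", "expected": 500},
]
print(check_actions(current_run, previous_run))
```

In a fuller system, findings like these would be the inputs to the scoring mechanism rather than the final answer.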

The machine learning models have been trained to appropriately score the results of the rule-based expert system. Because these scores depend on multiple dynamic data sources, they can’t be trivially calculated algorithmically. The outcome of this part is a probability that each action might have been the one that failed. Machine learning is also used to learn what successful actions look like over time. This process gives another way to identify anomalous actions that may have been the root cause of the failure. This second set of ML models is often able to identify the root cause even when the rule-based expert system cannot.
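The idea of learning what successful actions look like can be sketched with a very simple statistical stand-in. Here a z-score over one hypothetical per-action feature (the number of service calls each action triggered) flags the action that deviates most from past successful runs. A real model would use many more signals; the feature choice and the data are assumptions.

```python
from statistics import mean, pstdev


def anomaly_scores(history: list, current: list) -> list:
    """Score each action by how far its feature is from past successful runs.

    history: one list per successful run, holding one number per action.
    current: the same feature for the run being analysed.
    """
    scores = []
    for i, value in enumerate(current):
        past = [run[i] for run in history]
        mu, sigma = mean(past), pstdev(past)
        if sigma > 0:
            scores.append(abs(value - mu) / sigma)
        else:
            # No variation in history: any deviation at all is maximally suspicious.
            scores.append(0.0 if value == mu else float("inf"))
    return scores


# Service calls per action in three past successful runs (hypothetical data).
history = [
    [3, 5, 2],
    [3, 5, 2],
    [3, 6, 2],
]
# In the failing run, the second action triggered far fewer calls than usual.
current = [3, 1, 2]

scores = anomaly_scores(history, current)
most_suspicious = scores.index(max(scores))
print("most anomalous action index:", most_suspicious)
```

The most anomalous action is a candidate root cause even if no explicit rule fired for it.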

Smart Suggestions

Identifying the likely root cause of the failure is only one part of debugging the test. Having identified it, you potentially have a huge number of possible changes that might resolve the failure. The smart suggestion engine uses historical data and ML to learn what the correct action is most likely to have been. Using this, it automatically makes a number of suggestions for possible failure resolutions, which are presented to the user. Once the user accepts a suggestion, the test is updated accordingly and re-run.

Self-Healing

Self-healing is the highest level in the AEA hierarchy. Often there is more than one potential root cause for a given failure, and each of those root causes may have several possible resolutions. In such situations where the correct solution may not be immediately obvious, the self-heal feature allows the test to be re-run with each of the potential solutions. The results of these are then presented to the user who can then click to approve any successful resolution. This will automatically update the test for future runs. The idea here is to replicate what a user would have to do if they were manually updating the test. This feature has the potential to save a vast amount of QA maintenance time.
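The re-run loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the test is modelled as a dictionary of settings, each candidate fix is a dictionary of overrides, and `run_test` is a hypothetical callable that returns `True` when the patched test passes. Nothing is written back to the stored test until the user approves a fix.

```python
def self_heal(run_test, test: dict, candidate_fixes: list) -> list:
    """Re-run the test once per candidate fix and return the fixes that passed."""
    successful = []
    for fix in candidate_fixes:
        patched = {**test, **fix}  # apply the fix to a copy, leaving the original intact
        if run_test(patched):
            successful.append(fix)
    return successful


def fake_run_test(test: dict) -> bool:
    """Stand-in runner: the flow only passes if the account button is clicked."""
    return test["button_text"] == "Create account"


broken_test = {"name": "checkout flow", "button_text": "Check out as guest"}
candidate_fixes = [
    {"button_text": "Sign in"},
    {"button_text": "Create account"},
]

passing = self_heal(fake_run_test, broken_test, candidate_fixes)
print("fixes to offer for approval:", passing)
```

Applying fixes to a copy keeps the original test unchanged, so a rejected or failed fix leaves nothing to undo.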

Examples of AEA in Action

To better understand how AEA™ works, let’s look at two common failure scenarios: incorrect element selection and overly precise comparisons.

Element Selection

The root cause of many test failures is incorrect element selection. For various reasons, a test may identify and select the wrong element on a page, resulting in a failure. Sometimes this failure may occur immediately, but more often it fails further through the test process. The example given at the start is a good illustration of this sort of failure.

With this sort of failure, root cause analysis can usually identify the original failing selection by comparing it with previous successful test runs. This saves time in debugging the cause of the test failure. Smart suggestions can then use the historical data to suggest the most likely changes needed to select the correct element. Finally, self-heal can implement and validate these changes, allowing the test to be automatically updated.

Comparisons

Many tests rely on a comparison, such as checking that a piece of data lies within a given range or is equal to a given value. Taking the shopping cart example above, the test may have earlier selected a $500 laptop to add to the cart. On the shopping cart page, the test might check that the total value of the order is equal to $500. If the underlying price data has changed, this overly precise comparison may well fail.

Here the root cause analysis will use the rule-based expert system to identify that the expected value of the items was incorrect. The smart suggestions engine will then suggest that this comparison be improved, for instance by storing the value of the laptop that was selected and comparing this with what is displayed in the shopping cart. It may even suggest that a more valuable check is to ensure that the shopping cart contains the correct number of items, that all items have a value greater than $0, that the values are the ones displayed on the relevant pages, and that the sum has been correctly calculated. Finally, self-heal will change the comparison to a more appropriate one and update the test.
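The more robust checks listed above might look like the sketch below. It compares values captured during the test rather than a hard-coded $500. The function name, its arguments and the messages are hypothetical, and prices use `Decimal` to avoid floating-point rounding on currency.

```python
from decimal import Decimal


def check_cart(selected_prices: list, cart_prices: list, cart_total: Decimal) -> list:
    """Return a list of problems found; an empty list means the cart looks right.

    selected_prices: prices captured on the product pages when items were added.
    cart_prices: per-item prices displayed on the shopping cart page.
    cart_total: the order total displayed on the shopping cart page.
    """
    problems = []
    if len(cart_prices) != len(selected_prices):
        problems.append("wrong number of items in cart")
    if any(price <= 0 for price in cart_prices):
        problems.append("item with non-positive price")
    if sorted(cart_prices) != sorted(selected_prices):
        problems.append("cart price differs from product page")
    if sum(cart_prices, Decimal("0")) != cart_total:
        problems.append("cart total is not the sum of its items")
    return problems


# The laptop's price has changed since the test was written (hypothetical values).
price_on_product_page = Decimal("549.00")

# A hard-coded expectation of 500 would now fail, although nothing is broken.
print(price_on_product_page == Decimal("500"))

# The captured-value checks still pass after the price change.
print(check_cart([price_on_product_page], [Decimal("549.00")], Decimal("549.00")))
```

Because the expected values come from the pages themselves, a price change no longer breaks the test, while a genuine calculation error in the cart would still be caught.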

Summary

Functionize’s AEA™ engine is a complete game changer in automated test debugging. By using machine learning, the system is able to replicate what a skilled tester would do to identify and correct the root cause of test failures, but in a fraction of the time. Not only does this free up valuable test engineer time, it also makes it easy for less skilled users to maintain even the most complex of test suites. The AEA™ engine is a big step forward in automated test debugging, but we’re not stopping there. In the near future, we intend to add more ways to identify and locate bugs. These include checking for tests that failed due to slow loading of a page following an action, checks for page or element load failures (for instance, an image may not have loaded correctly due to a timeout), and authentication/authorization errors where the test user doesn’t have the right privileges to complete an action.

 

...