How ML underpins Functionize
Software testing has endured what I term a QA Winter. Developers and testers still maintain tests the same way they did in the early days of the internet. Test automation has fallen far behind, and at Functionize we are on a mission to change that.
Tamas Cser, CEO of Functionize.
As we reviewed in the first part of this series, test automation is an essential component of the software delivery lifecycle. For years, test automation lagged behind other developments in software testing, leading our CEO to coin the term “QA winter”.
The test automation problem
Test automation has been transformative and is arguably one of the key technologies that drove the tremendous uptake in web and mobile applications over recent years. However, test automation suffers from some key issues. Generally, these issues crop up at every stage in the test lifecycle:
- Test creation is slow. It requires skilled developers, even for the most basic tests.
- Test maintenance eats time dealing with false-positive failures when the application’s UI is updated.
- Debugging is tedious, and particularly painful when testing cross-browser and cross-platform.
- Test execution takes too long. It’s also limited by the test infrastructure, and often requires a DevOps engineer to be heavily involved for any kind of scale.
- Analysis only checks a small part of the overall user interface, which means that it misses bugs and defects.
As we saw in part 1, people have started to turn to machine learning (ML) in an attempt to solve these issues. But most solutions are only partial: for instance, advanced test recorders that use ML to improve element selection, or systems that use natural language processing (NLP) to help convert test plans to scripts.
Here, we explain how Functionize takes a different approach, embedding machine learning, and more fundamentally data science, into everything we do. The end result is a system that allows you to automate your entire test suite quickly, reliably, and efficiently, something that is only possible with the intelligent application of machine learning.
Machine learning at Functionize
Functionize was founded in 2016 with the aim of applying machine learning to revolutionize functional UI testing. Our product is a complete solution for creating, executing, scaling, and analyzing tests without the need to write traditional scripts (though you can still tune your tests with scripting if you like!).
Simplicity in Creation
Quickly and easily build tests by walking through the test steps in our Architect utility, or by typing steps in plain English which our ML converts to automation.
Scalability in Execution
Our cloud-based testing allows you to run as many tests as you need, as often as you need, across all major browsers and for multiple data scenarios. No infrastructure limitations or added costs.
Efficiency in Debugging
Our ML learns your application, quickly diagnosing test failures across a number of different dimensions, ranging from visual and temporal anomaly detection to full root cause analysis.
Durability in Maintaining
No more tedious test maintenance work. Our ML recognizes changes in the UI or underlying code and dynamically updates selectors in your tests as necessary.
All the most popular technology we integrate with (Jira, Slack, etc.) supports the ability to extend the product, so we wanted to make sure that if what we have out of the box doesn’t fit your needs exactly, you can always extend it. Over the years, we have embedded machine learning and data science techniques in every part of our product. The result is a system that simplifies test creation, streamlines debugging, achieves completely scalable execution, and creates durable tests that need little maintenance.
Machine learning grew out of the data science field, and many of the algorithms it uses originated there. Indeed, many practical applications of machine learning still require data scientists to turn to approaches like the Akaike information criterion to help prepare the data and assess the overall quality of their statistical models.
At Functionize, we’re looking at the emerging big data problem underpinning effective testing and applying data science to solve the lasting (and new) challenges of test automation.
Functionize offers two ways to create new tests. Both incorporate machine learning and are designed to simplify the process of test creation.
Architect is the newest addition to our product. It enables creation of complex tests with a user experience similar to that of an advanced recorder. During test creation, Architect leverages machine learning in a few key places:
Element Selection: When you select an element on the page, our ML engine captures and catalogs hundreds of data points for that object, creating a unique fingerprint. This allows for a combination of factors to be used to uniquely identify the element in future test runs as opposed to a single selector. Additionally, you can navigate elements that are within a different DOM, such as within an iFrame. This is particularly useful when you are testing an application with embedded content.
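To make the fingerprinting idea concrete, here is a minimal sketch of multi-attribute element matching. The attribute names, weights, and scoring formula are all illustrative assumptions, not Functionize’s actual model; the point is that a weighted combination of many signals survives changes that would break any single selector.

```python
from dataclasses import dataclass

# Hypothetical element "fingerprint": a handful of the hundreds of data
# points a real engine might capture (tag, text, classes, position).
@dataclass(frozen=True)
class Fingerprint:
    tag: str
    text: str
    classes: frozenset
    x: int
    y: int

# Illustrative weights: text content matters more than position here.
WEIGHTS = {"tag": 1.0, "text": 3.0, "classes": 2.0, "position": 1.0}

def similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Score two fingerprints between 0 and 1 across several dimensions."""
    score = 0.0
    score += WEIGHTS["tag"] * (a.tag == b.tag)
    score += WEIGHTS["text"] * (a.text == b.text)
    if a.classes or b.classes:
        overlap = len(a.classes & b.classes) / len(a.classes | b.classes)
        score += WEIGHTS["classes"] * overlap
    # Nearby positions count partially; far-away ones contribute little.
    dist = abs(a.x - b.x) + abs(a.y - b.y)
    score += WEIGHTS["position"] * max(0.0, 1.0 - dist / 500.0)
    return score / sum(WEIGHTS.values())

def best_match(target: Fingerprint, candidates: list) -> Fingerprint:
    """Pick the candidate most similar to the recorded fingerprint."""
    return max(candidates, key=lambda c: similarity(target, c))
```

With this kind of scoring, a button that moves and loses a CSS class can still outrank an unrelated element that happens to share the old position, because no single attribute decides the match.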
Image Recognition: Architect enables creation of complex verifications based on image processing. For instance, you can ask it to compare a page against either the previous test run or against a previous test step. You can specify the acceptable variance—how precise the comparison must be. This is based on our custom template recognition system. It understands how visual elements relate to each other and knows to ignore elements that change frequently, such as the date.
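The "acceptable variance" idea can be sketched as a tolerance-based comparison. This is not Functionize’s actual engine; it is a toy version in which two screenshots match when the fraction of meaningfully differing pixels stays within a configurable threshold, with known-dynamic regions excluded.

```python
def pixels_differ(p1, p2, channel_tolerance=16):
    """Ignore tiny per-channel differences caused by anti-aliasing or
    rendering variation between browser instances."""
    return any(abs(a - b) > channel_tolerance for a, b in zip(p1, p2))

def screenshots_match(img_a, img_b, acceptable_variance=0.02, ignore=frozenset()):
    """img_a/img_b: dicts mapping (x, y) -> (r, g, b) pixels.
    ignore: coordinates of regions known to change every run
    (dates, ads, ...). Returns True when the share of differing
    pixels is within the acceptable variance."""
    checked = [xy for xy in img_a if xy not in ignore]
    differing = sum(pixels_differ(img_a[xy], img_b[xy]) for xy in checked)
    return differing / max(len(checked), 1) <= acceptable_variance
```

Raising `acceptable_variance` loosens the comparison, mirroring the precision setting described above; the `ignore` set stands in for the learned knowledge of which elements always change.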
Beyond ML-based capabilities, Architect also offers a number of advanced features that make it uniquely powerful. These include the ability to test two-factor authentication flows, even ones that require an SMS for verification. Additionally, it includes advanced tools like a database explorer (allowing you to create tests that verify what happens on the backend) and an API explorer (allowing you to incorporate API tests into your broader test plans). You can even take advantage of in-test variables, project variables, and variables used across tests when running with a data set.
Natural Language Processing
The second approach for creating tests is adaptive language processing, or ALP, which is based on the principles of NLP. In this method, you pass the system a set of test plans written in plain English, and it uses these to generate the corresponding set of tests. Unlike other NLP-based systems, ALP can cope with both structured and unstructured text. Structured test plans use a set of keywords such as open, verify, and click. Each step can optionally include test data and the expected result. Unstructured tests allow you to use plain English for your tests. For instance, you can just say:
“Open Facebook and log in using “firstname.lastname@example.org” and the password “PassW0rd”. Then check that the Facebook homepage has loaded.”
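A toy parser illustrates the structured side of this. Real ALP uses full NLP models; this sketch only handles the keyword form described above (open/click/verify plus optional quoted test data), and every name in it is an assumption for illustration.

```python
import re

# Keywords from the structured test-plan style; a real system supports more.
KEYWORDS = {"open", "click", "verify"}

def parse_step(step: str) -> dict:
    """Split one structured test step into action, target, and test data.
    Quoted strings are treated as test data; the rest is the target."""
    tokens = step.strip().split(maxsplit=1)
    action = tokens[0].lower()
    if action not in KEYWORDS:
        raise ValueError(f"unknown action: {action}")
    rest = tokens[1] if len(tokens) > 1 else ""
    data = re.findall(r'"([^"]*)"', rest)          # quoted test data
    target = re.sub(r'"[^"]*"', "", rest).strip()  # what remains is the target
    return {"action": action, "target": target, "data": data}
```

For example, `parse_step('click "Create account" button')` yields the action `click`, the target `button`, and the test data `Create account`. Handling the unstructured Facebook example above is exactly where keyword parsing runs out and the ML-based approach takes over.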
Once ALP has parsed the test plans, it will generate the actual tests. This is done using our Cognitive ML engine, which combines NLP, computer vision, machine learning, and boosting to generate a multi-dimensional model of your UI. This is especially important when dealing with dynamic and personalized sites where many elements on the page may change from version to version or even depending on which user is logged in.
Which approach should I choose?
ALP really works best when you want to automate a large number of simple tests, for instance when you are migrating a test suite. It requires a complex modeling process, so ALP tests may take some time to be created; as a result, it can be inefficient when you want to automate a single test. By contrast, Architect is quick to create a single test and offers the ability to add advanced test steps such as 2FA logins. However, it is slower to create multiple tests, as each one has to be generated individually. Depending on your project, you can combine both approaches: use ALP to migrate existing test plans, then use Architect to refine them and add advanced functionality.
Ideally, tests always run seamlessly, but there are times when the application under test breaks or the software changes significantly. In these cases, our Cognitive ML engine may not always know which element to select. If it can’t find the right element to click or verify, it will typically provide a set of elements which might be the right choice. You can select a potential fix, accept the proposed changes, and then immediately verify whether it solves the problem. There is no lengthy review of the failure or tedious tuning of the test. You are also given the option to apply that change to similar tests.
Improving test analysis
As we saw in the previous chapter, analyzing test results can be challenging. At Functionize, we leverage ML-powered anomaly detection to identify bugs and defects.
We leverage two forms of anomaly detection within our system. Visual anomaly detection allows us to identify UI elements that have changed from either the previous test step or the previous test run. Temporal anomaly detection allows us to identify anomalies in test output over time. This is important for identifying certain kinds of bugs or defects.
Most UIs are rendered dynamically and often contain content that changes with each run (for example, ads), which may render differently on every browser instance. As a result, a traditional pixel-by-pixel comparison will give far too many false positives. That is, it will flag things as test failures when they aren’t.
Our system is more nuanced. To enable our visual anomaly detection, we take UI screenshots before, during, and after every test step throughout every test run. We then use computer vision coupled with fuzzy comparison logic to assess changes. This allows for a (configurable) level of variation between screenshots. It also learns to ignore things that always change between runs. When there is a significant variation, it will return a warning until that visual change is accepted, which sets a new baseline for the visual component. The system is also able to identify elements that have been restyled. As you will see, this is important for reducing maintenance.
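The "learns to ignore things that always change" step can be sketched with run history. This is an illustrative assumption about the approach, not the actual engine: regions that differ in nearly every historical run are treated as dynamic and ignored, while a region changing for the first time is flagged as a visual anomaly.

```python
def classify_regions(history, current_changed, dynamic_ratio=0.9):
    """history: list of sets of region IDs that changed in past runs.
    current_changed: region IDs that changed in this run.
    Returns (anomalies, dynamic): first-time changes to warn about,
    versus regions that change so often they should be ignored."""
    runs = len(history)
    anomalies, dynamic = set(), set()
    for region in current_changed:
        change_rate = sum(region in run for run in history) / max(runs, 1)
        (dynamic if change_rate >= dynamic_ratio else anomalies).add(region)
    return anomalies, dynamic
```

A date widget that differs in every run ends up in the dynamic set and never triggers a warning, while a logo that suddenly changes is surfaced as an anomaly until the tester accepts it as the new baseline.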
Temporal anomalies occur when test results differ between test runs. One of the challenges is differentiating between changes that are expected and ones that indicate a test failure. For instance, the price of goods in an eCommerce app might change dynamically based on currency exchange rates, so a price difference of a few cents isn’t a test failure. However, if a product is suddenly 10x cheaper, that is a problem. By using LSTM (long short-term memory) networks, our system is able to understand complex user flows and learn the difference between these sorts of anomalies.
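The core idea behind temporal anomaly detection is forecast-and-compare: predict the next value from history and flag observations that deviate too far. Functionize uses LSTM models for the forecast; purely to keep this sketch runnable, a moving average stands in for the LSTM, and the tolerance value is an assumption.

```python
def is_temporal_anomaly(history, value, relative_tolerance=0.2):
    """Flag `value` as anomalous when it deviates from a forecast of the
    recent history by more than a relative tolerance. A moving average
    over the last five observations stands in for a learned model."""
    recent = history[-5:]
    forecast = sum(recent) / len(recent)
    return abs(value - forecast) > relative_tolerance * forecast
```

Against a history of prices hovering around 10.00, a value of 10.05 passes (an expected exchange-rate wobble) while a value of 1.00 is flagged (the suspicious 10x price drop from the example above).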
Test maintenance is one of the biggest time sinks for any test automation team. Not only that, all the effort is effectively wasted—it achieves nothing productive. This is why one of our main goals has been to slash the time needed for test maintenance.
We already explained how our Root Cause Analysis engine can help with debugging tests, but our Cognitive ML engine also addresses the massive time sink of updating your tests and their underlying selectors after each change to the UI or DOM. Imagine the shopping cart page for an online store. After filling in their email address, the user can click one of two buttons. One button checks out as a guest, the other creates an account. In the test, the button creating an account is selected. If the page is redesigned and these buttons swap, the labels change slightly, or the buttons move to a different place on the page, many automation approaches will be unable to find the “create account” element and will fail the test.
Traditional selector-based element identification will generally have an awful time navigating the common changes mentioned in the preceding example, especially if more than one occurs at once. Our Cognitive ML engine and intelligent anomaly detection already reduce the need for test maintenance by removing some of the more common causes of false positives, but we take it further by going beyond the traditional selector-based approach. Our ML models derive contextual meaning from each step of a test and, with that understanding of the underlying intent, are able to navigate multi-dimensional changes as your application evolves.
Self-healing uses an application meta-modeling approach (more on that later) to create a multi-dimensional understanding of each element on each page. This enables our Cognitive ML engine to understand the underlying intent of a test step and navigate changes that might materially alter the UI, but not the ability to complete that action. When change is detected, the system uses that understanding to update the test accordingly.
Tests are multi-dimensional, and many different things can change to cause the incorrect element to be selected. We use an approach called application meta-modeling to deal with this. The aim is to identify the actual underlying intent of the test. The system is then able to tell what the likely correct element was, mimicking the way a human might arrive at that decision. The following diagram shows the flow.
Statistical models are used to categorize, rate and rank thousands of data points to ensure the stability of element identification.
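The decision flow behind self-healing can be sketched as rank-then-decide. Everything here is a hypothetical simplification: the `score` function stands in for the statistical models described above, and the confidence threshold and suggestion count are invented for illustration.

```python
def heal_step(step, candidates, score, auto_threshold=0.8):
    """Rank candidate elements for a failing step by similarity score.
    Auto-update the step when the top match is confident enough;
    otherwise surface the best candidates for the tester to choose
    from, as in the suggested-fix flow described earlier."""
    ranked = sorted(candidates, key=lambda c: score(step, c), reverse=True)
    best = ranked[0]
    if score(step, best) >= auto_threshold:
        return {"action": "auto_update", "element": best,
                "reason": "high-confidence match"}
    return {"action": "ask_user", "suggestions": ranked[:5]}
```

A high-confidence match heals the test silently (while still being reported to the user, as described below); an ambiguous one falls back to presenting a short list of likely elements rather than simply failing the run.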
In the above scenario, we described a relatively common set of changes you might expect to see as an application evolves over time. These minor adjustments wreak havoc on automated tests and generally require many hours of work to identify why a test failed and then update the script accordingly. As we explained earlier, our system is designed to eliminate those time-consuming updates, but what happens if the underlying change was not supposed to happen? While self-healing will statistically make the correct decision in updating steps the vast majority of the time, there are times where it is not aware that a particular change was unintended. To address that, the Cognitive ML engine is constantly assessing which attributes of every element are stable, which are dynamic and thus prone to change (e.g., dates, currency exchange rates), and which are anomalous, signaling unexpected change.
Using that information, we communicate back to the user when a step has been updated, what the system changed, and what underlying information was used to make that decision. This gives testers added confidence that the system is making the right decisions and provides better awareness of unintended changes should they arise.
Visual template recognition
The final part of the puzzle is our visual template recognition system. This is a powerful approach for coping with dynamic or configurable UIs. It breaks the UI down into elements and then works out how these elements relate to each other. For instance, it can identify the user name and actual name on the screen and knows that these are related. By looking for these elements, the system can create a template of how the page should look. It is then able to identify significant changes while ignoring ones that change dynamically, like different date formats or localization data.
This is also particularly useful for identifying changes to the page that are otherwise not assessed by the various validations you have configured. Traditional test automation only knows to look for failures where you tell it to, and even then based on a narrow view of the specific elements in question. Many times, other parts of the page also change and will pass through testing unseen unless specifically addressed by a validation of some form. In a previous section we used the example of the “create account” button changing. Imagine in that scenario the “checkout as guest” option is misspelled or disappears entirely. Unless you are testing for the presence of that item and verifying its label, there would be no way to identify that. Visual templates give you a holistic view of the entire page and will surface these differences.
All Functionize tests run in our test cloud. This gives us access to the computing power needed to run our machine learning models and also creates a number of significant advantages. Firstly, there are no test-environment constraints on execution. You can scale up to running thousands of tests, allowing you to get through your test suite more quickly without the effort (and cost) of provisioning those environments. Secondly, you don’t need to expend SysAdmin or DevOps resources on maintaining test infrastructure, navigating security patches, and monitoring system performance to ensure environmental factors don’t impact test results. Thirdly, it makes it extremely easy to do complex test orchestrations. Finally, we keep the history of all your tests, which helps with debugging and identifying bugs or defects.
Scalable elastic infrastructure
Functionize is cloud-based and runs on Google Cloud and AWS. This means we can support customers all over the world and can offer virtually unlimited test capacity. Every Functionize test runs in its own instance or virtual machine. This is important because it ensures tests are more realistic. Each test comes from a different IP address. Most modern web applications rely on load balancers to distribute load across their backend. If you run all your tests from behind a single machine, every request shares one IP address, so the load balancer will put all the load on one backend server, which is not how real traffic behaves.
One of the most useful aspects of our test cloud is that it offers you the ability to do cross-platform testing. We leverage a technique called nested virtualization (offered as part of Google Compute Engine). This allows us to simulate mobile devices within our test cloud. As mentioned already, all our tests automatically work cross-browser thanks to the machine learning models powering the Cognitive ML engine. So, you can do all your mobile browser testing in our cloud. This saves you significant cost and effort. You don’t need to maintain an expensive inventory of mobile test devices and testing is completely automated and efficient. This avoids the need for extensive manual testing or a cobbling together of multiple tools, plus you get all the benefits of our ML-powered approach to testing.
We are committed to constantly improving our products and invest heavily in R&D. Over the coming months, we will be introducing Smart Analysis reporting, which will provide gap analysis of your test coverage, evaluating which tests are running compared to what your users are actually doing in your product by observing real user interactions with your UI. Combined with enhanced diff reporting, this will provide a complete view of what has changed across your entire application from release to release. Our ultimate aim is to empower you to test every feature of your UI and to eliminate maintenance completely.