How to demonstrate the value of your automated testing
Your test team faces a conundrum every day: how can they help deliver high-quality software quickly without wasting money on unnecessary testing? Here, we explore how intelligent test automation can solve this and deliver a good return on investment.
Testing ensures you release quality software that gives your users the best experience. However, there has always been a tension between software quality, speed of delivery, and value for money. The accepted wisdom has been that you can only have two of the three. As we will see, intelligent automated testing allows you to have all three. But before we look at that, we need to understand what testing is, and why it is so expensive.
The cost of software testing
Software testing is often presented as a V-model, which matches stages in product evolution with the respective form of testing. Or it may be shown as a hierarchy as in the picture below.
During development, your software engineers will (hopefully) conduct thorough unit tests. These tests are designed to verify that individual functions perform as expected. Skilled software development teams may even employ mutation testing to ensure that their unit tests are robust enough. By their nature, unit tests can only check that each individual function is correct.
Integration tests are needed as you combine functions to create features. These integration tests will typically use fake data sources and APIs. They are often written by software engineers, but they are also a critical piece of the testing done by quality engineers or test analysts.
System tests happen as soon as you reach the MVP stage. This is when testing becomes the sole preserve of your test analysts. Within system testing, their job is multi-faceted:
- Functional testing, to check that the new features work correctly.
- Regression testing, to verify that any new features or fixes don’t break the existing product.
- Performance testing, to check that the product works well under various loads.
The costs of system testing
Broadly, your team of quality engineers has two approaches they can use for system testing. These are manual testing and automated testing. There are costs associated with both approaches, as we shall see.
The traditional testing approach was to do everything manually. Your test engineers work through detailed manual test plans one step at a time. At each step, they check whether the outcome is as expected. If it isn't, they mark it as a bug/test failure. For new features, they also need to spend time creating detailed defect reports for the developers.
Manual testing is slow, inefficient, and expensive. Each test analyst can only complete a certain number of tests per day, and you don't want them to rush in case they make mistakes. The only way to speed up your testing is to employ more test analysts. However, test analysts are relatively well-paid, so this comes at a high price. Even with a large (and expensive) team, a complete test run will take a long time.
The other option is to automate these tests where possible. Automated testing involves using a computer to interact with your software in order to complete your tests. In most cases, you test via the UI, since that is how your end-users will interact with the system.
UI-based test automation has been around for a long time. The best-known framework is Selenium, first created in 2004 and used to interact with web browsers. Selenium requires test scripts, typically based on common frameworks such as JUnit or pytest, to run through your automated tests. It works by simulating a user interacting with your application: selecting elements in your UI and interacting with them. For instance, finding a button and clicking it, or locating a text input field and filling it in. At each stage, you can add checks (assertions) to see if the test has been successful. These can include checking whether you have moved to a new page or looking for specific elements on the screen (e.g. a login name).
Creating robust automated tests takes a great deal of skill. A test script is a software engineering project in its own right. You need to go through repeated cycles of coding and debugging as you create it. You also need to refactor your script for each browser and device it might run on. So, you need to be a skilled developer. At the same time, you need to be an expert in testing, so you understand how to create robust tests. This is why Developers in Test are among the best-paid developers in the industry.
The value of testing
If you are in charge of QA for a product, you will know there is a trio of business outcomes you need to meet. Chances are, these are used to set your KPI targets each year. Certainly, they are the aspects that the executive team all care about.
Software quality
Everyone knows that bugs are a sure-fire way to annoy your users. But even minor defects can harm the user experience. This is why companies devote significant effort to testing their software. In an ideal world, every release performs better than the last. New features should interact seamlessly with existing ones. And, above all, none of your users should come across unresolved bugs in the software. In a nutshell, you want to ensure that you only release high-quality software. Achieving this requires you to test your software thoroughly.
Speed of delivery
In the modern world, you are engaged in a constant race to release new features faster than your rivals. Giant companies like Google only got where they are by constantly releasing updates and tweaks to their software. Indeed, many multinational companies will now release new code every few hours. This process of continuous integration and continuous delivery (CI/CD) is a far cry from the old days of monolithic releases. It also aligns closely with the concept of agile development. However, it places enormous demands on your whole team, including developers, quality engineers, and DevOps engineers.
Value for money
In the current climate, every business has to strive to keep costs down. This means increasing efficiency and delivering the best value for money possible. As a rule, testing is a relatively high-cost item. You typically need to employ lots of manual testers as well as several skilled (and expensive) Developers in Test.
The conundrum of testing
With testing, there is always a tension between software quality, speed of delivery, and value for money. The following diagram illustrates this.
Essentially, you can only ever achieve two of these goals. If you want to increase quality, you either have to reduce speed or reduce value for money. Similar trade-offs exist for all three goals. The upshot is that measuring return on investment is hard. Getting it right requires you to understand why good testing costs money.
Traditional ways to measure testing
Clearly, the starting point for establishing the cost-effectiveness of testing is measuring how good it is. So, how do you go about measuring how good your testing is? There are a few metrics that are widely used in the industry.
Number of tests
This is probably the simplest metric and will most likely bring back terrible flashbacks for experienced testers. It simply looks at the total number of tests you have for your system. The more tests you have, the better. Or that is the accepted wisdom. Actually, it is a bit more nuanced. As software evolves, tests need to evolve as well. Some tests will become obsolete, others may be merged. Also, you should only be creating tests that actually achieve something useful. If you incentivize your test team based on the number of tests they create, they may just create useless ones.
Number of test runs
This is a slightly better metric in that it measures how much testing your team is doing. However, it only makes sense if your tests are sufficiently comprehensive and useful. It’s also worth noting that running a full test suite may take days or even weeks. By contrast, your team is probably running smoke tests constantly. This means you need to combine this metric with an understanding of the sorts of tests being run.
Number of defects found
One widely-held belief is that the more defects you find, the better your tests must be. This is clearly nonsense if you think about it. If your developers are doing their job, the number of defects should drop over time. So, while finding defects proves your tests are working, the converse isn't true. A lack of defects found may just mean your software is stable and mature.
Code coverage
Code coverage is often used by developers to assess the quality of their unit tests. Ask any developer what code coverage they should aim at and they will give you an answer. Realists will tell you 80%, fatalists will say it depends, and managers will probably say 100%. Full coverage is the mythical ideal they all strive to achieve. Code coverage measures the proportion of your code (lines, branches, or functions) that is exercised by unit tests. However, there is a big problem with this measure: it ignores the quality of the tests. Bad unit tests still contribute to increasing code coverage!
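To see why coverage alone is misleading, here is a minimal sketch (the function and test names are hypothetical): a test that executes every line of a function earns 100% coverage even though it asserts nothing.

```python
def discount(price, pct):
    """Apply a percentage discount to a price."""
    return price * (1 - pct / 100)

def test_discount_smoke():
    # Runs every line of discount(), so coverage tools report 100%,
    # but asserts nothing: any bug in the formula goes undetected.
    discount(100, 20)
```

A coverage tool counts `discount` as fully tested, yet you could break the formula in almost any way and this "test" would still pass.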
Automated test coverage
In system testing, your aim is usually to automate as many tests as possible. This is measured by the proportion of your overall test suite covered by automated tests, analogous to code coverage. Within limits, it is actually quite a good metric: you should definitely aim to automate as many tests as possible. However, full automation (100% test coverage) is impossible in practice, because some tests simply have to be conducted manually. That might mean exploratory testing to find new bugs, or trying to recreate a bug a user has reported.
The problem with traditional metrics
As we have seen, traditional metrics are not great at measuring software quality. They tend to focus on measuring the testing, rather than the result. They are rather akin to the vanity metrics marketing teams often use. This makes them extremely unsuitable for measuring your return on investment. What is needed is some way to measure the effectiveness of your testing.
In unit testing, this is done using an approach called mutation testing. This involves intentionally adding small errors to your code in order to check whether the unit tests fail as they should. Let’s use a toy Python example to show what this involves. Imagine you have a simple function to calculate the product of two numbers.
You also define a unit test for your function:
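As a minimal, illustrative sketch (the names here are hypothetical, not the original sample code), the function and its unit test might look like this:

```python
def product(a, b):
    """Return the product of two numbers."""
    return a * b

def test_product():
    # This single assertion gives 100% line coverage of product(),
    # but 1 * 1 == 1 and 1 / 1 == 1, so it cannot tell
    # multiplication apart from division.
    assert product(1, 1) == 1
```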
We can trivially see that this unit test isn't very good, despite achieving 100% coverage: it fails to test most cases. For instance, if you change the operator in the function to divide, the test still passes. This is mutation testing in action: you have mutated the operator, and your test failed to detect it. Thus, we can say that if your unit tests achieve high coverage AND catch all mutations, then they are good quality.
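To make the mutation concrete, here is an illustrative sketch (hypothetical names): the mutant swaps `*` for `/`, the weak assertion survives it, and a test with a non-trivial input kills it.

```python
def product(a, b):
    return a * b

def product_mutant(a, b):
    return a / b  # the mutation: * replaced by /

# The weak assertion passes on the mutant too, so this
# mutation goes undetected (it "survives"):
assert product_mutant(1, 1) == 1

# A test with a non-trivial input kills the mutant:
assert product(3, 4) == 12         # original passes
assert product_mutant(3, 4) != 12  # mutant caught (3 / 4 == 0.75)
```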
The search for a better metric for system testing
As we have seen, the classic metrics for measuring the effectiveness and quality of your tests aren't very useful. Mutation testing fills that gap for unit tests, but system testing needs its own metric: one that measures how effective your testing is at finding potential bugs. You can then gain insights into how efficient and cost-effective your testing is. In turn, you can put in place initiatives to improve your return on investment.
Splitting the problem
There are two distinct forms of system testing. Regression testing verifies that your new code hasn't broken the existing product. Progression testing seeks to track down new or previously unknown bugs in new features. The former is done by running through all your existing tests; the latter requires you to create new tests or undertake ad hoc testing. This difference in aims means we need to measure them differently.
Measuring the effectiveness of regression testing
As mentioned, regression testing involves using your existing tests to check whether new code has introduced bugs into your product. The naive approach would be to run all your existing tests every time. However, there are problems with that. Firstly, many of your tests will duplicate certain steps; this is inevitable when a test has been defined to check for a specific bug. Secondly, most test suites consist of thousands of tests, so even with the best test automation system, a full test run can take days. Thirdly, many of your tests won't have been automated, so quite a few will still be done manually.
These problems lead us to the first metrics that we can use.
Test coverage: We already examined this above. However, to make this metric more complete, you need to understand where you are duplicating effort. The ideal should be that every possible outcome in your system gets tested exactly once. Of course, this isn’t as simple as testing each functional element once, because you need to test under all conditions. That means using different combinations of data, deliberately using bad data, etc.
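As a sketch of what "testing under all conditions" means in practice (the validator and its rules are hypothetical), each distinct outcome gets exactly one test case, including deliberately bad data:

```python
def is_valid_username(name):
    """Hypothetical validation rule: 3-12 alphanumeric characters."""
    return name.isalnum() and 3 <= len(name) <= 12

# One case per outcome: good data, bad data, and edge cases,
# each exercised exactly once to avoid duplicated effort.
cases = [
    ("alice", True),
    ("ab", False),        # too short
    ("x" * 13, False),    # too long
    ("bob!", False),      # disallowed character
]

for name, expected in cases:
    assert is_valid_username(name) == expected
```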
Speed of testing: This is a much simpler metric. It just measures how quickly you can complete all the tests you deem necessary. This will be influenced by the proportion of tests that are automated. However, even when tests are automated, you can take steps to speed up testing. For instance, allowing tests to run in parallel.
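The parallelism point can be sketched in a few lines (the `run_ui_test` stub is a stand-in for a real automated test): eight tests that each take a fixed time finish in roughly the time of one when run concurrently.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_ui_test(case_id):
    """Stand-in for a real automated test (e.g. one browser session)."""
    time.sleep(0.1)
    return case_id, "pass"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_ui_test, range(8)))
elapsed = time.monotonic() - start

# Eight 0.1-second tests complete in roughly 0.1 s rather than 0.8 s.
```

Real UI tests parallelize across machines or containers rather than threads, but the speed-up follows the same logic.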
Measuring progression testing
Progression testing happens for three reasons:
i) When you release new code and need to create tests for new features
ii) If a user has reported a bug and you need to find the steps to recreate it
iii) When you want to do ad hoc testing to try and find new bugs
Measuring the effectiveness of progression testing is hard. But here are a couple of things to consider.
Speed of creating new tests: Cases i) and ii) above both relate to developing effective new tests. The faster you can do this, the better. In case i), this helps you to ship your new feature faster. In case ii) it helps your developers find and solve the bug quicker.
Ratio of bugs found vs bugs reported: Case iii) is about trying to find corner cases: unusual sequences of steps a user might take that break the app. This is much harder to measure directly. Probably the best measure is the ratio of bugs found by ad hoc testing versus bugs found by users in the wild. The ideal would be to get this ratio as near to 1 as possible.
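One plausible way to compute this ratio (the counts below are hypothetical) is as the share of all known bugs that your own testing caught before users did, so a value of 1 means users found nothing:

```python
# Hypothetical counts from a defect tracker for one release.
bugs_found_by_adhoc_testing = 18
bugs_reported_by_users = 6

# Share of all known bugs caught internally; 1.0 is the ideal.
ratio = bugs_found_by_adhoc_testing / (
    bugs_found_by_adhoc_testing + bugs_reported_by_users
)
print(round(ratio, 2))  # 0.75
```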
Improving your return on investment
At the end of the day, what matters to a company is to get a good return on their investment in testing. So, you need to be looking to invest in things that improve the metrics mentioned above, while also reducing the overall costs.
Boosting the relevant metrics
First, let’s look at how to improve these metrics.
Speed of creating tests
Test automation requires you to convert as many of your tests into test scripts as possible. It also implies a need to create tests for each new feature. The traditional hands-on approach to test automation requires you to employ Developers in Test (SDETs) to write your scripts. But there are other approaches. For instance, test recorders can make it quicker to create simple tests. However, your best bet is to turn to intelligent test tools, like Functionize. Creating an automated test framework from scratch is hard to do well, and even when you do, maintaining it and reporting on these metrics is not a solved problem. Our system is designed to make test creation really fast, whether through natural language processing, our test Architect, or autonomous test creation.
Speed of running tests
Selenium was created before the modern cloud era, so it isn't optimized for running tests in parallel. Instead, you are expected to run all your tests on your local infrastructure. Even Selenium Grid doesn't really solve this; it merely makes it easier to run tests in parallel. Of course, all the infrastructure needed just to run the tests is costly, so most teams turn to purchased tools. To really speed up testing, you need to move to the cloud. This allows you to run thousands of tests at once. However, there is a caveat: you need to do proper test orchestration. That means making sure all the tests are independent and don't cause each other problems, for instance by giving each test a different set of login credentials. This is an area where Functionize excels.
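The credential-isolation idea can be sketched as follows (all names are hypothetical): each parallel test gets its own account, so concurrent runs never clash over shared login state or data.

```python
from concurrent.futures import ThreadPoolExecutor

# One isolated account per parallel test, so no two concurrent
# tests share login state or test data.
CREDENTIALS = [(f"test_user_{i}", f"secret_{i}") for i in range(4)]

def run_isolated_test(args):
    case_id, (user, _password) = args
    # A real test would log in as `user` here and drive the app.
    return case_id, user

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_isolated_test, enumerate(CREDENTIALS)))

# Every test ran under a distinct account.
users = [u for _, u in results]
```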
Test coverage
Ensuring you have good test coverage requires three things: automating as many tests as possible, checking for gaps in coverage, and reducing duplicated test effort. Getting this right requires the leadership of a really experienced Head of Testing, who can work alongside the Product Manager to strategically plan your test coverage. Functionize helps in two ways. Firstly, Coverage Analysis looks at production data and compares real usage of your application with what you are testing. It helps you identify any gaps, so you can improve your test coverage. Secondly, you can create new tests from existing functional blocks, making it easy to automate more tests.
Ratio of bugs found to reported
The best way to improve this metric is to find bugs before they get reported. But of course, it's hard finding bugs before new code is released, especially if you need to spend a lot of time discerning broken tests from true test failures. Functionize tests self-heal, so you know that test failures indicate real issues in your application. Moreover, it is really easy to create new tests using Architect or NLP. This means you can find and fix bugs before your end users encounter them in the wild.
Reducing your costs
The other side of the return on investment equation is looking to reduce costs. There are three big costs in software testing.
- Direct costs for personnel, software, and infrastructure
- Indirect costs of any additional delays to your release
- Indirect costs of any bugs that aren’t found during testing
The first of these is easy to account for, but may be harder to control. The others are hard to account for and hard to control.
Sources of direct costs
For many teams, the biggest direct cost is the salaries of Test Analysts and Developers in Test. Even a small team could easily set you back $1m in salaries and benefits. Thus, anything you can do to reduce your team size immediately saves money. Equally, anything that allows you to employ less-skilled engineers and achieve the same results also helps.
Your other direct costs relate to the test infrastructure and any licensed software you use. Large test rigs require dozens of servers, which represent a capital investment and an ongoing cost. Often, these won’t be used to their full capacity all the time, so this can be quite wasteful. Alternatively, you can use virtual servers in the cloud. This is more efficient, but you probably need to use a service designed to run Selenium tests in the cloud. These services are quite expensive. The third choice is a complete intelligent test solution, like Functionize, that includes a test cloud. Obviously, this is also an expense, but it brings many benefits too.
Sources of indirect costs
Indirect costs are always hard to account for. This is particularly true with testing. However, there is no question that the faster you can release your software, the better. This is why so many companies pursue continuous integration and continuous delivery (CI/CD). Internet giants like Google, Amazon, and Facebook often release new features several times per day. They take a twin approach to avoiding bugs. Firstly, they invest heavily in good testing, helping eliminate the risk of bugs. Secondly, they often use an approach called Canary Testing. This allows them to release a new feature in a controlled fashion to a few users. If there are no issues, the feature is rolled out to more and more users until everyone has access. If there is a problem at any stage, they can simply roll back to the earlier version.
The hardest thing to measure is the cost of buggy software. Ideally, a bug only affects a few users, and you are able to fix it before it is a problem. But in the worst case, it becomes widespread and triggers a twitterstorm of complaints about how bad your software is. Some companies never recover from this sort of reputational damage.
There is a final indirect cost that can impact test automation badly. This is the cost of ongoing test maintenance. In the worst case, this can eat up half your test team’s time. Test maintenance is required because Selenium creates very brittle tests. Minor changes to your code and CSS can break all your tests. This is because these changes can affect the element selectors used by Selenium. If a selector can’t be found, then the test will probably just fail at that point. But if the order of selectors changes, you may end up choosing the wrong element. Then you may not find the failure until many steps later. Your team then has to go back through all your scripts, updating the selectors, debugging, and checking the test now works.
As we have seen, it is quite hard to measure return on investment for testing. You need to measure some quite abstract things and may not have full insight into all your costs. But fortunately, it is much easier to improve your return on investment. Firstly, you can invest in better automated testing. Ideally, by using an intelligent testing platform like Functionize. Leveraging AI and machine learning can boost your test coverage, help reduce your costs, and increase the speed of testing. Secondly, you can shift your focus towards being more intelligent in how you test. This means better test planning along with closer collaboration between your product and testing teams. If you get these things right, you should be able to achieve the apparently impossible: Tests that are faster, cheaper, and more effective.