Ask what code coverage developers should aim for, and many will say 100%. In this blog, we explain why this is a mythical goal that can lead to a false sense of security.
Ask any group of developers “what is the ideal code coverage?” and you will get many answers. Realists will say “80% or more”, or will talk about “continuous improvement”. Fatalists will say “it depends”. Managers will probably say “you should always aim for 100%”. For many people, 100% coverage is the mythical ideal that they all strive to achieve. After all, if 100% of your code is covered by unit tests, all that code must be working, right? Sadly, as we will see, 100% code coverage can lead to a false sense of security. In this blog, we look at why 100% is a poor target, how to improve your tests, and what this all means for test automation.
Code coverage is a remarkably simple metric. Take the number of lines of code that are executed by your unit tests, divide by the total number of lines of code, and multiply by 100. Of course, in practice, systems that measure code coverage are clever enough to ignore non-functional code such as boilerplate, comments, etc. But essentially, code coverage just checks that at least one test exercises each part of your code.
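As a rough illustration of the arithmetic, the calculation can be sketched in a few lines of Python. The helper name `coverage_percent` and the line counts are invented for this example; a real coverage tool works out the counts for you by recording which lines run during the tests.

```python
def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Return line coverage as a percentage (hypothetical helper)."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 0 <= covered_lines <= total_lines:
        raise ValueError("covered_lines must be between 0 and total_lines")
    return covered_lines * 100 / total_lines


# Illustrative numbers only: 160 of 200 functional lines are run by the tests.
print(coverage_percent(160, 200))  # 80.0
```

Note that nothing in this calculation looks at what the tests actually assert, only at which lines they touch.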
What the ideal code coverage is must be one of the most debated topics in software testing, certainly when it comes to unit testing. Ask a group of 10 developers this question and you will get 10 answers. However, ask a manager and you will hear: “You should aim for 100%”, or “80% is the absolute minimum”. The result is that in many projects, there is a drive to achieve some arbitrary (and usually high) code coverage. This is often used to accept or reject a pull request. If the PR increases code coverage, it must be good. If it reduces it, it must be rejected.
The problem is, code coverage is a remarkably dumb metric. In essence, it assumes any test is a good test. It makes no attempt to assess whether you have tested for every scenario. As a developer, it can be very tempting to assume that good code coverage equates to good testing. Let’s use a simple Python example to show what can go wrong.
def product(a, b):
    total = a * b
    return total

def test1():
    assert product(1, 1) == 1
Clearly, in the above example, test1() achieves 100% code coverage. However, it isn’t a very good test. For instance, if the operator in product() was changed to divide, the test would still pass! Yet, you have been lulled into a false sense of security because you have 100% code coverage and all your tests pass.
Fortunately, for unit tests, you can turn to mutation testing for help. Essentially, this involves making small changes (mutations) to your functions, such as changing operators or changing constant values. If the test correctly fails after the mutation, then that mutant is killed. But if the test still passes, the mutant has survived, which shows that the test isn’t good enough. In the above example, the test itself can be improved, for instance by changing the values being tested:
assert product(2,3) == 6
Now, the test will fail whatever the operator is changed to, because no other arithmetic operator gives 6 for the inputs 2 and 3. NB there are still potential mutations that this test may not catch, but they are more complex ones.
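To make the idea concrete, here is a minimal hand-rolled sketch of operator mutation in Python. It is not how any particular mutation testing tool works. The names `make_product`, `MUTANT_OPERATORS`, `weak_test`, `stronger_test` and `surviving_mutants` are all invented for this illustration, and only four hand-picked mutations are tried.

```python
import operator


def make_product(op):
    """Build a version of product() that uses the given operator."""
    def product(a, b):
        return op(a, b)
    return product


# A few hand-picked operator mutations; real tools apply many more.
MUTANT_OPERATORS = {
    "add": operator.add,
    "subtract": operator.sub,
    "divide": operator.truediv,
    "power": operator.pow,
}


def weak_test(product):
    return product(1, 1) == 1


def stronger_test(product):
    return product(2, 3) == 6


def surviving_mutants(test):
    """Return the names of mutants for which the test still passes."""
    return [name for name, op in MUTANT_OPERATORS.items()
            if test(make_product(op))]


# Both tests pass against the original, unmutated code.
assert weak_test(make_product(operator.mul))
assert stronger_test(make_product(operator.mul))

print(surviving_mutants(weak_test))      # ['divide', 'power']
print(surviving_mutants(stronger_test))  # []
```

With the inputs 1 and 1, the divide and power mutants slip through unnoticed. With 2 and 3, every one of these four mutants is killed.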
While this approach can help, it doesn’t scale well. For instance, PIT (a well-known Java mutation testing tool) lists almost 30 mutations it will test. Every one of these has to be tried against each functional line of your code where it applies, and the affected tests have to be rerun for every mutant. Moreover, you need to test any dependent functions. So, as you can see, the number of test runs you need grows explosively as your codebase gets larger.
Realistically, there is no way you can complete a full set of mutation tests on a large software project in one run. So, your best bet is a compromise. Try to increase code coverage with your tests, but concentrate on tests that pass mutation testing. Whenever you have time (e.g. overnight or at weekends) use a mutation testing tool, and (importantly) preserve the resulting runtime state. That way, the next time you can restart from where you left off and steadily increase mutation testing coverage. Above all, try to encourage your developers to understand why they have to add tests. It isn’t about chasing some arbitrary code coverage figure. It’s about verifying that their code does exactly what it is meant to.
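One way to picture "restart from where you left off" is a simple checkpoint file. The sketch below is an assumption about how this could be done, not a feature of any specific tool. The function `run_batch`, the `evaluate` callback, the mutant IDs and the JSON file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_batch(mutant_ids, evaluate, state_file, budget):
    """Evaluate up to `budget` untested mutants, then save progress to disk."""
    path = Path(state_file)
    # Load results from earlier runs, if there are any.
    results = json.loads(path.read_text()) if path.exists() else {}
    pending = [m for m in mutant_ids if m not in results]
    for mutant_id in pending[:budget]:
        results[mutant_id] = evaluate(mutant_id)  # True means the mutant was killed
    path.write_text(json.dumps(results))
    return results


# Stand-in data: five mutants, one of which ("m3") survives the test suite.
mutants = [f"m{i}" for i in range(5)]


def fake_evaluate(mutant_id):
    return mutant_id != "m3"


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "mutation_state.json"
    first = run_batch(mutants, fake_evaluate, state, budget=2)   # e.g. Monday night
    second = run_batch(mutants, fake_evaluate, state, budget=2)  # e.g. Tuesday night
    print(len(first), len(second))  # 2 4
```

Each run only spends its budget on mutants that have not been tried yet, so coverage of the mutant list creeps up night after night.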
In automated testing, we have an exact analog to code coverage, namely, test coverage. The simplest definition of this is the proportion of your total tests that are fully automated. On the face of it, you should be aiming for 100% test coverage. If every test is automated, you can run your tests 24/7 and be sure that all bugs and regressions will be identified. Or can you? There are a few problems here. For starters, not all tests are suitable for automation. We discussed this recently in another blog. For another thing, you still face the issue of test quality. How can you know if you are actually testing everything you should? Then there’s the issue of test maintenance – more on that later!
Sadly, there’s no simple answer to the question of how much test coverage you should aim for. It will depend on your circumstances and the nature of your application. However, a good starting point is to aim to automate as much of your regression testing as you can. This means that when you have a new release candidate you can quickly verify that it hasn’t broken any existing functionality.
Sadly, with Selenium, you will find it a huge challenge to get above 50% coverage. Firstly, it takes a long time to create good Selenium test scripts, especially given the need for cross-browser and cross-platform testing. Secondly, if you use Selenium for test automation, you will face the test maintenance issue. In effect, this means that every change you make to your application can require most or all of your tests to be updated. You can reach a Catch-22 where the time needed to update all your tests becomes greater than the time you save by automating them. Thirdly, if you have really large automated test suites, they can start to take too long to run on your in-house infrastructure. And finally, you may start to run into issues with tests affecting each other, for instance, if two tests simultaneously try to log in with the same user details.
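The shared-login problem can be sketched without any browser automation at all. The following is one possible approach, assuming you have a small set of dedicated test accounts. The class `UserPool` and the account names are hypothetical, and nothing here is Selenium-specific.

```python
import queue
from contextlib import contextmanager


class UserPool:
    """Hand out each test account to only one test at a time (hypothetical)."""

    def __init__(self, usernames):
        self._available = queue.Queue()
        for name in usernames:
            self._available.put(name)

    @contextmanager
    def borrow(self, timeout=5.0):
        # Blocks until an account is free; raises queue.Empty after `timeout`.
        user = self._available.get(timeout=timeout)
        try:
            yield user
        finally:
            # Always return the account, even if the test fails.
            self._available.put(user)


pool = UserPool(["qa_user_1", "qa_user_2"])

# Two "tests" running at once never receive the same account.
with pool.borrow() as first_user, pool.borrow() as second_user:
    print(first_user != second_user)  # True
```

A test that cannot get an account simply waits, rather than logging in on top of another test.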
There isn’t really any direct equivalent to mutation testing for test automation. The closest thing is ensuring that you test all happy and sad paths for every user journey. So, for a login test, this means testing with correct name and correct password, correct name and wrong password, wrong name and correct password, and so on. You can get help with this by using test management tools. These map all the combinations and create test plans for each one. You can also work with your product team to check that you are testing everything they want. Finally, you can instrument your production environment and check if real users are actually behaving as expected. You may find your users taking novel routes through your application!
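Enumerating the happy and sad paths for a login can be sketched as follows. This is only an illustration: the `login` function is a stand-in for your real system under test, and the usernames, passwords and helper `login_cases` are made up.

```python
import itertools

# Hypothetical test data: one correct and one wrong value for each field.
USERNAMES = {"correct name": "alice", "wrong name": "not_alice"}
PASSWORDS = {"correct password": "s3cret", "wrong password": "guess"}


def login(username, password):
    """Stand-in for the system under test (hypothetical)."""
    return username == "alice" and password == "s3cret"


def login_cases():
    """Yield (label, username, password, expected_result) for every combination."""
    for (u_label, user), (p_label, pwd) in itertools.product(
        USERNAMES.items(), PASSWORDS.items()
    ):
        # Only the fully correct combination should log in successfully.
        expected = u_label == "correct name" and p_label == "correct password"
        yield f"{u_label}, {p_label}", user, pwd, expected


for label, user, pwd, expected in login_cases():
    assert login(user, pwd) == expected, label

print(len(list(login_cases())))  # 4
```

Adding a third field, or an empty value for each field, only means extending the dictionaries, and the combinations are generated for you.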
There are a few other things that may help. For a start, all testing should be cloud-first. Pretty much every application is based in the cloud for scalability, reliability, and efficiency. Testing shouldn’t be an exception. You can also move away from Selenium and start using intelligent test agents, such as Functionize. Our tests self-heal, meaning you can cut test maintenance by 90%. We also allow you to create tests from plans written in plain English. This helps bridge the gap between your product and testing teams. In turn, this makes it easier to test every user journey in your application. Plus we offer advanced orchestration that helps you to avoid gotchas like multiple logins from a single test user.