You’ve all been there. You make a tiny change and suddenly another part of the code stops working. You’re just hours away from an important release and panic sets in. How can changing that one variable cause everything to fall over? But then a colleague remembers an obscure bug from months ago. Can this be the same thing? Now you just need to remember how to test that. This is where regression testing should come into play.
Put simply, regression testing is about making sure old bugs don’t come back to haunt you. Wikipedia states that it’s used to make certain that previously developed and tested software still performs the same way after it is changed or interfaced with other software. More formally, regression testing is about ensuring your software continues to behave in the expected manner after you have made any changes. It is about ensuring there has been no regression in the state of the software and checking that the new code hasn’t triggered an old bug that may not have been fixed properly.
When you are struggling to meet a tough deadline, it is tempting to assume that because your unit tests and a basic smoke test all pass, your code is ready for release. However, by their very nature, smoke tests only touch a small part of the codebase and are designed to ensure basic functionality rather than test every detail. By contrast, regression testing is designed to re-run all the relevant tests to make sure there have been no regressions. So, regression testing is just as vital as unit testing and testing of new features, and is clearly distinct from both.
What Are Software Regressions?
Software is notoriously complex, and apparently minor changes can have big (and unexpected) impacts on the rest of the codebase. A software regression occurs when a change in the code causes behavior that previously worked to stop working as expected. Regressions come in three types:
- Local regressions occur when a change to a piece of code causes that code to stop working correctly. Generally, these are easier to catch, since the team working on that code may notice the error before it gets to testing.
- Remote regressions happen when a change to the code causes a failure in another part of the code that was previously working fine. These are harder to spot, since they often affect code owned by a totally different team.
- Unmasked regressions happen when a new piece of code reveals an existing failure that wasn’t previously being tested for. These are the hardest to find since they are often unexpected and may relate to code that was apparently bomb-proof.
Regression testing should identify all these types of regression.
Approaches for Regression Testing
There are three approaches to regression testing. Which you choose will depend on circumstances, the size of your QA team, the size of your codebase and the resources you can commit.
- Re-test everything. Sometimes the easiest option is to re-run all your existing tests on the new codebase. So long as those tests were well designed, this gives you the best chance of revealing any unwanted regressions. However, it is resource intensive and often infeasible for a large codebase.
- Selective re-testing. Often there is significant overlap between tests that were created for specific bugs. By carefully looking over the test coverage it is often possible to find a subset of your existing tests that covers all “moving parts” of your codebase.
- Prioritized re-testing. Many codebases are so large that you have to use a form of prioritized regression testing. This requires a degree of expertise from your QA team in order to establish which tests to prioritize and which to leave. The priority tests cover all the expected code paths and all serious bugs. Once these have been completed you can go back and fill in the rest of the tests.
Often people use a hybrid approach where they first re-run the priority tests, then run sufficient other tests to give full coverage. Between releases they may even leave the test harness to churn through all the remaining tests, just in case there’s an obscure bug that has been missed.
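The hybrid approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RegressionTest` record, the priority scale (1 is highest) and the test and area names are all hypothetical, and a real suite would take this information from its own test metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionTest:
    name: str
    priority: int      # 1 = must run before release (hypothetical scale)
    covers: frozenset  # names of the code areas this test exercises


def plan_hybrid_run(tests, all_areas):
    """Split tests into (priority, coverage, background) batches."""
    ordered = sorted(tests, key=lambda t: (t.priority, t.name))

    # Batch 1: every priority-1 test runs first.
    priority_batch = [t for t in ordered if t.priority == 1]
    covered = set()
    for t in priority_batch:
        covered |= t.covers

    # Batch 2: remaining tests that add coverage not seen yet.
    # Batch 3: everything else, left for the harness to run between releases.
    coverage_batch, background_batch = [], []
    for t in ordered:
        if t.priority == 1:
            continue
        new_areas = (t.covers & all_areas) - covered
        if new_areas:
            coverage_batch.append(t)
            covered |= new_areas
        else:
            background_batch.append(t)
    return priority_batch, coverage_batch, background_batch


tests = [
    RegressionTest("login_flow", 1, frozenset({"auth", "session"})),
    RegressionTest("checkout_total", 1, frozenset({"cart", "pricing"})),
    RegressionTest("profile_edit", 2, frozenset({"profile"})),
    RegressionTest("session_timeout", 2, frozenset({"session"})),
    RegressionTest("legacy_export", 3, frozenset({"export"})),
]
areas = {"auth", "session", "cart", "pricing", "profile", "export"}
first, second, later = plan_hybrid_run(tests, areas)
```

With this sample data, the two priority-1 tests run first, `profile_edit` and `legacy_export` follow because they cover new areas, and `session_timeout` is left for the background run.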
Automating Regression Testing
By its very nature, regression testing is both time-consuming and repetitive. For small codebases, it’s possible to perform regression testing manually, but this is hardly efficient. Because it is so repetitive, regression testing is ideal for test automation. Indeed, for anything other than a simple project, automation of regression testing is a requirement. Once you’ve created your tests, automating them will also free up your QA engineers to focus on tracking down new bugs and adding test cases for new features.
Before you can automate your regression testing, you need to decide which of the three approaches you are going to adopt. If you are opting to re-test everything, you simply need to set up your test environment to re-run every previous test on each new release candidate. However, it’s more usual to re-run a subset of the tests, either aiming for full coverage or prioritizing the most important tests.
Choosing Test Cases to Add to Regression Testing
If you’re not choosing to re-run all the tests, choosing exactly which tests to re-run for your regression testing is a mix of art, skill, and science. Whether you are doing prioritized testing or subset testing, your aim is to maximize the chances of triggering any regression that may have been introduced. You should start by selecting a set of test cases that fulfill the following criteria:
- A mix of negative and positive tests. Good testing practice is to include negative tests, which feed in invalid input and check that it is rejected, as well as positive tests, which check that valid input produces the expected result.
- Prioritize user-facing code. Failures that are visible to users are much more damaging than invisible failures.
- Concentrate on code that has been changed recently. But don’t completely ignore old test cases.
- Test for edge cases, especially ones relating to the latest code changes.
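As a rough illustration of these criteria, here is a small suite that mixes a positive test, a negative test and an edge-case test. The function `apply_discount` is a hypothetical stand-in for user-facing code, and "negative" is read here as "invalid input should be rejected cleanly", which is one common interpretation rather than the only one.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)


class DiscountRegressionTests(unittest.TestCase):
    def test_positive_typical_discount(self):
        # Positive test: valid input gives the expected result.
        self.assertEqual(apply_discount(80.0, 25), 60.0)

    def test_negative_rejects_invalid_percent(self):
        # Negative test: invalid input must be rejected, not silently accepted.
        with self.assertRaises(ValueError):
            apply_discount(80.0, 150)

    def test_edge_zero_and_full_discount(self):
        # Edge cases: the boundaries of the allowed range.
        self.assertEqual(apply_discount(80.0, 0), 80.0)
        self.assertEqual(apply_discount(80.0, 100), 0.0)


def run_suite():
    """Run the suite and return the result object without exiting the process."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(DiscountRegressionTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` returns a result whose `wasSuccessful()` method reports whether every test passed, which is convenient when the suite is driven from another script.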
Once you have made your selection above, it’s important to check whether you have covered your whole codebase. It’s also worth checking for duplication of tests at this stage. The aim is to end up with the Holy Grail: the smallest set of tests that achieves complete code coverage while requiring the minimum of resources and time.
This is where good documentation will be your friend. Well-documented test cases clarify the exact scope of each test, as well as the expected outcome. (There’s nothing more irritating than seeing a test case apparently fail, only to be told by an engineer that this is expected behavior!) By comparing the matrix of test cases against the code, you should be able to choose the best set of tests.
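One way to approximate that smallest covering set is a greedy pass over the matrix of test cases against code. The sketch below assumes the matrix is available as a mapping from test name to the set of code units it touches; the test and file names are hypothetical. Greedy selection is a heuristic, so it is not guaranteed to find the true minimum, and two tests that touch the same files may still check different behavior.

```python
def greedy_minimal_suite(coverage):
    """Pick tests until every code unit in the matrix is covered.

    coverage maps a test name to the set of code units it exercises.
    At each step the test covering the most still-uncovered units is
    chosen; ties go to the alphabetically first test name.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        best = max(sorted(coverage), key=lambda name: len(coverage[name] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


coverage_matrix = {
    "test_login": {"auth.py", "session.py"},
    "test_logout": {"session.py"},
    "test_checkout": {"cart.py", "pricing.py", "session.py"},
    "test_pricing_rounding": {"pricing.py"},
    "test_password_reset": {"auth.py", "email.py"},
}
selected = greedy_minimal_suite(coverage_matrix)
```

For this sample matrix the function selects `test_checkout` and then `test_password_reset`, which together touch all five files. The tests left out are candidates for review as possible duplicates, not for automatic deletion.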
Running the Regression Tests
Having selected your full set of tests for regression testing, you can then set up your test environment, be it Selenium or something you’ve rolled yourself. If you are using a Continuous Integration/Continuous Testing approach, it’s important to run regression testing on an isolated branch of the code so that no new changes (and new bugs) land during the test run. If you have a specific release candidate, this is relatively easy, as that code should be quarantined during pre-release testing anyway. However, if you do frequent releases, it may be necessary to snapshot or git-tag your code, so you know what state it was in if you find any regressions.
Before the advent of autonomous testing, good regression testing required making sure you had scripted your tests well. If your system relies on being in a certain state for a test, try to sequence tests to minimize the number of times you have to update the state. Make sure your test suite outputs your results in an easy-to-interpret fashion. It should be easy to identify which cases failed, and what the system was doing at the time. Sometimes you will see apparent failures that actually turn out to be the result of a misconfiguration. It should be easy to see this from your output.
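The two scripting ideas here, grouping tests by the state they need and reporting results so that real failures stand apart from misconfigurations, can be sketched as follows. The test names, state labels and the three outcome labels (`fail`, `config_error`, `pass`) are assumptions made for the example.

```python
def sequence_by_state(tests):
    """Order (name, required_state) pairs so tests sharing a state run together.

    sorted() is stable, so tests keep their original relative order
    within each state group.
    """
    return sorted(tests, key=lambda test: test[1])


def count_state_switches(tests):
    """Count how often the required state changes between consecutive tests."""
    states = [state for _, state in tests]
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def format_report(results):
    """Summarize (name, outcome) pairs, listing problems before passes."""
    lines = []
    for outcome in ("fail", "config_error", "pass"):
        names = [name for name, got in results if got == outcome]
        lines.append(f"{outcome:<13}{len(names):>3}  {', '.join(names)}")
    return "\n".join(lines)


tests = [
    ("cart_add", "logged_in"),
    ("signup", "logged_out"),
    ("cart_remove", "logged_in"),
    ("login_page", "logged_out"),
    ("admin_panel", "admin"),
]
sequenced = sequence_by_state(tests)
report = format_report([
    ("cart_add", "pass"),
    ("signup", "fail"),
    ("admin_panel", "config_error"),
])
```

In the original order the required state changes four times; after sequencing it changes twice. Keeping `config_error` as its own outcome means a misconfigured environment does not get counted as a regression.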
Maintaining your Regression Tests
As with any tool, automated regression testing is only as good as the people using it. And like any good tool, it’s essential to look after it and maintain it. As new test cases are created, thought should be given to whether that test needs to be added to the regression tests. Whenever you fix an actual bug in your code you should ask yourself “does this bug need to be added to the regression testing?” In most cases, the answer to this will be “Yes.” However, you also need to add tests that verify the behavior of any new code paths.
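A minimal sketch of what "adding a bug to the regression tests" can look like is shown below. The `paginate` function and the bug it once had (dropping the final, partially filled page) are invented for illustration. The tests are written as plain functions with assertions, in the style that a runner such as pytest collects.

```python
def paginate(items, page_size):
    """Hypothetical function: split items into pages of at most page_size.

    Assume an earlier version dropped the final, partially filled page.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def test_regression_last_partial_page_is_kept():
    # Pins the fixed bug: five items at two per page must give three pages.
    assert paginate([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]


def test_new_code_path_empty_input():
    # Verifies a new code path as well: no items means no pages.
    assert paginate([], 2) == []
```

Naming the test after the behavior it protects makes it easier to see, months later, why the test exists and which bug has come back if it fails.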
The emergence of AI and machine learning-based automated regression testing has opened up many fruitful new methods. These include approaches that feed the data collected during regression testing into large aggregated datasets. The end result is automation that can self-heal and autonomously create its own new test cases.
Regression testing is one of the most vital parts of software testing. However, it’s sometimes seen as the poor relation to bug fixing, unit testing and testing of new features. By investing time in automating your regression testing, you can help ensure your software stays free from regressions. Creating an efficient suite of regression tests will take a bit of effort, but this will quickly pay off if it’s done well.
Take a look at Functionize’s autonomous testing platform and learn how your team can dramatically reduce the time spent authoring and maintaining your test suite.