What is canary testing and why is it so useful?

Canary testing lets you validate a new feature by testing it in your live environment. This can show up server-related issues that are missed in testing.

November 15, 2021
Tamas Cser

Modern applications rely on frequent launches of new features. Naturally, your QA team will do their best to test the new features, but how can you make sure the launch goes smoothly? A common approach is to use canary testing.

Read on to learn what it is, how to do it, and why it is so useful. 

What is canary testing?

Canary testing involves releasing the new code to a small subset of your users. You then compare their experience of the application with that of the users on the older version. In effect, this is like coal miners using a canary to test for bad air in the mine. If the canary users have problems, you are able to roll back the changes with minimal impact on your overall user base.

This is in stark contrast with the traditional approach, where all testing is completed before the software reaches any users. However, it’s definitely not a replacement for proper regression testing.

Why is it so useful?

Originally, most microcomputer software was delivered as stand-alone applications that were installed on a user’s computer. Developers only released updates every few months. Before each update, the development team would invest heavily in testing the software for performance, security, and general quality.

However, as agile development becomes more commonplace, companies are increasingly moving away from red-letter day launches for new software features. Nowadays, it’s business-as-usual to update applications regularly, sometimes even daily. Moreover, most applications are web-based or work on mobile devices, relying heavily on their backend. To stay ahead, developers need to constantly update their applications, pushing out incremental changes with every release. This has driven the growth in continuous integration and continuous delivery (CI/CD) and the associated changes in test methodology.

How do you do canary testing?

The first step is to set up the system by launching a set of backend containers or servers to run the new code. As new users arrive, your load balancer directs a percentage of them to this “canary cluster”.
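To make the routing step concrete, here is a minimal sketch of how a percentage of users might be assigned to the canary cluster. It is an illustration only: the function name `assign_cluster`, the 5% default, and the choice to hash the user ID (so each user consistently sees the same version) are assumptions, not something a particular load balancer is known to do.

```python
import hashlib

CANARY_PERCENT = 5  # assumed share of users sent to the canary cluster


def assign_cluster(user_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Return "canary" or "stable" for a user, deterministically."""
    # Hash the user ID so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in the range 0..99
    return "canary" if bucket < canary_percent else "stable"


# Hypothetical user IDs, used only to show the split.
users = [f"user-{i}" for i in range(10_000)]
canary_count = sum(assign_cluster(u) == "canary" for u in users)
print(f"{canary_count} of {len(users)} users routed to the canary cluster")
```

In practice this decision normally lives in the load balancer or deployment tool rather than in application code; the sketch only shows the idea of a stable percentage split.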

To discern how well (or poorly) the new version performs, DevOps engineers carefully monitor the servers to identify issues. For instance, you might monitor the compute load, and compare it to the servers running the old code. If the load increases substantially, you know that’s a potential issue. Equally, if you see a much higher rate of I/O, that might also indicate an issue. 
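The comparison described above could be sketched roughly as follows. The metric names, the sample values, and the 20% threshold are all made up for illustration; real monitoring would pull these numbers from your own metrics system.

```python
from statistics import mean


def relative_increase(old_samples, canary_samples):
    """Fractional change of the canary mean relative to the old-version mean.

    Assumes the old-version mean is greater than zero.
    """
    old_avg = mean(old_samples)
    canary_avg = mean(canary_samples)
    return (canary_avg - old_avg) / old_avg


def flag_metrics(old, canary, threshold=0.20):
    """Return the names of metrics whose mean rose by more than `threshold`."""
    flagged = []
    for name in old:
        if relative_increase(old[name], canary[name]) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical samples from servers running the old code and the canary code.
old_servers = {"cpu_percent": [40, 42, 41], "io_ops_per_sec": [100, 110, 105]}
canary_servers = {"cpu_percent": [43, 44, 42], "io_ops_per_sec": [180, 175, 190]}

flagged = flag_metrics(old_servers, canary_servers)
print("Metrics needing attention:", flagged)
```

With these sample numbers, the compute load rises by about 5% and stays under the threshold, while the I/O rate rises by roughly 73% and is flagged.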

Because only a subset of users is affected, this real-world testing process doesn’t cause problems for everyone. If the testing team spots any issues, it’s easy to roll things back: you simply redirect all connections to the old servers. If all seems to be going well, you can progressively migrate more customers to the new backend.

How to run canary tests

You can easily implement and automate canary testing with the help of tools such as Spinnaker to assign a suitable percentage of users to the new code. A typical test assigns about 5% of users to the new code. Then, if there are no issues, the DevOps team can steadily ramp up the user percentage until everyone is on the new code.
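The ramp-up logic can be pictured with a short sketch. This is not Spinnaker's API; `run_rollout` and the `is_healthy` callback are hypothetical names, and the steps after the initial 5% are assumed values chosen for illustration.

```python
from typing import Callable, Iterable


def run_rollout(steps: Iterable[int], is_healthy: Callable[[int], bool]) -> int:
    """Walk through canary traffic percentages and return the final percentage.

    Returns 0 if any step fails its health check, meaning all traffic
    goes back to the old servers.
    """
    current = 0
    for percent in steps:
        current = percent  # shift this share of users to the new code
        if not is_healthy(percent):
            return 0  # roll back: no users remain on the new code
    return current


# Assumed ramp schedule: start at 5%, then widen in stages.
schedule = [5, 25, 50, 100]

# A rollout where every health check passes ends with everyone on the new code.
full = run_rollout(schedule, lambda percent: True)

# A rollout where problems appear at 50% is rolled back completely.
rolled_back = run_rollout(schedule, lambda percent: percent < 50)

print(full, rolled_back)
```

The health check is deliberately left as a callback, since what counts as healthy depends on the metrics you choose to monitor.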

According to its technology blog, Netflix further refines this process. Netflix doesn’t compare the performance of the canary cluster with its existing production clusters. Instead, the company creates a new instance of the existing cluster alongside the canary cluster. This so-called baseline cluster is the same size, so the performance can be compared directly. This means the results are compared against a clean setup with no potential issues caused by long-running processes in the production cluster. 

[Figure: canary tests diagram]

One important caveat: you need to be aware of any expected impact from the new code. It may be that the new changes are known to increase the I/O in the system, in which case seeing increased I/O does not indicate a problem. In other words, identify which metrics matter for each test and then define the acceptable parameters. Of course, some issues, such as crashes, stuck processes, or timeouts, are always signs of a problem with the new code. In that case, you need to roll back immediately.
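One way to capture this caveat is to give each metric its own tolerance while treating certain failures as automatic rollbacks. The sketch below assumes hypothetical metric names and tolerance values; only the three always-bad signals (crashes, stuck processes, timeouts) come from the text above.

```python
# Signals that always indicate a problem with the new code.
HARD_FAILURES = ("crashes", "stuck_processes", "timeouts")


def canary_verdict(increases, tolerances, failure_counts):
    """Return "rollback" or "continue" for one canary evaluation.

    increases: metric name -> fractional rise over the old version
    tolerances: metric name -> largest acceptable fractional rise
    failure_counts: failure name -> number of occurrences observed
    """
    # Any hard failure means rollback, regardless of the other metrics.
    if any(failure_counts.get(name, 0) > 0 for name in HARD_FAILURES):
        return "rollback"
    # Otherwise, each metric is judged against its own tolerance.
    for metric, allowed in tolerances.items():
        if increases.get(metric, 0.0) > allowed:
            return "rollback"
    return "continue"


# Assumed tolerances: this release is expected to raise I/O, so it gets more room.
tolerances = {"cpu": 0.10, "io": 0.60}

expected_io_rise = canary_verdict({"cpu": 0.04, "io": 0.45}, tolerances, {})
cpu_too_high = canary_verdict({"cpu": 0.15, "io": 0.45}, tolerances, {})
with_timeouts = canary_verdict({"cpu": 0.04, "io": 0.45}, tolerances, {"timeouts": 3})

print(expected_io_rise, cpu_too_high, with_timeouts)
```

Here a 45% rise in I/O is accepted because it was anticipated, whereas an unexpected CPU rise or any timeout triggers a rollback.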

How automated testing helps

You might think that canary testing lets the development and testing teams get away with doing less testing. But that absolutely isn’t the case. First, you are utilizing your real users as test subjects, so you want to ensure no one has a negative user experience. Second, there’s no point in doing this if the new features don’t work properly. This means you need to view this as an addition to all your other testing.

However, the whole point is to enable you to release faster and more frequently. So, you need to make sure you use the best test automation possible to remove any roadblocks to release. 

But as you know, automating an entire test suite is challenging, even if you can afford to hire a large team of test automation engineers. Fortunately, Functionize greatly simplifies test automation. Our suite of tools makes it quicker and easier to create new tests. They speed up the process of running and analyzing test results. And, unlike most other test automation frameworks, we help you eliminate test debt. One final benefit — all your tests run in real virtual machines, meaning you can run them against your production servers. This means you can back up the results of the canary testing with targeted testing using your predefined test cases.

What’s next

Canary testing is widely used for testing new features in complex applications. It is ideal for testing the performance of an application backend. It lends itself to automation when you combine it with DevOps tools. If you also integrate it with smart test platforms like Functionize, you will end up with a complete testing solution that delivers the best quality for your users.