Production Testing - what, why, and how?

Testing used to finish before release. Now production testing is integral to any large system. We explore some concepts and show how Functionize can help.


January 7, 2019
Jon Seaton


Ensuring your production systems are performing

Avoiding unexpected problems with new releases

Testing used to finish before release. Now production testing is an integral part of any large system. We’ll explore some of the concepts used in production testing and show how Functionize can help you do this intelligently.


Once upon a time, software development was easy. You planned your software, built it, tested it, then released it. Each of these functions was largely discrete and they were performed sequentially. Nowadays, of course, things are very different. For many apps, development is agile, meaning there is an ongoing process of refining and developing the code. CI/CD means new code is continuously pushed to production. All this means that testing starts early on and continues right through into production. Clearly, testing in production is different from traditional unit, integration, and acceptance testing. In this post, we look at some of the ways production testing can be used and describe how Functionize’s products can help.

Production testing differs from other testing in that it isn’t simply about identifying bugs (though that is still important). It falls into two categories. Performance monitoring answers questions like “is the new code more efficient than the old?” or “does the system scale properly?” Comparative testing (which covers canary testing and A/B testing) answers questions like “does the new code improve the UX?” or “is the new code more performant?” Let’s look at each of these in more detail.

Figure: pre-production vs. production testing (Cindy Sridharan's apt illustration on Medium)


Performance Monitoring

Performance monitoring is critical for modern production systems. It allows you to spot problems before they become serious and gives you a proper understanding of resource usage.

Every modern production system should incorporate performance monitoring. This is especially true if you are relying on any form of containerization or PaaS for your backend. Performance monitoring covers several things. Basic “liveness” checking is about spotting whether any service has died. Responsiveness monitoring checks things like whether DB queries are returning quickly enough. Monitoring the overall load is also critical, as this affects costs and planning.

Liveness monitoring

In any well-designed system, business-critical systems should be included in some form of disaster recovery plan. This may simply take the form of frequent backups with the ability to spin up a new version of your system in a different location, or it may be a more dynamic system that is able to failover in real-time to a backup system. In either case, what is critical is the ability to know when things are going wrong in your system. This can be as simple as periodically pinging your server or it can be a proper integrated liveness check, with proactive monitoring of all individual service endpoints.
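To make the idea of an integrated liveness check concrete, here is a minimal sketch in Python. It assumes each service exposes an HTTP health endpoint that returns a 2xx status when healthy; the function names and the example URLs are hypothetical, and a real setup would add scheduling, retries, and alerting.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Refused connection, DNS failure, timeout, HTTP error, or bad URL:
        # all of these count as "not alive" for this sketch.
        return False


def check_liveness(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, bool]:
    """Probe every endpoint once and report which ones are alive."""
    return {url: probe(url) for url in endpoints}


def dead_services(results: Dict[str, bool]) -> List[str]:
    """List the endpoints that failed their liveness probe, sorted by URL."""
    return sorted(url for url, alive in results.items() if not alive)
```

The probe is passed in as a parameter so that the same check can be run against a stub in tests, or swapped for a deeper check (for example one that also verifies a database connection) without changing the calling code.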


Responsiveness monitoring

For many consumer-facing applications, latency is critical. Amazon found that an increase of just 100ms in response times cost 1% in sales. Given their turnover, and extrapolating linearly, that amounts to around $17bn in sales for each second of added latency! Studies by Google back this up, showing a direct correlation between the speed of getting search results and the number of searches a user completed. Sometimes the causes of latency are external to your system (e.g. latency in the wider Internet). But more often they are caused by things like databases being slow to respond, either because they are overloaded with requests or because they are poorly designed for the volume of data you handle. Monitoring this sort of responsiveness is also critical because it can be an early indication that something else is starting to fail (e.g. a failing service may first show up as an increase in request latency).
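One common way to watch responsiveness is to track a high percentile of response times rather than the average, since a few slow requests can hide behind a healthy mean. The sketch below uses the nearest-rank percentile; the 95th percentile and the 300ms threshold are illustrative assumptions, not recommendations from the studies mentioned above.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_alert(
    samples_ms: Sequence[float],
    threshold_ms: float = 300.0,  # assumed budget, tune for your own system
    pct: float = 95.0,
) -> bool:
    """Return True when the chosen percentile of response times exceeds the threshold."""
    return percentile(samples_ms, pct) > threshold_ms
```

In practice you would feed this a sliding window of recent response times per endpoint or per database query, so that a slow drift upward is caught before users notice it.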

Monitoring resource usage

One of the biggest reasons companies use containers or Platform as a Service providers is the ability to auto-scale to meet unexpected demand. (To a lesser extent this is also done in IaaS, but that often relies on you setting up your own system.) The problem is that, in general, your provider will charge you according to the number of containers (or similar units) that you consume. Keeping track of this is essential in order to prevent unpleasant shocks at the end of the billing cycle.
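As a rough illustration of keeping track of spend, the sketch below projects the month-end bill from usage so far and compares it with a budget. It assumes a flat hourly rate per container and a simple linear projection; real provider pricing is usually more complicated, and all names and figures here are hypothetical.

```python
def projected_monthly_cost(
    container_hours_so_far: float,
    hourly_rate: float,
    day_of_month: int,
    days_in_month: int = 30,
) -> float:
    """Linearly extrapolate the month-end bill from the usage recorded so far."""
    if day_of_month < 1 or day_of_month > days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    spend_so_far = container_hours_so_far * hourly_rate
    return spend_so_far / day_of_month * days_in_month


def over_budget(projected_cost: float, monthly_budget: float) -> bool:
    """Return True when the projected bill exceeds the monthly budget."""
    return projected_cost > monthly_budget
```

For example, 1,200 container-hours at an assumed $0.25 per hour after 10 days projects to $900 for a 30-day month, which would trip an $800 budget well before the invoice arrives.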

Comparative Testing

The term Comparative Testing comes from the world of testing physical products. However, it applies equally well to computer software, especially user interfaces. For software, the main types of comparative testing are A/B testing (a classic part of traditional product testing) and canary testing.

A/B Testing

This is a fancy term for comparing two versions of a feature in order to decide which one you want to release. Traditionally it was about comparing usability, but you can also compare other aspects like performance, resource usage, and responsiveness. With major new features, it can pay to release them to a subset of your users and garner feedback on the experience. This is one reason why Apple opens up its beta testing program to so many people.
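Releasing a feature to a subset of users needs a stable way to decide who sees which version. A minimal sketch, assuming users have a string ID: hash the ID together with an experiment name and use the result to pick a variant. The function and experiment names are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, share_b: float = 0.5) -> str:
    """Deterministically assign a user to variant "A" or "B".

    share_b is the fraction of users (0.0 to 1.0) who should see variant B.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "B" if bucket < share_b else "A"
```

Because the assignment depends only on the user ID and the experiment name, a user sees the same variant on every visit, and different experiments split the user base independently of one another.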

Canary Testing

This has similarities to A/B testing, but it’s more about identifying unexpected bugs and issues in your new code. Canary testing is based on the idea of the coal miner’s canary, which would collapse from toxic gases such as carbon monoxide before they reached levels dangerous for the miner. Likewise, in canary testing, your new code is released to a small subset of users and you monitor whether they are experiencing problems. Amazon, Google, and Microsoft all use this for their large-scale IaaS services. Indeed, they often do it in stages, releasing to one rack, then one cluster, then a whole data center, then an availability zone, etc.
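The staged approach can be sketched as a loop that widens the release one stage at a time and stops as soon as the canary looks worse than the existing code. This is an illustrative sketch only: the stage names, the error-rate metric, and the tolerance are assumptions, and it does not describe how any particular cloud provider runs its rollouts.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    stages: Sequence[str],
    error_rate_for: Callable[[str], float],
    baseline_error_rate: float,
    tolerance: float = 0.01,
) -> Tuple[List[str], bool]:
    """Release stage by stage; stop and signal a rollback if errors rise too far.

    Returns (stages that passed, whether a rollback is needed).
    """
    completed: List[str] = []
    for stage in stages:
        # error_rate_for stands in for deploying to the stage and observing it.
        observed = error_rate_for(stage)
        if observed > baseline_error_rate + tolerance:
            return completed, True
        completed.append(stage)
    return completed, False
```

The key design choice is that each stage is judged against the behaviour of the old code, so a problem found on one rack never reaches a whole data center.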

How Functionize helps

Functionize can help you with production testing in three ways. Firstly, we offer an intelligent canary testing approach which uses AI to autonomously detect anomalies and can be used as part of your automatic CI/CD release process. Secondly, our load testing can be used to test your production monitoring systems. And thirdly, Functionize builds detailed performance monitoring into our test system.

Canary Testing

Functionize’s autonomous canary testing was announced by our CEO, Tamas Cser, at UCAAT (the leading industry conference). The system consists of three main parts: user journey tracking, prediction, and anomaly detection. The first stage is to autonomously identify all the user journeys through your system (things like log-in, reset password, make purchase, etc.). Having done that, the next step is to predict what a user will do next after any given set of actions. This uses the data generated in the first step, and our system can predict next steps with 85% accuracy. Finally, you need to be able to identify and compare similar user journeys between the existing code and the new code (the canaries). If there is a spike in response times or a similar anomaly, this will be flagged and the new code can be rolled back.
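To give a feel for what "predicting the next step" means, here is a deliberately simple sketch that counts which action most often follows each action in recorded journeys. This is not Functionize's implementation, which is not described in this post; it is a plain first-order frequency model, and the journey data and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Sequence


def build_transitions(journeys: Iterable[Sequence[str]]) -> Dict[str, Counter]:
    """Count how often each action is immediately followed by each other action."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for journey in journeys:
        for current, following in zip(journey, journey[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions: Dict[str, Counter], current: str) -> Optional[str]:
    """Return the most frequently observed next action, or None if none was seen."""
    counts = transitions.get(current)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A model like this only looks at the single most recent action. Predicting from "any given set of actions", as described above, would need a richer model that takes the whole preceding sequence into account.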

Load testing

One of the issues with performance monitoring is being able to see how your monitoring behaves when things go bad. Clearly, it’s not a good idea to provoke this in your live production system. But you also don’t want to test at the limited scale of your staging servers. So, the obvious time to do this sort of testing is when you do your load tests. Unlike traditional, dumb load tests, Functionize allows you to do load testing with realistic user sessions. These are generated from multiple, diverse geographic locations. This means that you will also be able to test how your performance monitoring system is working. As you ramp up the load towards overload, you will be able to check that your monitoring identifies the problem correctly. You could even use it to trigger and test your disaster recovery plan!
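The idea of ramping load until the monitoring reacts can be sketched as follows. This is a generic illustration, not Functionize's load-testing API: `measure_latency_ms` is a hypothetical callback standing in for "apply this load and read the latency your monitoring reports", and the threshold is an assumption.

```python
from typing import Callable, List, Optional


def ramp_schedule(start_rps: int, peak_rps: int, steps: int) -> List[int]:
    """Evenly spaced load levels (requests per second) from start to peak."""
    if steps < 2:
        raise ValueError("need at least two steps")
    span = peak_rps - start_rps
    return [start_rps + span * i // (steps - 1) for i in range(steps)]


def first_alert_level(
    levels: List[int],
    measure_latency_ms: Callable[[int], float],
    threshold_ms: float,
) -> Optional[int]:
    """Return the first load level at which latency crosses the alert threshold."""
    for rps in levels:
        if measure_latency_ms(rps) > threshold_ms:
            return rps
    return None  # the alert never fired during the ramp
```

If the level returned here is far above the load at which users would already be suffering, or the function returns `None` even at overload, that is a sign the monitoring thresholds need attention.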

Test performance monitoring

The other, more subtle way to use Functionize is to leverage our built-in performance monitoring tools. These are designed for use in testing, to check whether code is performing well. However, they can also be used to periodically test your production system to check that things are still working as expected. They could also be used to check that your full monitoring system is accurate. The advantage is that they see your system as an end user will, whereas your own monitoring is generally internal to your system.

As we have seen, production testing is an essential part of modern systems. Without it you risk missing warning signs of problems and can’t tell what is going on with your system. It can also allow you to test new features before full launch, and to check new code in a real-life setting. Using Functionize’s autonomous test suite can help you with all these things and more!