The truth is, most software systems don’t have automated tests for everything. What should we prioritize?
You’re shipping a major update today! You have butterflies. Will it work? How can you make sure it does?
When we talk about this question, the answer “100% test coverage” comes up fairly frequently. No doubt, test coverage makes a huge difference. But it’s not always feasible, or even sufficient.
Maybe we inherited an existing application that was built without tests, and its critical functions require an update that cannot wait for us to backfill all the tests.
Maybe our software is written in a framework or under a deadline that makes 100% test coverage untenable.
Or maybe our application includes risks from which automated tests don’t protect us. Automated unit tests do little to warn us about low availability, or unique elements of the production environment, or system states that we didn’t anticipate when we wrote the code.
Maybe these caveat cases shouldn’t exist, but they do. Few software systems have automated tests for everything.
So what should we prioritize to make sure our system works, and how should we make sure it works? We need a system to identify and evaluate the risks we’re facing. It needs the right combination of practicality and thoroughness, and it should help us decide which parts of the system to prioritize in a software verification effort.
Let’s look at an example
Suppose you work for WalletPal, a website that helps people manage their expenses and receipts. The original application stores people’s receipt data in a relational database. Here’s a very high-level diagram of how it works:
Here’s a diagram demonstrating the two different means by which the WalletPal interface would get data: the old way (local database), and the new way (HTTP API):
You’re in charge of rewriting that data layer and performing the cutover. The frontend itself should not change.
How do you verify that this fairly large refactor works without messing up the application or its data?
Make a risk profile
A risk profile allows us to view and categorize the different risks in our system. We can document, quantify, and aggregate this data, but for many systems, I find it helpful to start with a handy diagram.
Let’s go through the diagram and ask the question: What could go wrong here?
For this application, we have several potential failures at different levels of abstraction. For example, the system depends on an API, and that server could crash. Or the response could have a different structure than our client application expects. Maybe we’ll store so many receipts that we run out of memory! Or maybe something could go wrong with the import from the original database to the new one.
How do we prioritize our focus across all of these possibilities?
For each thing that could go wrong, answer three questions:
- Would the outcome be catastrophic if this went wrong?
- Is this likely to go wrong?
- If this goes wrong, is it likely to sneak through QA and deployment?
Each of these three questions checks for a risk amplifier, a condition that raises the stakes if this thing goes wrong. Let’s label the risk amplifiers on our diagram:
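The labeling step can be sketched as a tiny risk register. The risk descriptions and flag values below are invented for illustration; the point is simply that three boolean amplifiers give us a quick way to rank risks:

```python
from dataclasses import dataclass

@dataclass
class Risk:
    description: str
    catastrophic: bool   # would failure block the core purpose?
    likely: bool         # is failure probable?
    sneaky: bool         # could failure slip past QA and deployment?

    @property
    def amplifiers(self) -> int:
        # Count how many amplifiers apply to this risk.
        return sum([self.catastrophic, self.likely, self.sneaky])

# Hypothetical entries for the WalletPal cutover
risks = [
    Risk("API response shape drifts from client expectations",
         catastrophic=False, likely=True, sneaky=False),
    Risk("Local-DB and HTTP-API classes develop inconsistent APIs",
         catastrophic=True, likely=True, sneaky=False),
    Risk("Data migration silently corrupts receipts",
         catastrophic=True, likely=True, sneaky=True),
]

# Address the risks with the most amplifiers first.
ranked = sorted(risks, key=lambda r: r.amplifiers, reverse=True)
```

A spreadsheet works just as well; the value is in asking the three questions for every risk, not in the tooling.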
Plan to prevent or mitigate those risks
First, the API of our view might not match up with the structure of our data objects. This is relatively likely to happen. Certainly, it happens often on user interfaces: one piece of data has a field that’s null, and the application design and development did not account for that field being null. This happens so often that programming languages developed in the last ten years, like Swift and Kotlin, incorporate entire language design elements to avoid the dreaded Null Pointer Exception. This is something to watch out for. Either thoughtful language choice or thoughtful automated testing can help ensure that this doesn’t break in production.
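In a language without compile-time null safety, the hazard looks like this. The `receipt` shape and both rendering functions are hypothetical, for illustration only:

```python
def render_merchant(receipt: dict) -> str:
    # Crashes with AttributeError if "merchant" is None --
    # the classic null-pointer failure in Python clothing.
    return receipt["merchant"].upper()

def render_merchant_safely(receipt: dict) -> str:
    # Defensive version: accounts for the field being null.
    merchant = receipt.get("merchant")
    return merchant.upper() if merchant else "(unknown merchant)"
```

Kotlin and Swift push this check into the type system, so the unsafe version would not even compile there; in other languages, a test that feeds null fields through the view layer covers the same ground.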
Second, we have two interchangeable classes that might have inconsistent APIs. Unless the two classes are bound by an interface, it would be relatively easy for them to get out of step. In addition to the “likely” label, this risk also gets the “catastrophic” label because the system’s core purpose is to fetch and update receipts. If this goes wrong, the system cannot fulfill its core purpose. So we want to use a programming language with interface binding, or maybe an abstract class. If we can’t do either of those things, we could write a long, gnarly unit test that uses reflection to check these two classes’ APIs. Would it be pretty? Probably not, but it would protect us from a relatively likely and relatively problematic failure.
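In Python, where no compiler enforces an interface, that gnarly reflection test might look like the sketch below. The two store classes are hypothetical stand-ins for WalletPal’s local-database and HTTP-API data layers:

```python
import inspect

class LocalDbReceiptStore:
    def fetch_receipt(self, receipt_id): ...
    def update_receipt(self, receipt_id, fields): ...

class HttpApiReceiptStore:
    def fetch_receipt(self, receipt_id): ...
    def update_receipt(self, receipt_id, fields): ...

def public_api(cls):
    # Map each public method name to its signature via reflection.
    return {
        name: inspect.signature(member)
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    }

def test_stores_have_matching_apis():
    # Fails the build if the two classes drift out of step.
    assert public_api(LocalDbReceiptStore) == public_api(HttpApiReceiptStore)
```

It isn’t pretty, but it turns a silent drift between the two classes into a loud test failure.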
Third, we are migrating data between two databases. This is the kind of thing that rarely goes right on the first try, so we should expect an issue to occur. And once again, we apply the “catastrophic” label because poor data integrity affects WalletPal’s core functionality. But there’s a third, insidious risk amplifier here: the risk that an issue goes uncaught. Development teams tend to overlook this risk amplifier, but it can be the costliest one of all.
Here’s how a data issue might go uncaught
What if that server serves data that looks kind of valid but is, in fact, inaccurate?
Check out this JSON representing a receipt. Everything on there is a perfectly valid ingredient, including the strawberry-flavored Polish pastry listed on the fourth line. But there’s a special character in the name of that pastry.
Suppose that we send this receipt over to the server with the new database, but the parser cannot recognize that special character. Suppose, also, that the parser handles this by cutting off the string and only committing the part that parsed successfully.
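Here is a minimal simulation of that failure, with invented item names. The “parser” commits items only until it hits a name it cannot encode:

```python
def commit_items(items):
    # Buggy importer: stops at the first item name that fails
    # ASCII encoding, silently dropping everything after it.
    committed = []
    for name in items:
        try:
            name.encode("ascii")
        except UnicodeEncodeError:
            break
        committed.append(name)
    return committed

receipt_items = ["milk", "eggs", "pączek truskawkowy", "coffee"]
# commit_items(receipt_items) returns ["milk", "eggs"] --
# the pastry and everything after it vanish without an error.
```

No exception is raised and no log line appears; the migrated receipt simply has fewer items than the original.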
In that case, when our purchaser looks at their receipt served from the new database, they might see something like this:
The new server’s version of this receipt only shows the portion of the items that appeared on the receipt before the weird character. It still looks like a receipt, but it’s missing items. That could go uncaught unless QA is looking very closely. This kind of data snafu is an excellent candidate for generative tests that help us identify edge cases and fallbacks.
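A hand-rolled version of such a generative test might look like this. A real project would reach for a library like Hypothesis; the storage function here is a hypothetical round trip through the new database:

```python
import random
import string

def store_and_fetch(name: str, db: dict) -> str:
    # Correct behavior: the name survives the round trip unchanged.
    key = len(db)
    db[key] = name
    return db[key]

def check_round_trip(trials: int = 200, seed: int = 0) -> int:
    # Generate item names mixing ASCII with Polish diacritics --
    # exactly the kind of input that exposes a truncating parser.
    rng = random.Random(seed)
    alphabet = string.ascii_letters + " ąćęłńóśźż"
    for _ in range(trials):
        name = "".join(rng.choice(alphabet)
                       for _ in range(rng.randrange(1, 20)))
        assert store_and_fetch(name, {}) == name
    return trials
```

Point the same property at the real migration path, and names containing special characters surface the truncation bug on the first few hundred generated inputs instead of in production.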
Think worst-case scenario
Automated testing can help us make sure that we ship stable software without feature regressions. But what if we’re shipping under circumstances that do not lend themselves to 100% test coverage? And what if we’re dealing with potential failures that automated tests do not adequately address?
A risk mitigation approach requires more nuance than, “Write a test for each line of code and you should be fine.” Instead, examine the system as a whole and develop a holistic understanding of its risks.
First, we identify things that could go wrong in our system, both within components and at the seams. Next, we label them with three risk amplifiers: Would a failure prevent the application from doing its core function? Is it likely to happen? Could it go uncaught after deployment?
Finally, we make a plan to mitigate the largest risks and communicate those risks to the rest of the team (perhaps over pączki). Mitigation tactics might include language choices, test harnesses, quality assurance, or other measures that suit the situation.
This risk-focused perspective, over time, makes it easier for you and your teammates to spot and preemptively address the problems that could become headaches later.
by Chelsea Troy
Chelsea Troy writes code for mobile, the web, and machine learning models. She consulted with Pivotal Labs before launching her own firm to focus on clients who are saving the planet, advancing basic scientific research, or helping underserved communities. Chelsea live streams her programming work on NASA-funded mobile and server projects, and she teaches Mobile Software Development at the University of Chicago. Off the computer, you’ll find Chelsea with a barbell or riding her ebike, Gigi. She writes about software at chelseatroy.com.