Since the 1950s, development teams have been asking, “Why waste time on premature testing when we know that the product will change substantially before we reach dev-complete, and we’ll need to rerun many of the same tests?”
The thinking is: let’s save the effort we’d spend addressing upstream test failures and keep building out new features; then we’ll test at the right time. This view is common, and it drives many product teams toward late testing, in which most testing is done at the end of the development cycle or iteration, or at the end of the product pipeline.
Many teams eventually adopt an economic view of product development: minimize development costs while maximizing the value of the deliverables. To those who hold strictly to this conventional view, testing too early is a waste of effort (and therefore cost) because it increases investment in the product while providing no evident value.
Insisting on an economic gain, a healthy return on investment, is nearly always the right approach. But too many teams are missing a major piece of the puzzle. Look a bit more closely at the motivation here. These teams tell themselves: each time we test, we incur overhead, so let’s reduce overhead by testing less frequently. Since we have to test at the end anyway, let’s just defer testing until then. Stop and think for a moment. These teams are forgetting the overhead they incur to fix bugs. More on that below.
Product developers pay a cost each time they move through a process. Running a testing cycle incurs a transaction cost that includes (a) deploying the test candidate in the form of a software build, (b) populating test data, and (c) running regression tests. The transaction cost is well known, but the cost to test new functionality is variable; it depends on the amount of new functionality in the test candidate. The amount of new functionality a team accumulates before running a test cycle is the batch size.
A unit batch size means running a test cycle for the smallest set of new functionality. A batch size of a day means running a test cycle every day; a batch size of a week means running one every work week. A release batch size means running tests only once per release. If a team considers only the transaction cost plus the cost to test new functionality, it will readily prefer a large batch size, which seemingly keeps costs down.
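A toy cost model makes this preference concrete. The following sketch counts only the transaction cost and the variable cost of testing new functionality; every number in it is hypothetical, chosen purely for illustration.

```python
# A toy cost model of the transaction-cost-only view of testing.
# All parameter values are hypothetical, for illustration only.

def naive_testing_cost(feature_units, batch_size, transaction_cost, cost_per_unit):
    """Total testing cost for a release when ONLY transaction costs and the
    variable cost of testing new functionality are counted."""
    cycles = -(-feature_units // batch_size)  # ceiling division: number of test cycles
    return cycles * transaction_cost + feature_units * cost_per_unit

# 100 units of new functionality; each test cycle costs 8 hours of setup
# (build, test data, regression run) plus 0.5 hours per unit of new work.
for batch in (1, 5, 20, 100):
    print(batch, naive_testing_cost(100, batch, 8.0, 0.5))
```

Under these assumptions, one big batch of 100 units costs 58 hours while daily unit batches cost 850, so a team counting only these costs will always choose the largest batch. The catch, as the rest of this article argues, is the cost this model leaves out.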
Good things don’t come to developers who wait
Here’s a common software development story. Consider a team that is working on a complex feature, such as a file system layer that bridges a gap between user space and kernel space. They build a simulator to test some of the code. But some of the code requires the presence of actual appliances in the lab to perform proper testing. Other code requires running the appliances over a simulated network (WAN).
All team members agree that no code gets checked in without rebasing on the main branch, running against the simulator, and fixing any issues. Although this is complex code, fixing most defects is easy because they can be pinpointed quickly. But such stories commonly share a significant problem: defects get past the simulator undetected. Those bugs remain hidden, waiting for the test cycle to arrive. More bugs join the game of hide-and-seek as more code is checked in. Eventually, when it is time to test, all those critters come running out into broad daylight.
Why does this happen to so many teams? It’s actually a classic product development scenario. By the time testing begins, the developers have moved on to other tasks, and much is forgotten. The longer it takes for a defect to surface in testing, the more the developers forget about the context and the rationale that went into the design and construction. When they begin bug fixing, it is challenging to reacquaint themselves with the context in which the bug was found. It’s time to roll up the sleeves again, figure out what’s going on, and understand why the code was built a specific way. Not only is it difficult to find the root cause of the defect; the developers must also build a solution, often taking on the risky task of reworking the code, without breaking anything else.
There is another major concern: as more check-ins and changes accumulate between the last successful test cycle and the next one, it becomes harder to find the change that corresponds to a particular defect. Without this key information, it is much more difficult to diagnose the problem and apply a remedy.
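The cost of that search grows with the number of changes since the last green cycle. Tools like git bisect do this search by binary search over the commit history; the sketch below simulates the same idea, with a hypothetical `is_good` predicate standing in for running the test suite at a given change.

```python
# Why fewer changes between test cycles makes defects easier to localize:
# finding the first bad change by bisection takes roughly log2(n) test runs
# for n changes since the last green cycle. The is_good predicate is a
# hypothetical stand-in for running the test suite at a given change.

import math

def bisect_first_bad(changes, is_good):
    """Return the index of the first bad change, assuming every change
    before it is good and every change after it is bad (as git bisect does)."""
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(changes[mid]):
            lo = mid + 1   # the culprit is later in the history
        else:
            hi = mid       # this change is already bad; look earlier
    return lo

# 64 changes since the last green test cycle; change 41 introduced the bug.
changes = list(range(64))
print(bisect_first_bad(changes, lambda c: c < 41))  # prints 41
print(math.ceil(math.log2(len(changes))))           # about 6 test runs needed
```

With one change per cycle, no search is needed at all: the failing change is, by construction, the last one checked in.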
The cost of fixing bugs downstream
In the classic book Code Complete, Steve McConnell makes a strong case about the cost of repairing defects. He popularized the rule of thumb that a defect introduced upstream generally costs 10 to 100 times as much to remove downstream, compared to the cost of removing it close to the point at which it was introduced. This rule of thumb has come under attack: some people claim that software defects aren’t as expensive to fix today, that costs don’t increase as quickly as they used to, and that an ounce of prevention is no longer worth a pound of cure.
What this narrow economic view misses is the high cost of bugs that lurk undetected in the shadows. This cost can be very high for complex software products. Those who recognize it call it a holding cost, or the cost of delayed feedback. To be realistic, accept that any new feature development carries both transaction costs and holding costs. Once you understand that both costs are part of every release cycle, it is easy to agree that pushing testing downstream is not the best economic decision. Even if your team pays a high transaction cost on each test run, it is sensible to work toward testing in smaller batch sizes.
Batch size can be set so small that returns on investment diminish; very small batches can also be tedious. The chart explains why moving to small-batch testing makes no sense for an organization that has done no preparation or reconfiguration. For development teams that want to begin this journey, the first question to answer is how to reduce transaction costs so that earlier testing becomes affordable and overall product development costs go down.
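The trade-off can be sketched by extending the earlier toy model with a holding cost that grows with batch size. The assumptions here are hypothetical: a fixed defect rate per unit of work, and a fix cost that grows linearly with how much change accumulates before the next test cycle surfaces the defect.

```python
# Toy model: transaction cost plus a holding cost for defects that lurk
# between test cycles. All parameter values are hypothetical.

def total_testing_cost(feature_units, batch_size, transaction_cost,
                       cost_per_unit, defects_per_unit, fix_cost_per_unit_of_delay):
    cycles = -(-feature_units // batch_size)  # ceiling division
    transaction = cycles * transaction_cost + feature_units * cost_per_unit
    # Assumed: on average, a defect hides under half a batch of later
    # changes before the next test cycle surfaces it.
    avg_delay = batch_size / 2
    holding = feature_units * defects_per_unit * avg_delay * fix_cost_per_unit_of_delay
    return transaction + holding

for batch in (1, 5, 20, 100):
    print(batch, total_testing_cost(100, batch, 8.0, 0.5, 0.3, 1.0))
```

With these numbers the total cost traces the familiar U-curve: one release-sized batch is the most expensive option (1558), unit batches pay too much transaction overhead (865), and the minimum sits at a modest batch size in between (285 at a batch of 5). Lowering the transaction cost, for example through automation, shifts that minimum toward ever-smaller batches.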
Unquestionably, investments in continuous integration and test automation are critical to improving the economics of software product development. It is worthwhile to automate as much of your testing as possible. Then you can reap the rewards of early feedback and defects that are much less expensive to remedy. Even better, your team will build higher-quality products and improve customer satisfaction.
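As a minimal sketch of what such automation looks like, here is a regression check that could run on every check-in as part of a CI pipeline. The function under test, `normalize_path`, is a hypothetical stand-in for real product code.

```python
# A minimal automated regression check, suitable for running on every
# check-in under continuous integration. normalize_path is a hypothetical
# stand-in for real product code.
import unittest

def normalize_path(path: str) -> str:
    """Collapse duplicate slashes and strip a trailing slash
    (except for the root path)."""
    while "//" in path:
        path = path.replace("//", "/")
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]
    return path

class RegressionTests(unittest.TestCase):
    def test_collapses_duplicate_slashes(self):
        self.assertEqual(normalize_path("/a//b///c"), "/a/b/c")

    def test_strips_trailing_slash(self):
        self.assertEqual(normalize_path("/a/b/"), "/a/b")

    def test_root_is_preserved(self):
        self.assertEqual(normalize_path("/"), "/")

if __name__ == "__main__":
    # exit=False lets the suite run without terminating the interpreter,
    # which is convenient when embedding the checks in a larger script.
    unittest.main(exit=False)
```

Once a suite like this runs automatically on every check-in, the batch size for this class of testing effectively drops to a single change.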
Doubtless, some test types are not automatable, such as user experience or exploratory testing. For these, you can explore ways to run more frequent test cycles with greater efficiency. Ideally, you could reach the point at which most high-risk regression testing occurs within hours after dev-complete.
All this makes for lower costs and overall economic improvement. Donald G. Reinertsen captures this in his well-known diagram: smaller batch sizes lead directly to smaller changes, fewer outstanding defects, faster cycle times, and better economics.
Software development continues to advance by periodically reexamining questions that some people consider settled. The correlation between latent defects and rising costs is inherent in any software development effort, whether the team follows a conventional waterfall methodology or an iterative, agile approach. Requirements, specifications, design, code, test cases, and documentation will always be interdependent.
Shift Testing Upstream
As this article comes to a close, here’s a hint at some upcoming topics on our blog. Many software product companies see the product pipeline as increasing in value as work products move downstream. Not only is it beneficial to decrease batch size; it is also important to shift testing efforts leftward as much as possible.
At a minimum, shifting left means pushing test design and execution as close to development tasks as you can feasibly manage. This is the essence of the buzz around agile testing practices. But even if your team doesn’t practice agile development, it is to your great benefit to reconfigure your pipeline so that testing happens as early and as often as possible. Your team is sure to achieve higher quality and greater velocity.