DevOps: Make the most of your error budget with AI-powered end-to-end testing

DevOps is an essential part of modern software delivery. Virtually all applications are now delivered first and foremost over the web. As a result, keeping your service online is mission critical.

DevOps grew up in parallel with cloud computing. As companies migrated their services to the cloud, they needed a new breed of SysAdmin. At Google, these became the now-famous SREs, or site reliability engineers. In other companies, the portmanteau term DevOps began to find favor. Either way, the aim was to maximize the uptime of your services.

What is DevOps?

Most of you will have some ideas about what DevOps means. But to avoid confusion I want to give a clear definition here. DevOps is an approach to running your production systems that reduces the traditional gap between your developers and SysAdmins. One of the best definitions comes from Bass, Weber, and Zhu:

“a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality.”

Importantly, the key performance metric for any DevOps engineer is uptime. They often pride themselves on achieving better than “five nines” reliability, which means keeping your systems online 99.999% of the time. To put that in perspective, you are allowed a total of about 5m15s of downtime per year.
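To make the arithmetic concrete, here is a minimal Python sketch that converts an availability target into allowed downtime. The function names are ours, chosen for illustration, and the calculation assumes a 365-day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def allowed_downtime_seconds(availability_percent: float,
                             period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Return the downtime (in seconds) permitted by an availability target."""
    return period_seconds * (1 - availability_percent / 100)


def format_duration(seconds: float) -> str:
    """Format a number of seconds as XhYYmZZs, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


if __name__ == "__main__":
    # Compare three, four, and five nines over one year.
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {format_duration(allowed_downtime_seconds(target))}")
```

For five nines this works out to roughly 315 seconds, which matches the 5m15s figure above. Each extra nine cuts the allowance by a factor of ten.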

The fabled error budget

DevOps teams often convert their reliability target into something called an error budget. Basically, they tell the developers that the code they push is allowed to cause a maximum of x seconds of downtime, where x is the error budget. Once they have spent their error budget, the DevOps team won’t allow them to push any new features for some time. The aim is to focus the developers’ minds on delivering reliable software. It also encourages developers to liaise with DevOps and check the possible implications of any code changes they are making.
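One way to picture an error budget is as a simple counter that gates feature releases. The sketch below is only an illustration under our own assumptions: the class name, its fields, and the rule "no new features once the budget is spent" are a simplified model, not a description of any particular team's policy.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Tracks downtime spent against a fixed budget (hypothetical model)."""
    budget_seconds: float
    spent_seconds: float = 0.0

    def record_outage(self, seconds: float) -> None:
        """Charge an outage caused by a release against the budget."""
        self.spent_seconds += seconds

    @property
    def remaining_seconds(self) -> float:
        """Budget left, never reported as negative."""
        return max(0.0, self.budget_seconds - self.spent_seconds)

    def can_release_features(self) -> bool:
        """New features are allowed only while some budget remains."""
        return self.spent_seconds < self.budget_seconds


if __name__ == "__main__":
    budget = ErrorBudget(budget_seconds=315.0)  # roughly five nines over a year
    budget.record_outage(120.0)
    print(budget.remaining_seconds, budget.can_release_features())
    budget.record_outage(200.0)
    print(budget.remaining_seconds, budget.can_release_features())
```

In practice, teams usually reset the budget over a rolling window and may still allow bug fixes while feature releases are frozen.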

The on-call pager

Clearly, achieving just a few minutes of downtime per year requires 24/7 availability, 365 days per year. As a result, DevOps teams are on permanent standby to react to any problems. In bigger companies, there will be a shift system. But smaller companies can’t afford so many staff. Instead, there is always at least one engineer on “pager duty” ready to react immediately to any problems. Another favorite trick for DevOps teams is to make the developers share this pager duty. If a team submits a pull request for a new feature to go into production, then they need to take responsibility for it while it launches. This actually makes a lot of sense, since they are the people who really understand how this feature works. Again, this focuses the minds of the developers.

The role of testing

DevOps implicitly covers QA. After all, QA is traditionally the gatekeeper between developers and production. Without QA sign-off, a new feature shouldn’t go live. However, we are increasingly seeing the line becoming blurred. CI-CD means that many features go almost straight from development into production. Teams rely on continuous testing to hopefully catch any serious bugs. Then the DevOps team may well use an approach like dark launching or canary testing. 

Testing in production

Dark launching and canary testing are two forms of testing that happen once a feature is in production. Dark launching involves releasing a new feature on the backend, but hiding it from users in the frontend (hence dark launching). The aim is to verify that the backend remains stable with the new code before you start to actually stress it.
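As a rough illustration of a dark launch, the sketch below serves every user from the current code while also running the new code path and logging what it would have returned. This "shadow" style is just one possible way to dark launch, and every name here (handle_request, shadow_log, and so on) is hypothetical.

```python
from typing import Any, Callable, List, Tuple


def handle_request(
    payload: Any,
    current_impl: Callable[[Any], Any],
    new_impl: Callable[[Any], Any],
    shadow_log: List[Tuple[str, str]],
    dark_launch_enabled: bool = True,
) -> Any:
    """Serve users from the current code while exercising the new code in the dark."""
    response = current_impl(payload)
    if dark_launch_enabled:
        try:
            shadow = new_impl(payload)
            outcome = "match" if shadow == response else "mismatch"
            shadow_log.append((outcome, repr(shadow)))
        except Exception as exc:
            # A failure in the new code is recorded but never reaches the user.
            shadow_log.append(("error", repr(exc)))
    return response


if __name__ == "__main__":
    log: List[Tuple[str, str]] = []
    print(handle_request(2, lambda x: x * 2, lambda x: x + x, log))
    print(log)
```

Users only ever see the result of the current implementation, so a crash or mismatch in the new code shows up in the log rather than in the frontend.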

By contrast, canary testing involves releasing a new feature to a limited set of users at a time. You then compare their user experience with that of users still on the old code. If there is a problem, you immediately roll back the release. If things seem OK, you steadily release to more users. Big cloud operators like Google often do this very incrementally. They will release to a few users, then to all users in one cluster, then a whole availability zone. Once they are sure things are good, they release it to the whole data center and then the entire region. Then they repeat the process in the next region until all systems are updated.
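The staged rollout described above can be sketched as a loop that widens the release one stage at a time and stops as soon as the canary looks worse than the baseline. This is a simplified model: the stage names follow the text, but the error-rate comparison, the tolerance value, and the function names are our own assumptions.

```python
from typing import Callable, Dict, List

# Stages taken from the description above, from narrowest to widest.
ROLLOUT_STAGES = ["few users", "one cluster", "availability zone", "data center", "region"]


def run_canary_rollout(
    stages: List[str],
    canary_error_rate: Callable[[str], float],
    baseline_error_rate: float,
    tolerance: float = 0.005,
) -> Dict[str, object]:
    """Release stage by stage; roll back if a stage exceeds baseline + tolerance."""
    completed: List[str] = []
    for stage in stages:
        observed = canary_error_rate(stage)  # hypothetical metric lookup
        if observed > baseline_error_rate + tolerance:
            return {"status": "rolled_back", "failed_stage": stage, "completed": completed}
        completed.append(stage)
    return {"status": "released", "failed_stage": None, "completed": completed}


if __name__ == "__main__":
    # Simulate a regression that only shows up once a whole cluster is on the new code.
    result = run_canary_rollout(
        ROLLOUT_STAGES,
        canary_error_rate=lambda s: 0.05 if s == "one cluster" else 0.01,
        baseline_error_rate=0.01,
    )
    print(result)
```

A real rollout would compare richer signals than a single error rate, such as latency or user-facing metrics, but the control flow is much the same.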

Continuous testing

CI-CD is about releasing features as soon as they are ready. This is essential to allow companies to remain agile and keep up with the needs of their users. However, successful CI-CD can only happen if you are continuously testing every build and identifying bugs as early as possible. In an ideal world, testing should be part of the process before a new pull request is approved. This is already routine for unit testing, but it is much rarer for system testing. Instead, we tend to see companies rely on smoke testing to identify the more obvious bugs. Then they try to complete their set of regression tests as regularly as possible. The problem here is that the longer the gap between full regression runs, the greater the risk of a bug slipping through.

How can AI test automation help?

For years, software teams have known that test automation is a key part of the testing mix. With automated testing, you can run your regression tests 24/7. That way, you greatly reduce the time to complete each round of testing. However, traditional test automation leads to problems with test debt. Put simply, the more tests you automate, the more time you end up spending on maintaining those tests. In turn, you spend less time on automating new tests. The end result is that many teams struggle to get beyond about 50% automated test coverage. Fortunately, AI can help solve this problem. Moreover, AI-powered test automation brings a number of other key benefits for DevOps teams.

Tests that evolve with your code

Traditional test scripts regularly break as a result of changes in your UI or underlying code. It’s not that the scripts are wrong or badly written. It’s a fundamental limitation of Selenium, the underlying system that nearly all test automation relies on. The issue stems from how Selenium chooses which part of your UI to interact with in each test. Each step locates an element using a selector, such as an ID, a CSS selector, or an XPath expression. When your UI changes, that selector may no longer match the intended element. As a result, that test fails. The only solution is for your test engineers to go in and edit the code to update the selector.

AI, or machine learning to be precise, offers a much better solution. AI-powered test platforms like Functionize learn how your UI actually functions. Elements are selected intelligently based on the millions of data points recorded every time you run the test. This means the tests evolve with your application without the need for routine test maintenance. As a result, you can incorporate these tests as part of your CI-CD pipeline, secure in the knowledge that any failure is actually due to a bug.

Tests that run early and often (with CI)

The purpose of test automation is to find issues with your system early enough to fix them before release. Some QA teams build automation but manage tests separately from CI builds kicked off by developers. But Functionize features multiple ways to integrate test orchestration alongside your CI pipeline. You can use our native CI integrations, use the CLI to kick off tests, or use APIs to build your own integration. Ideally, you run your regression suite along with CI tools like Jenkins or Bamboo. That way, functional tests are executed with unit tests every time there’s a new build. Running tests so frequently allows you to find bugs earlier. In turn, that makes it easier for developers to diagnose the root cause. 

Tests that give traceability for bug tracking and test management

QA teams are part of a much bigger development organization, so visibility and collaboration with developers are of paramount importance. When testers detect issues with their application, they use tools like Jira to log bugs so that developers know what needs to be fixed. Traditional scripted automation tools lack native integrations. That means integrating test automation with bug tracking tools requires effort to set up and maintain over time.

Functionize includes integrations with Jira and test management tools like TestRail and Xray in the subscription. Anyone in your company with access to these tools can easily click the Functionize orchestration link to view details from that test execution. From the Functionize side, you can easily see the latest test execution results and the bug resolution status. This means that both testers and developers have complete visibility into the bug reporting and validation process. In addition, passing tests provide evidence that the bug was fixed.

True end-to-end testing

Scripted testing also makes it really hard to test applications end-to-end. That’s because the script is only really able to see what is going on in the part of the application you created. Embedded 3rd party content is largely invisible to it. It is also really hard to write scripts that cope with dynamic content. Then there is the issue of testing generated content such as PDFs. Put simply, that is impossible for a test script. The upshot is, most end-to-end testing gets done manually.

Functionize takes a different tack. Our AI can see your entire application, not just the part under your control. Better yet, it is smart enough to understand the difference between a bug and something that changes constantly. That means it will know to ignore the changing content of embedded ads, but will correctly identify when an ad fails to load. We also include the ability to test complex workflows, such as 2-factor authentication, along with visual checking of the entire UI and verification of downloaded files.

Proper testing in production

We already mentioned how important production testing is for DevOps. Canary testing and dark launching are great, but they rather ignore the end-user experience. So, imagine if you could also continuously test how end users perceive your application, getting alerted as soon as things start to slow down or behave unexpectedly. What if you could do true localization testing by launching live tests directly from every location where you operate? And what if you could see exactly how users interact with your application and use this to define what tests are needed? Thanks to AI and our global Test Cloud, Functionize lets you do just this.

Use Functionize to maximize your error budget

As you have probably realized, we think Functionize is a no-brainer solution for DevOps. Just look at all the ways it helps you make the most of your error budget:

  • Potential bugs are identified as soon as new code is pushed. That means you can be more certain the new code will really work.
  • The whole application is getting tested, not just parts of it. So, you will see if there are any issues caused by an unexpected change in some 3rd party service you rely on. That’s a big difference from having to wait for users to start reporting the problem!
  • You can also spot issues before your backend monitoring picks up on them just by running tests against your production servers. If things start running slow, it can be a flag that there’s an issue you need to investigate.
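To illustrate the last point, here is a minimal sketch of how test timings from production runs might be turned into a slowdown flag. It is a generic illustration, not Functionize's implementation: the function name, the use of a median, and the 1.5x threshold are all assumptions.

```python
from statistics import median
from typing import Sequence


def is_slowdown(recent_ms: Sequence[float], baseline_ms: float, factor: float = 1.5) -> bool:
    """Flag a slowdown when the median recent duration exceeds factor * baseline."""
    if not recent_ms:
        raise ValueError("need at least one latency sample")
    # The median keeps a single slow run from triggering a false alarm.
    return median(recent_ms) > factor * baseline_ms


if __name__ == "__main__":
    baseline = 500.0  # hypothetical normal duration of a test step, in milliseconds
    print(is_slowdown([900, 1000, 1100], baseline))  # consistently slow runs
    print(is_slowdown([400, 450, 2000], baseline))   # one outlier among normal runs
```

Feeding a check like this with the duration of each production test run gives an early signal that something needs investigating.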

All this equates to a more reliable application and a much smaller chance you need to use your error budget to handle an application issue. To learn more, simply book a demo with one of our team.