Data-driven testing – when does it help you?

Data-driven testing is increasingly common in test automation. Here, we explore common scenarios and explain the impact it can have on your testing efforts.

November 12, 2019
Brad Kallaway

Data-driven testing is increasingly common in test automation. Here, we explore some common scenarios for this approach and explain the impact it can have on your testing efforts.

Test automation is critical for efficient software delivery. Almost every test team now includes test automation engineers. Automated tests run 24 hours a day, 7 days a week, meaning they can get through huge numbers of tests relative to manual testing. Often, when speaking to potential customers, I hear of teams that are running tens of thousands of tests. Doing that many tests manually would be nearly impossible, even with unlimited time and budget.

However, test automation can also be inefficient if it is implemented badly. Automated tests are often repetitive, with the same basic actions repeated using different data. For instance, imagine testing the address fields on a registration form. If your application needs to work in multiple regions, you need to test a huge range of valid and invalid addresses. You may need to check many variations of ZIP or postal code, from the five-digit US ZIP code to the UK’s mixed letter-and-number format (for example, SW1A 1AA). Here, it makes sense to have a single test with a separate set of test data for every permutation you need to check.

How data-driven testing works

Data-driven testing is designed to improve the efficiency of this sort of repetitive test. Rather than create a different registration flow test for every country, you create one. Then you pull in the address data for the test from an external data source. When you create the data, you also provide the expected result for each set of data. The system then compares the actual result against this expected result to decide whether the test has passed or failed.
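As a rough illustration of the idea, the sketch below runs one test procedure over several rows of postal code data, each paired with its expected result. The function `is_valid_postal_code`, the regular expressions, and the data rows are all assumptions made for this example; the patterns are deliberately simplified and are not a complete description of either country’s format.

```python
import re

# Simplified patterns for illustration only; real postal validation is more involved.
PATTERNS = {
    "US": re.compile(r"^\d{5}(-\d{4})?$"),
    "UK": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$"),
}


def is_valid_postal_code(country, code):
    """Hypothetical stand-in for the application behaviour under test."""
    pattern = PATTERNS.get(country)
    return bool(pattern and pattern.match(code))


# Each row pairs the input data with the expected result.
TEST_DATA = [
    ("US", "90210", True),
    ("US", "9021", False),
    ("UK", "SW1A 1AA", True),
    ("UK", "12345", False),
]


def run_postal_code_test(rows):
    """Run the same check for every data row and return the rows that failed."""
    failures = []
    for country, code, expected in rows:
        actual = is_valid_postal_code(country, code)
        if actual != expected:
            failures.append((country, code, expected, actual))
    return failures


failures = run_postal_code_test(TEST_DATA)
print("failures:", failures)
```

Adding a new region then means adding rows to the data rather than writing another test.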

[Diagram: how data-driven testing works]


How to implement data-driven testing

Data-driven testing is slightly different from standard test automation. In a standard automated test, you include any required data in the test itself. By contrast, in data-driven testing, you connect your test to an external data source. You can use many different data sources, from simple CSV files, through XML, to full-featured databases like MySQL.

Choosing the data source

For simple scenarios, a plain text file such as a CSV file may be all you need. This works well if, for instance, you have a set of username and password pairs you want to test with. More complex tests might need XML so that you can add extra information to the test data. In large automated test suites, you may be better off using a proper database. This is particularly useful when you want to orchestrate your tests.

Connecting the data source

The important thing is how to link the data source to the test. With Selenium test scripts this can be quite easy. Let’s say you are writing a Selenium test in Python. You can import your test data from a CSV file. Then you create a loop that runs through each entry in the data source. For more complex scenarios, it may be better to use XML as a data source. Again, you can import this into your script and parse it to extract the data and the expected result.
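The CSV approach might look something like the following sketch. The column names, the inline CSV text (standing in for a file on disk), and the `attempt_login` function are all hypothetical. In a real Selenium script, `attempt_login` would contain the browser steps that fill in and submit the form; here it is a plain Python placeholder so the example runs on its own.

```python
import csv
import io

# Inline stand-in for a CSV file of test data (hypothetical columns).
CSV_TEXT = """username,password,expected
alice,correct-horse,success
alice,wrong,failure
,correct-horse,failure
"""

# Hypothetical credential store, used only so this sketch is self-contained.
KNOWN_USERS = {"alice": "correct-horse"}


def attempt_login(username, password):
    """Placeholder for the Selenium steps that would drive the login form."""
    ok = KNOWN_USERS.get(username) == password
    return "success" if ok else "failure"


def run_login_tests(csv_file):
    """Loop over every row in the data source and record pass or fail."""
    results = []
    for row in csv.DictReader(csv_file):
        actual = attempt_login(row["username"], row["password"])
        passed = actual == row["expected"]
        results.append((row["username"], row["expected"], actual, passed))
    return results


results = run_login_tests(io.StringIO(CSV_TEXT))
for username, expected, actual, passed in results:
    print(f"{username!r}: expected={expected} actual={actual} passed={passed}")
```

To read from a real file, you would pass an open file object to `run_login_tests` instead of the `io.StringIO` wrapper.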

Assessing the result

One of the big challenges is how to assess the result. One way would be to check against all the possible outcomes using a case statement. You can then compare the actual outcome with the expected outcome. If the result is more variable, you might want to use XML to provide a richer description of the expected outcome. At the end of the day, this is where the skill of your test automation engineers will show itself.
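One possible way to carry a richer expected outcome in XML is sketched below, using Python’s standard `xml.etree.ElementTree` module. The element and attribute names, the sample cases, and the `submit_signup` function are assumptions for illustration, not a prescribed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical test data: each case holds its input and a structured expected outcome.
XML_TEXT = """
<cases>
  <case name="valid signup">
    <input email="pat@example.com" age="30"/>
    <expected status="ok" message="Welcome"/>
  </case>
  <case name="missing email">
    <input email="" age="30"/>
    <expected status="error" message="Email is required"/>
  </case>
  <case name="under age">
    <input email="sam@example.com" age="12"/>
    <expected status="error" message="Must be 18 or over"/>
  </case>
</cases>
"""


def submit_signup(email, age):
    """Hypothetical stand-in for driving the application and reading the result."""
    if not email:
        return {"status": "error", "message": "Email is required"}
    if age < 18:
        return {"status": "error", "message": "Must be 18 or over"}
    return {"status": "ok", "message": "Welcome"}


def assess(xml_text):
    """Compare the actual outcome with the expected outcome for every case."""
    report = {}
    for case in ET.fromstring(xml_text.strip()).findall("case"):
        data = case.find("input").attrib
        expected = dict(case.find("expected").attrib)
        actual = submit_signup(data["email"], int(data["age"]))
        report[case.get("name")] = actual == expected
    return report


report = assess(XML_TEXT)
print(report)
```

Because the expected outcome is a small structure rather than a single value, the comparison can cover both the status and the message shown to the user.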

When data-driven testing works best

Data-driven testing is especially well-suited to scenarios where the same test steps need to be repeated with different data. Typically, this is when you want to test the actual application logic as much as the functionality. For instance, you might want to test the registration flow as described above. Or you may be testing that your shopping cart logic is correct. Let’s look at a couple of scenarios that are ideal for this approach.

Shopping cart logic. Many eCommerce sites employ quite complex shopping cart logic. You need to test this carefully since any errors could be expensive. Firstly, the cart must correctly calculate the total cost. This means you need to test with multiple combinations of items. Secondly, you need to test the logic around special offers. For instance, “buy one get one free”. Thirdly, you need to test voucher codes properly, including any logic relating to validity. Finally, there’s usually some complex logic relating to delivery costs depending on the options chosen, cart total, etc.  
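To make this concrete, here is a minimal sketch of a data table for cart logic. The pricing rules in `cart_total_cents` (buy one get one free, a `SAVE10` voucher, free delivery from 5000 cents) are invented for the example and stand in for whatever your own application does. Amounts are held as integer cents to avoid floating-point rounding.

```python
def cart_total_cents(items, voucher=None):
    """Hypothetical cart logic: offers first, then voucher, then delivery."""
    subtotal = 0
    for price_cents, quantity, bogo in items:
        # Buy one get one free: every second unit of a flagged item is free.
        charged = quantity - quantity // 2 if bogo else quantity
        subtotal += price_cents * charged
    if voucher == "SAVE10":
        subtotal = subtotal * 90 // 100
    # Assumed rule: delivery is free once the discounted subtotal reaches 5000.
    delivery = 0 if subtotal >= 5000 else 499
    return subtotal + delivery


# (description, items as (price_cents, quantity, bogo), voucher, expected total)
CART_CASES = [
    ("single item, paid delivery", [(1000, 1, False)], None, 1499),
    ("bogo pair charges one", [(1000, 2, True)], None, 1499),
    ("bogo odd quantity", [(1000, 3, True)], None, 2499),
    ("voucher applied", [(2000, 1, False)], "SAVE10", 2299),
    ("free delivery threshold", [(2500, 2, False)], None, 5000),
    ("voucher drops below threshold", [(2500, 2, False)], "SAVE10", 4999),
]


def failing_cases(cases):
    """Return the descriptions of any cases whose total does not match."""
    return [
        description
        for description, items, voucher, expected in cases
        if cart_total_cents(items, voucher) != expected
    ]


print("failing cases:", failing_cases(CART_CASES))
```

The last two rows show why the data matters: the same cart produces a different delivery charge once a voucher pushes the subtotal below the threshold.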

User management. Any system with registered users requires you to test user management functions. For instance, you need to test user registration flows. This includes testing using different forms of invalid data. You also need to test changes and updates to user data. Finally, you may need to test permissions. E.g. does a given user have permission to perform an action? If you change their permissions, does the system get updated correctly?

Assessing whether data-driven testing is appropriate

Here are some simple guidelines for deciding whether you should use data-driven testing.

  1. Do you repeat the same test steps several times in different tests? Typically, this is the case for many user flows within your UI.
  2. Are there obvious happy and sad paths you are trying to test? For instance, will one set of data trigger an error message and a different set not?
  3. Do you need to test multiple variations of the same data? When you are testing application logic, you need to be able to vary the input data.
  4. Will different data generate different outcomes for a test? For instance, login flows which will either log the user in or not.
  5. Are multiple tests run in parallel on the same system? If so, you may want to ensure you use different data for each test.

If you can answer yes to any of these, then you should consider looking at data-driven testing.

The last point is especially important. If you do have multiple tests running at once, it is very easy to inadvertently “step on the toes” of a different test. For instance, in many systems, a user can only be logged in from one location at a time. So, you need to ensure that each test is using a different user. One way to do this is to have a system that assigns users sequentially as requested by each test instance.
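A simple version of such an assignment system can be sketched with a thread-safe queue, as below. The user names and the `run_test_instance` function are hypothetical, and the sketch assumes all test instances run as threads in one process. Tests spread across several machines would need a shared service or database to play the same role.

```python
import queue
import threading

TEST_USERS = ["user01", "user02", "user03"]  # hypothetical test accounts

# A FIFO queue hands users out in order and blocks when none are free.
user_pool = queue.Queue()
for name in TEST_USERS:
    user_pool.put(name)

in_use = set()   # users currently checked out, tracked to detect clashes
conflicts = []   # filled in only if two instances ever share a user
lock = threading.Lock()


def run_test_instance(instance_id):
    user = user_pool.get()  # waits until a user is available
    try:
        with lock:
            if user in in_use:
                conflicts.append((instance_id, user))
            in_use.add(user)
        # ... the real test steps for this user would run here ...
        with lock:
            in_use.discard(user)
    finally:
        user_pool.put(user)  # return the user so another test can take it


threads = [threading.Thread(target=run_test_instance, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("conflicts:", conflicts)
```

Here ten test instances share three users, and each instance waits its turn rather than logging in as a user that another test is already using.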

When to avoid data-driven testing

Of course, there are some scenarios when data-driven testing isn’t suitable. These fall roughly into two groups.

Frequency of testing. If you are only repeating a test occasionally, it may not be worth setting up data-driven testing. Equally, if you usually use a single set of data but only occasionally vary it, then data-driven testing may not be worth doing. 

The complexity of the result. Data-driven testing needs you to be able to specify the expected outcome for each set of test data. In many scenarios, the outcome varies so much that it becomes difficult to do this. This is especially true if the different outcomes bring up completely different page views. 

There are many other reasons not to use data-driven testing. Your team may lack the skills and experience to set it up. You may be just starting to create tests for a new product. You may decide that the effort saved is not sufficient to make it worth doing. Or the tests you have simply may not benefit from this approach! At the end of the day, you have to assess tests case-by-case to see if it’s worth implementing data-driven testing.