Autonomous Template Recognition

In this blog, we’ll explain why AI-based system testing is such a hard problem and look at how autonomous template recognition can help!

November 6, 2018
Geoffrey Shenk

For the last few months, Functionize has been working with one of the largest car manufacturers in the world. The aim of the collaboration is to apply Functionize’s revolutionary AI-based testing approach to automate the testing of in-vehicle infotainment systems. In this blog, we’ll explain why this is such a hard problem and look at how autonomous template recognition can help.

Why infotainment systems are challenging to test

Long gone are the days when cars were simply fitted with CD players and radios. Nowadays, even the cheapest vehicles often include advanced infotainment centers offering a range of functions, including entertainment, navigation, vehicle diagnostics, and vehicle interaction. These systems are UI-based and usually have a large display screen, often a touch screen. The underlying system may be Windows-based, Linux-based, or a specialized car-specific OS such as MeeGo.

These systems share much in common with modern web apps. For instance, you navigate through the system by clicking or selecting icons, and there are generally multiple routes you can take through it. And if that weren’t enough, the frontend can often be heavily customized by the vehicle owner, with different skins, custom icons, and controls that can be assigned to multiple functions.

Typically, there are multiple ways to interact with the systems (touch screen, joystick/controller, steering-mounted controls, gesture recognition, etc.). This means it is possible that the system will receive multiple near-simultaneous commands from different sources and will need to be able to handle these without freezing.  

Furthermore, every vehicle model will generally have a different infotainment system with access to different features and capabilities. These systems are also generally not browser-based: they don’t use standard HTML and JavaScript. Instead, they are usually custom embedded GUIs designed specifically for the task.

The upshot of this is that testing infotainment systems is a difficult task. It is possible to do it semi-automatically using custom embedded scripts, but test cases have to be manually generated and the overall process can take significant time. In the case of the manufacturer we are working with, scripted tests take around 11 hours to complete (and countless weeks to write) and manual tests take many days.

So how can you test these systems?

Given the lack of HTML and JavaScript support, the only sensible way to test such a system is visually. In effect, after each action you need to take a screenshot, determine whether the action succeeded, and then determine which action to take next. This sounds easy enough, but let’s look at it in a bit more detail.
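The act, screenshot, verify cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production system: `Step`, `run_visual_test` and `FakeDevice` are hypothetical names, and the "screenshot" here is simply the name of the current screen so that the example runs without any hardware.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str           # e.g. "tap:radio" (hypothetical action syntax)
    expected_screen: str  # label the next screenshot should be classified as


def run_visual_test(steps, perform, capture, classify):
    """Perform each action, capture a screenshot, and check which screen it shows."""
    results = []
    for step in steps:
        perform(step.action)
        seen = classify(capture())
        ok = seen == step.expected_screen
        results.append((step.action, seen, ok))
        if not ok:
            break  # later steps assume we reached the expected screen
    return results


class FakeDevice:
    """Stand-in for a real head unit; its 'screenshots' are just screen names."""

    TRANSITIONS = {
        ("home", "tap:radio"): "radio",
        ("radio", "tap:back"): "home",
    }

    def __init__(self):
        self.screen = "home"

    def perform(self, action):
        # Unknown actions leave the device on the current screen.
        self.screen = self.TRANSITIONS.get((self.screen, action), self.screen)

    def capture(self):
        return self.screen


device = FakeDevice()
steps = [Step("tap:radio", "radio"), Step("tap:back", "home")]
results = run_visual_test(steps, device.perform, device.capture,
                          classify=lambda shot: shot)
print(results)
```

In a real setup, `capture` would return an image and `classify` would be the image-analysis stage discussed in the following sections.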

Image recognition

At the heart of any such system is the ability to automatically segment images into their constituent objects, classify those objects, and work out the semantic relationships between them. This is exactly the problem that self-driving vehicles face when they have to analyze the input from a vehicle-mounted camera. The usual approach is to use convolutional neural networks (CNNs) to perform the image processing. However, doing this reliably requires large training sets of pre-classified images. It also requires that the objects being classified can be described unambiguously.

In the system we have developed, we found that CNNs were only minimally useful and added extra complexity. Instead, we created a domain-specific expression language that can accurately describe 2D images. When an image is processed, it can be decomposed and described in this language. This allows us to create templates that identify key elements in the system.


Another key requirement is to be able to read the text on screen. To see why this is critical, consider a typical home screen. Along with a set of icons for frequently used applications, the screen will probably have a clock displaying the current time and date. It may also display details about the current location or recent traffic alerts. While it would be feasible to use image recognition alone to identify these elements, that would be hugely inefficient. A much better approach is to use Optical Character Recognition (OCR) to actually “read” the text. This text can then be parsed using natural language processing to extract the actual semantic content. The upshot is that you can now define areas on screen as “time of day”, “day of week”, etc.
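To make the idea of labeling text regions concrete, here is a small Python sketch that takes strings as they might come out of an OCR engine and assigns each one a semantic label. It is an assumption-laden illustration: it uses simple pattern matching rather than real natural language processing, the OCR step itself is not shown, and the function name `classify_text` and the label names are hypothetical.

```python
import re
from datetime import datetime

DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}

# 24-hour or 12-hour clock, e.g. "7:05", "23:59", "10:42 AM"
TIME_RE = re.compile(r"^([01]?\d|2[0-3]):[0-5]\d(\s?[AaPp][Mm])?$")

# Numeric date layouts only, to keep the sketch locale-independent.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def classify_text(text):
    """Return a semantic label for one piece of OCR'd text."""
    t = text.strip()
    if TIME_RE.match(t):
        return "time_of_day"
    if t.lower() in DAYS:
        return "day_of_week"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(t, fmt)
            return "date"
        except ValueError:
            continue  # try the next layout
    return "other"


for sample in ["10:42 AM", "Tuesday", "2018-11-06", "Radio"]:
    print(sample, "->", classify_text(sample))
```

Once a region is labeled this way, a test can treat it as a field with a known meaning instead of comparing pixels.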

Icon recognition

As mentioned above, the system relies extensively on on-screen icons. These icons are customizable; however, there are only a finite number of them. Rather than use general image recognition to identify each icon, we have developed a system that identifies a small number of key features in each icon and uses these to identify it.
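One simple way to picture feature-based identification is to describe every known icon by a small set of features and pick the known icon whose set overlaps most with what was observed. The Python sketch below does this with Jaccard similarity. The feature names, the `KNOWN_ICONS` table, the similarity measure and the 0.5 threshold are all assumptions made for illustration; the source does not say which features or matching rule the real system uses.

```python
def jaccard(a, b):
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical feature sets for a finite catalogue of icons.
KNOWN_ICONS = {
    "radio": frozenset({"circle", "antenna_line", "arc_pair"}),
    "navigation": frozenset({"arrow_head", "circle", "compass_tick"}),
    "phone": frozenset({"handset_curve", "rounded_rect"}),
}


def identify_icon(features, known=KNOWN_ICONS, threshold=0.5):
    """Return the best-matching icon label, or None if nothing is close enough."""
    best_label, best_score = None, 0.0
    for label, reference in known.items():
        score = jaccard(set(features), reference)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# A re-skinned radio icon: same key features plus an extra one from the skin.
print(identify_icon({"circle", "antenna_line", "arc_pair", "tint_blue"}))
# An icon that shares too little with anything in the catalogue.
print(identify_icon({"star"}))
```

Because only a few key features are compared, a change of skin that leaves those features intact does not stop the icon from being recognized.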

Template creation

By combining all the approaches above, each application screen can now be described as a template containing certain features. There are two ways to do this. The first is to use static templates. Here, the template gives a static layout for the screen, specifying what icon or text appears in each region. Testing the screen then becomes a matter of verifying the actual image against this template. The trouble is that end users can customize the screens, which means that icons can move about. Also, different vehicles have different layouts and, as with web applications, new software updates can cause changes.
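A static template can be pictured as a fixed map from screen regions to the element expected there. The Python sketch below uses a hypothetical 2x2 grid and made-up labels to show both the check and its weakness: when a user swaps two icons, every affected cell is reported as a mismatch even though nothing is actually broken.

```python
# Hypothetical static template: grid cell (row, col) -> expected element label.
STATIC_HOME = {
    (0, 0): "radio",
    (0, 1): "navigation",
    (1, 0): "phone",
    (1, 1): "clock",
}


def verify_static(template, observed):
    """Return (cell, expected, found) for every cell that does not match."""
    return [
        (cell, expected, observed.get(cell))
        for cell, expected in sorted(template.items())
        if observed.get(cell) != expected
    ]


# The factory layout passes.
print(verify_static(STATIC_HOME, dict(STATIC_HOME)))

# The owner swaps two icons: the screen is fine, but the static check fails.
customized = dict(STATIC_HOME)
customized[(0, 0)], customized[(0, 1)] = "navigation", "radio"
print(verify_static(STATIC_HOME, customized))
```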

Autonomous template recognition


Our solution takes a more flexible approach. Rather than statically defining each template, we define a dynamic template listing the items that should appear on each screen and specifying additional metadata such as their relative (not absolute) positions. This last aspect is important because, obviously, you want the whole time/day/date field to appear in one location, not scattered all over the screen!
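As a rough illustration of a dynamic template, the Python sketch below lists the items a screen must contain and a few relative-position constraints between them, then checks a set of detected elements against it. The class name `DynamicTemplate`, the relation names, the pixel coordinates and the 200-pixel "near" distance are assumptions for the example, not details taken from the real system.

```python
from dataclasses import dataclass, field

NEAR_PIXELS = 200  # assumed distance within which two elements count as "near"

# Each relation compares two (x, y) positions, with y growing downwards.
RELATIONS = {
    "left_of": lambda a, b: a[0] < b[0],
    "above": lambda a, b: a[1] < b[1],
    "near": lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]) <= NEAR_PIXELS,
}


@dataclass
class DynamicTemplate:
    required: set                                     # labels that must be present
    constraints: list = field(default_factory=list)   # (label_a, relation, label_b)


def match(template, detected):
    """Return a list of problems; an empty list means the screen matches."""
    problems = [f"missing: {label}"
                for label in sorted(template.required - set(detected))]
    for a, relation, b in template.constraints:
        if a in detected and b in detected:
            if not RELATIONS[relation](detected[a], detected[b]):
                problems.append(f"violated: {a} {relation} {b}")
    return problems


HOME = DynamicTemplate(
    required={"radio", "navigation", "time_of_day", "date"},
    constraints=[("time_of_day", "near", "date")],
)

# Icons swapped by the user: absolute positions differ, but the template still matches.
customized = {"radio": (200, 300), "navigation": (40, 300),
              "time_of_day": (600, 20), "date": (700, 20)}
print(match(HOME, customized))

# The date has drifted far away from the time: the relative constraint is violated.
scattered = dict(customized, date=(50, 450))
print(match(HOME, scattered))
```

Only presence and relative layout are checked, so a customized home screen passes while a clock whose parts are scattered across the screen does not.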

So what’s the end result?

The upshot of all this is that our system can autonomously locate the home screen on the system under test, then proceed to navigate around the system by matching icons against labels and performing suitable actions. The system can replicate numerous human-computer interactions such as clicking, dragging, scrolling, and button presses. This even allows it to test complex interactions like customizing the home screen.

As the system performs the tests, it constantly records screenshots, checks for visual anomalies, and creates warnings unless the changes are in dynamic data (e.g. date fields, temperature, radio frequency). If icons have been moved, the system simply notes this as a warning rather than a failure. If icons disappear in a new release, this is flagged as an error.
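The reporting rules in this paragraph can be written down as a small comparison function. The Python sketch below is one possible reading of them, with assumptions called out: elements are represented as label -> (position, content), the set of dynamic labels is made up, and a change in dynamic data is recorded at an "info" level (marked, but not a warning), which is our naming rather than the system's.

```python
# Hypothetical labels whose content is expected to change between runs.
DYNAMIC = {"date", "temperature", "radio_frequency"}


def compare_runs(baseline, current, dynamic=DYNAMIC):
    """Compare two runs; each maps label -> (position, content)."""
    findings = []
    for label in sorted(baseline):
        if label not in current:
            findings.append(("error", label, "missing"))  # element disappeared
            continue
        (pos_old, val_old), (pos_new, val_new) = baseline[label], current[label]
        if pos_old != pos_new:
            findings.append(("warning", label, "moved"))  # moved, not a failure
        if val_old != val_new:
            # Dynamic data is marked but does not raise a warning.
            level = "info" if label in dynamic else "warning"
            findings.append((level, label, "content changed"))
    return findings


baseline = {
    "radio": ((0, 0), "icon"),
    "phone": ((1, 0), "icon"),
    "navigation": ((2, 0), "icon"),
    "date": ((5, 0), "6 Nov"),
}
current = {
    "radio": ((3, 0), "icon"),       # moved by the user
    "navigation": ((2, 0), "icon"),  # unchanged
    "date": ((5, 0), "7 Nov"),       # dynamic data changed
}                                    # "phone" is gone

for finding in compare_runs(baseline, current):
    print(finding)
```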

In the image below you can see that the date and radio frequency have changed. The system has marked these in red but won’t issue a warning as these are both dynamic data. 

[Image: autonomous template recognition, with the changed date and radio frequency marked in red]


Using this approach, testing the entire infotainment system is reduced from a minimum of 11 hours down to just minutes, and 20 or more systems can be tested in parallel. Moreover, once the system has been trained initially (which is simply a matter of labeling icons and their associated actions), the system is fully autonomous.

So where next?

We are very excited by the possibilities of autonomous template recognition in testing. This approach offers a far more flexible way to specify page content on UIs because it can be trained to understand the semantics of what is being displayed. Rather than telling the test to verify each element in turn, the template approach allows you to specify the key content that should be displayed on the page. It is also flexible, understanding that some content changes and that buttons or icons might move.

The really exciting thing about this approach is that, unlike standard test automation, it isn’t limited to HTML-based web frontends. This means it could be used for testing custom UIs in numerous settings. Imagine being able to autonomously test a hospital ICU monitor, or a complex touch-screen controller for use on an assembly line. Even better, this approach works with legacy systems and is OS-independent. All you need is the ability to interact with the system under test and to record screenshots.

Autonomous template recognition


Testing custom UIs in systems that don’t rely on a web frontend is a tough problem. It is especially tough when they are part of an embedded system such as a car infotainment system. The autonomous template recognition approach we have developed solves this problem in a flexible way and opens up all sorts of exciting future possibilities. We are already exploring how to apply this approach in other scenarios and believe it will form the basis of a completely new approach to test automation.