Automating Visual Testing

Convolutional neural nets — and a superb new test set — help Functionize perform visual (and machine-vision-enhanced functional) web testing that’s fast, robust, and resource-efficient

Determining whether web pages render correctly is essential to website testing. If a layout breaks, controls render outside the viewport, or elements stack in the wrong z-index order, content can become unreadable and controls unusable.

Robustly evaluating a web page render implies understanding its visual semantics: discriminating static from dynamic elements; identifying links, buttons, dropdowns, and other page objects; discerning images nestled in markup. Given the enormous range of possible designs, design components (e.g., fonts), low-level browser/version rendering variations, dynamic layout changes driven by responsive design, and arbitrary page complexity, even well-trained human testers struggle to evaluate rendering correctness efficiently and reliably, or to recognize when rendering issues impact functionality. Where releases are frequent, it becomes cost-prohibitive or impossible to test dozens, hundreds, or thousands of pages, perhaps also localized into multiple languages, for basic correctness and usability.

Clearly, automation is required. But conventional QA/test automation, such as Selenium, is notoriously weak at visual testing. Solutions that purport to add this capability to Selenium mostly focus on detecting human-visible deltas between sample screenshots and pages under test. This methodology raises the question of where those screenshots come from (most solutions require you to generate them for every browser, device, and configuration); such solutions also require considerable manual tweaking (e.g., marking items that can move or should be ignored) before they become usable for even a single test run, and must be updated after any significant change.

The overhead of maintaining tests built on this kind of solution is prohibitive for larger sites, particularly those that change frequently. The cost of such solutions also tends to be non-negligible: some Selenium-oriented visual-test service providers charge by validations per month, and may offer limited concurrency even on enterprise-class accounts.
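To make the maintenance burden concrete, here is a minimal, purely illustrative sketch of the pixel-diff approach such tools rely on. Screenshots are modeled as grids of RGB tuples, and the hand-maintained ignore mask stands in for the “manual tweaking” described above; all names here are hypothetical, not any vendor’s API.

```python
def diff_regions(baseline, candidate, ignore=frozenset(), threshold=0):
    """Return (row, col) coordinates where candidate differs from baseline,
    skipping any coordinate listed in the hand-maintained ignore mask."""
    mismatches = []
    for r, (brow, crow) in enumerate(zip(baseline, candidate)):
        for c, (bpx, cpx) in enumerate(zip(brow, crow)):
            if (r, c) in ignore:
                continue
            # Flag the pixel if any RGB channel differs beyond the threshold.
            if max(abs(a - b) for a, b in zip(bpx, cpx)) > threshold:
                mismatches.append((r, c))
    return mismatches

baseline = [[(255, 255, 255)] * 4 for _ in range(3)]
candidate = [row[:] for row in baseline]
candidate[1][2] = (200, 200, 200)   # a genuine rendering delta
candidate[0][0] = (0, 0, 0)         # e.g. a rotating banner that always changes

# Without masking, the dynamic banner shows up as a false positive:
print(diff_regions(baseline, candidate))                   # [(0, 0), (1, 2)]
# Every dynamic element needs its own mask entry, for every layout:
print(diff_regions(baseline, candidate, ignore={(0, 0)}))  # [(1, 2)]
```

Each new browser, viewport, or page revision changes which pixels are “expected” to move, which is why these masks and baselines must be regenerated continually.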

Functionize: Automating Everything

At Functionize, our goal is to let developers and QA/test engineers prepare test cases without writing any code. We begin by letting the user record a test case — in the process, capturing a reviewable ‘filmstrip’ of page rendering (so no manual or automated acquisition of screen captures is required). The render is analyzed through a decision tree that segments the page into regions, then invokes a range of visual processing tools to discover, interrogate, and classify page elements: discriminating text from controls and graphics; identifying static vs. dynamic objects; automatically determining which parts of a page are most important to test. Tools built into Functionize include convolutional neural networks trained on proprietary sample libraries of fonts, buttons, controls, and other visual elements, as well as robust standardized solutions for source-image processing, optical character recognition, and other basic functions.

Once the test case is recorded and validated, it can be executed across the full range of supported browsers and emulated devices: Functionize’s combination of AI visual-element classification, web-code analysis, and real-time DOM analytics is robust enough to enable (for example) reliable visual testing of very large websites, across responsive designs running on multiple devices through a range of viewport configurations — all from a small number of recorded test cases. Moreover, when changes are made, these tests turn out to be largely self-updating: by combining deep DOM analytics with sophisticated visual processing, Functionize can, in many cases, determine not only what has changed, but whether the change affects functionality or is otherwise relevant to users. Functionize can also use recordings of actual user site interactions to refine and focus its visual analysis, constraining this resource-intensive process in response to what users are actually doing on your site.