Machine learning and the humans behind the screen

Machine learning is at the heart of AI. But most ML models need to be trained. Here, we see how humans are essential for creating good training data.

August 13, 2019
Geoffrey Shenk


The untold story of AI

AI is a truly disruptive technology and most of its successes come from machine learning. However, in order to work, most ML algorithms need to be trained. Here, we look at the humans that sit behind the success of AI.

Machine learning is the basis of most of the successful applications of AI that have emerged over the past few years. ML works by taking some algorithm and training it to spot certain patterns. These patterns might be numerical, lexical (related to words) or visual. Using ML you can teach a computer to convert speech to text, to recognize objects in a picture, even to spot patterns in stock prices that might predict where the market is moving.

But have you ever wondered how machine learning really works? I’m not just talking about neural networks, decision trees or support vector machines. What I mean is: how do you create the datasets that are used to train the AI? Without these datasets, machine learning is simply impossible. And if the datasets aren’t large enough, or of high enough quality, the resulting AI will perform badly at recognizing the patterns you want it to spot.

A machine learning primer

Let’s look at a simple example of an artificial neural network (ANN) to explain how ML works. The basis for this is the artificial neuron shown below.

Perceptrons or artificial neurons are often used in machine learning (From https://commons.wikimedia.org/wiki/File:Artificial_neural_network.png)

The neuron calculates a weighted sum of all its inputs. An activation function then determines whether the neuron should ‘fire’ or not. These activation functions are generally simple mathematical functions such as the identity function, binary step, logistic (soft step) or tanh. We call this type of artificial neuron a "perceptron".
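The weighted sum and activation step can be sketched in a few lines of Python. The weights and threshold below are purely illustrative:

```python
# A minimal perceptron: a weighted sum of inputs followed by a binary
# step activation. The weights and threshold are illustrative values.

def perceptron(inputs, weights, threshold=1.0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Two inputs with equal weights: the neuron fires only when both are active.
print(perceptron([1.0, 1.0], [0.6, 0.6]))  # weighted sum 1.2 -> 1
print(perceptron([1.0, 0.0], [0.6, 0.6]))  # weighted sum 0.6 -> 0
```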

Perceptrons use different transfer functions

ANNs consist of a multi-layer network of perceptrons. The nature of the dataset determines the number of inputs, while the number of states you are looking for dictates the number of outputs. In the middle are one or more hidden layers.

An artificial neural network (By Glosser.ca - Own work, Derivative of File:Artificial neural network.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=24913461)

A simple machine learning example

Imagine you want to create a machine learning model to decide what time you should get up. On a weekday you want to wake up at 7 am. On a weekend you want to wake up at 9 am. But if you are on holiday you want to wake up at 10 am whatever day it is.

In this instance, you need an ANN with 2 inputs. The first encodes the day of the week as the numbers 1-7 (Monday-Sunday). The second is a binary flag for holiday or not. There will be 3 outputs reflecting the 3 times you get up. In the middle will be a hidden layer with 2 perceptrons. At each stage, the inputs are all assigned weights such that the sum at each node is equal to 1.0 (think of these as probabilities). Initially, you assign completely random weights, as shown below. In the hidden layer, the activation function is a step with a threshold of 1.0. That is, it outputs 1 if the weighted sum of the inputs is greater than 1.0 and 0 otherwise.

A simple machine learning example
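A forward pass through a network like this can be sketched as follows. Note that the weight values here are hypothetical placeholders, not the ones from the figure:

```python
# Forward pass for the alarm-clock network: two inputs, a hidden layer
# of two step perceptrons, and three outputs. The weights are
# hypothetical placeholders, not the figure's actual values.

def step(x, threshold=1.0):
    return 1.0 if x > threshold else 0.0

def forward(day, holiday):
    inputs = [day, holiday]  # e.g. day=3 (Wednesday), holiday=0
    # Two hidden perceptrons, each with one weight per input.
    hidden_weights = [[0.5, 0.5], [0.2, 0.8]]            # hypothetical
    hidden = [step(sum(i * w for i, w in zip(inputs, ws)))
              for ws in hidden_weights]
    # Three outputs (7 am, 9 am, 10 am), each weighting the hidden layer.
    output_weights = [[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]  # hypothetical
    return [sum(h * w for h, w in zip(hidden, ws))
            for ws in output_weights]

print(forward(3, 0))
```

With random weights like these, the outputs bear no relation to the correct wake-up time yet; that is exactly what training has to fix.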

Walking through the example

If you input Wednesday (3) and not on holiday, you can see that these weights give a result of {0.5, 0.8, 0.0}, weakly implying you should wake up at 9 am. The correct result should be {1.0, 0.0, 0.0}. The aim is to improve the assigned weights through a process called backpropagation. Put simply, you are trying to decide whether to increase or decrease each weight. You do this by walking back through the network. At each stage, you look at the error (given by a loss function) and find its derivative, then adjust the weight in the direction that minimizes the error, using a technique such as gradient descent. There isn’t enough space to explain this in detail in this blog, but you can read a detailed explanation here.

The importance of training data

The important thing is that in order to train ML models, you need a reliable set of data where you know what the outputs should be. This is the so-called Ground Truth. Clearly, in the toy example above there are simple rules that allow us to define the correct outputs. But imagine you were actually trying to train a Convolutional Neural Network (CNN) to recognize handwritten numerals. Banks need technology like this to automate the reading of checks.

CNNs are a special class of artificial neural network, originally developed by Yann LeCun in the late 1980s and modeled on the animal visual cortex. Essentially, they work by sliding small filters across an image and applying convolutions to extract features, repeating this process over many layers. To train a CNN like this you might choose to use the well-known MNIST dataset.
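The convolution at the heart of a CNN is simple to sketch: slide a small kernel over the image and take the sum of element-wise products at each position. A real network stacks many such filters with learned weights, but the operation itself looks like this:

```python
# A single 2D convolution: slide a small kernel over an image and sum
# the element-wise products at each position. This is the
# feature-extraction step a CNN repeats across many layers, with
# learned kernel weights.

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A hand-written vertical-edge kernel applied to a tiny image whose
# right half is bright: the response peaks at the edge.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1],
          [-1, 1]]
print(convolve2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]]
```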

The MNIST data set shows samples of handwritten numerals

Creating training data for more complex problems

But what if you need to train an image classifier to classify more complex images such as identifying vehicles in a moving image? A recent article on the BBC website highlighted this perfectly. It tells the story of the 1,000 or so Kenyans that work for Samasource in their Nairobi office. Samasource is based in San Francisco. Their tagline is “Ground truth data for your computer vision and natural language algorithms.” Samasource counts some of the biggest tech giants among its customers, including Google and Microsoft, as well as some companies that may seem surprising at first glance, like VW and eBay.

Outsourcing to the developing world

The Samasource office in Nairobi specializes in labeling images of road scenes. These are then used as training data for ML. The human workers spend hours each day looking at images taken by vehicles driving along, carefully outlining each object on the screen and tagging it as ‘car’, ‘pedestrian’, ‘bicycle’, ‘road-sign’, etc. Their aim is to create a pixel-perfect outline of each object. A supervisor then assesses the quality of their work. If you perform really well you receive perks like shopping vouchers or having your name displayed on the big screens in the office.

The aim of this work is, of course, to provide high-quality training and verification datasets that can be used to develop self-driving vehicles. SDVs rely on a number of sensors such as LiDAR, radar, and ultrasound, but some of the richest and most dynamic data come from their camera systems, which are also among the cheapest sensors. So, if you can train an AI to perform image recognition at highway speeds, this is a significant step towards making SDVs more affordable.

reCAPTCHA – training AIs on the sly

Google’s reCAPTCHA is a really interesting example of ML training. You will all be familiar with the reCAPTCHA “I’m a human” screens. For a long time, these asked you to type two fragments of text from an image to prove that you are human. Quite simply, by doing this you were helping to train Google’s OCR engines to cope with complex tasks (you may recall that usually, one word was harder to read than the other).

reCaptcha used to help train machine learning to read text

More recently, they have started to ask you to “click on all the pictures of buses/cars/bicycles” or “select all the squares showing street signs”. As you will maybe guess, this is helping to verify the accuracy of image segmentation and classification systems such as those mentioned above for use with SDVs. 

reCaptcha now trains machine learning for self-driving vehicles

Training Functionize’s AIs

Here at Functionize, we use AI extensively, especially machine learning. So, how do we train our models? Well, rest assured that we are not exploiting thousands of low-skilled, low-paid workers in the developing world for this! Instead, many of our systems use a process called reinforcement learning. Essentially, this means the computer learns from its mistakes. In reinforcement learning, what matters is the actual performance: each time the algorithm runs, it makes changes to try and improve its performance against some metric.
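Trial-and-error learning can be shown in miniature with a two-armed bandit: an agent estimates the value of each action purely from the rewards it observes. This is a toy illustration of the reinforcement learning idea, not Functionize's actual system:

```python
import random

# Epsilon-greedy bandit: the agent usually exploits its best estimate
# but occasionally explores, updating its value estimates from observed
# rewards. A toy illustration of learning from trial and error.

def run_bandit(success_probs, steps=2000, epsilon=0.1, seed=42):
    random.seed(seed)
    estimates = [0.0] * len(success_probs)
    counts = [0] * len(success_probs)
    for _ in range(steps):
        if random.random() < epsilon:                   # explore
            action = random.randrange(len(success_probs))
        else:                                           # exploit
            action = estimates.index(max(estimates))
        reward = 1.0 if random.random() < success_probs[action] else 0.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the new reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Action 1 succeeds 70% of the time, action 0 only 30%; after enough
# trials the agent's estimates reflect that.
estimates = run_bandit([0.3, 0.7])
print(estimates[1] > estimates[0])
```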

Training our natural language processing

Let’s look at a very simple example of a test that adds an item to your shopping cart and checks that the correct item is present. Using our NLP test creation, this can be written as a few simple steps:

  1. Open the homepage
  2. Locate the search box
  3. Enter “Samsung Galaxy S9” and click enter
  4. Verify that you are on the Samsung Galaxy S9 product page
  5. Locate the “Add to cart” button and click it
  6. Now click on the cart at the top right
  7. Verify that your cart contains 1 Samsung Galaxy S9

NLP then converts this into a fully functional test. And the system is using machine learning to understand what is really happening “under the hood”. Each time you run the test it gets more training data. For instance, when you locate and press the “Add to cart” button, the system is fingerprinting this action and learning all the API calls, the server responses and the dependent changes to the page (it may be that the shopping cart icon now changes to show there are items in your cart). The upshot of this is that if a redesign moves the button on the page and changes it to “Buy Now”, then the test is able to self-heal.
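The self-healing idea can be illustrated with a toy fingerprint match. This is a hypothetical sketch of the concept, not Functionize's actual implementation: record several attributes of an element when the test first runs, then fall back to the closest fingerprint match if the primary locator breaks after a redesign.

```python
# Toy self-healing element lookup: an element "fingerprint" records
# several attributes. If the primary locator (the label) no longer
# matches after a redesign, fall back to the element sharing the most
# other recorded attributes. A hypothetical sketch, not Functionize's
# actual implementation.

def find_element(page, fingerprint):
    # Primary locator: exact label match.
    for element in page:
        if element.get("label") == fingerprint["label"]:
            return element
    # Self-heal: pick the element sharing the most other attributes.
    def score(element):
        return sum(element.get(k) == v for k, v in fingerprint.items()
                   if k != "label")
    return max(page, key=score)

fingerprint = {"label": "Add to cart", "tag": "button", "section": "product"}

# After a redesign the button is relabeled "Buy Now", but its other
# recorded attributes still identify it.
redesigned_page = [
    {"label": "Search", "tag": "input", "section": "header"},
    {"label": "Buy Now", "tag": "button", "section": "product"},
]
print(find_element(redesigned_page, fingerprint)["label"])  # Buy Now
```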

This approach is especially cool: you can train a model even if you don’t have a large training set to begin with. This makes it ideal when you are trying to apply machine learning to very specific cases, such as testing a web application! Other techniques we use include Markov decision processes, custom NLP (trained using a combination of keywords and natural language) and even image recognition (to identify things like custom icons on a screen).

Conclusions

As we have seen, good quality training data is essential for many machine learning applications. You almost always need to use a human to classify images before they can be used to train a computer. However, if you only have a limited dataset, you will never be able to use a classic ML approach to train your AI. This is where other techniques like reinforcement learning become valuable. At Functionize we have learned that often the best results actually come from combining several of these techniques and some of our coolest work comes from these hybrid approaches.