Teaching students that collecting more training examples improves accuracy

This post was written for MachineLearningForKids.co.uk/stories: a series of stories I wrote to describe student experiences of artificial intelligence and machine learning that I’ve seen during the time I spend volunteering in schools and code clubs.

This video starts with one student’s training data from their Pac-Man project. They played a simplified version of Pac-Man in Scratch.

They set up the game in Scratch so that every time they pressed an arrow key (‘left’, ‘right’, ‘up’, or ‘down’), as well as moving their Pac-Man character, it put the x,y coordinates for Pac-Man and the Ghost into the training bucket for that direction.

For example, when Pac-Man was at x=3,y=4 and the Ghost was at x=5,y=5, they went right. That became a training example for when it’s good to go right, and so on for the other directions.
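The student's project was built in Scratch, but the same idea can be sketched in a few lines of Python. This is only an illustration: the names `training_buckets` and `record_move` are made up for this sketch and are not part of the Scratch project or the Machine Learning for Kids tool.

```python
from collections import defaultdict

# One bucket of examples per arrow key (hypothetical structure for illustration).
training_buckets = defaultdict(list)

DIRECTIONS = ("left", "right", "up", "down")


def record_move(direction, pacman_x, pacman_y, ghost_x, ghost_y):
    """Store both characters' positions at the moment an arrow key was pressed."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    training_buckets[direction].append((pacman_x, pacman_y, ghost_x, ghost_y))


# The example from the text: Pac-Man at (3, 4), Ghost at (5, 5), player went right.
record_move("right", 3, 4, 5, 5)
```

Every key press adds one more tuple to one of the four buckets, so simply playing the game is what builds up the training data.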

They used these examples to train their own machine learning model to be able to play the game: a machine learning model that can predict when Pac-Man should go left, right, up, or down (based on the current location of Pac-Man and the current location of the Ghost). Their goal is to train the game so that Pac-Man can play itself, without them having to touch any of the keys at all.
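To give a feel for what "predicting a direction from the two locations" means, here is a minimal sketch. It uses a 1-nearest-neighbour lookup, chosen only because it is short; I am not claiming it is the algorithm the students' tool uses. The function name `predict_direction` and the example data are invented for this illustration.

```python
def predict_direction(examples, pacman_x, pacman_y, ghost_x, ghost_y):
    """Return the direction recorded for the most similar stored example."""
    query = (pacman_x, pacman_y, ghost_x, ghost_y)

    def squared_distance(example):
        positions, _direction = example
        return sum((a - b) ** 2 for a, b in zip(positions, query))

    _positions, direction = min(examples, key=squared_distance)
    return direction


# Hypothetical examples: (pacman_x, pacman_y, ghost_x, ghost_y) -> key pressed
examples = [
    ((3, 4, 5, 5), "right"),
    ((6, 2, 1, 2), "up"),
    ((0, 0, 4, 4), "left"),
]

print(predict_direction(examples, 3, 5, 5, 5))  # prints "right"
```

The new position (3, 5, 5, 5) has never been seen, but it is closest to the first stored example, so the model copies that move. With only three examples, most positions in the maze will be far from anything the model has seen.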

Some students will be impatient – eager to try their model as soon as they can. Their machine-learning Pac-Man will play very badly at first.

With only a few training examples, their Pac-Man might not have even learned enough to navigate the maze and can get stuck in a corner or against a wall.

They go back and do a bit more training, and collect more training examples of them playing. With a few more examples, they see Pac-Man able to navigate the maze but it gets caught by the Ghost very quickly.

They do even more training – collecting more examples, and they see their Pac-Man surviving a little longer.

They play more and more, adding more training examples, and see their Pac-Man start to play well. By the end of the lesson, they’ve collected a large number of training examples, and some students will see their Pac-Man able to evade the Ghost forever.

What these students stumble onto is the correlation between the amount of training data and the accuracy of a machine learning model.

This is a crucial principle in applying machine learning technologies, but instead of being told that as a rule, students can discover this for themselves through observing the performance of their own machine learning model – noticing how it improves as they add more training examples.
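The same experiment can be simulated in Python. This is a rough sketch under several assumptions of my own: a tiny 4x4 grid with no walls, a made-up rule (`player_choice`) standing in for how a student plays, and a 1-nearest-neighbour model rather than whatever the students' tool trains. For simplicity, accuracy is measured over every possible position, including ones used for training.

```python
import random

GRID = 4  # hypothetical tiny maze: coordinates 0..3, no walls


def player_choice(px, py, gx, gy):
    """Stand-in for the student: move away from the Ghost along the wider gap."""
    if abs(px - gx) >= abs(py - gy):
        return "right" if px >= gx else "left"
    return "up" if py >= gy else "down"


def predict(examples, state):
    """1-nearest-neighbour: copy the move from the most similar stored example."""
    nearest = min(
        examples,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], state)),
    )
    return nearest[1]


def accuracy(examples, test_states):
    """Fraction of states where the model agrees with the stand-in player."""
    correct = sum(predict(examples, s) == player_choice(*s) for s in test_states)
    return correct / len(test_states)


all_states = [
    (px, py, gx, gy)
    for px in range(GRID)
    for py in range(GRID)
    for gx in range(GRID)
    for gy in range(GRID)
]

rng = random.Random(0)
shuffled = all_states[:]
rng.shuffle(shuffled)

# Train on more and more examples and watch the accuracy change.
for n in (5, 20, 80, len(shuffled)):
    examples = [(s, player_choice(*s)) for s in shuffled[:n]]
    print(n, round(accuracy(examples, all_states), 2))
```

The exact numbers depend on the shuffle, and a small increase in examples does not guarantee an improvement every time. The final line reaches 1.0 because by then the model has seen every possible position, which mirrors what students see when they keep playing and keep collecting examples.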

Giving students the freedom to experiment with their machine learning model lets them learn for themselves about the relationship between the quantity of training data and the accuracy of machine learning models.
