Teaching students that crowdsourcing and gamification help generate training data

This post was written for MachineLearningForKids.co.uk/stories: a series of stories I wrote to describe student experiences of artificial intelligence and machine learning that I’ve seen while volunteering in schools and code clubs.

I like running projects like Pac-Man (where students collect training examples by playing a game) with a class after they’ve done a project like chatbots (where students collect training examples by typing them in).

There will often be at least one (shamelessly honest!) student in the class who will tell me that typing in example questions to train a chatbot is ‘boring’, and that creating training examples for Pac-Man by playing a video game is ‘fun’, ‘easier’, and just ‘better’.

This is a useful learning experience for the students, because – they’re not wrong!

Collecting and labelling training examples in machine learning projects is a chore. It is often manual. It is repetitive. It is almost always time-consuming.

But students do understand that projects need a lot of training examples to be successful.

It’s helpful to allow students to notice all of this for themselves, and encourage them to discuss the challenges this brings to real-world AI projects. I’ve seen classes have fascinating and constructive discussions once they realise that making the creation of training data into a game is one way to help.

Students can see for themselves how gamification is a great way to motivate people to create a lot of training data.

Games are an effective tool for teaching students about machine learning, and several of the project worksheets available through Machine Learning for Kids are based on games.

For example, noughts and crosses (or tic tac toe) is a great basis for a project.

As with Pac-Man, students play the game in Scratch, coding the game so that every time they make a move, the state of the game board is added to one of their training buckets.

They can use these training examples to train a machine learning model to be able to play noughts and crosses.
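As a rough illustration of what the Scratch scripts are doing, here is a minimal Python sketch. The board encoding, the `record_move` and `play` helpers, and the choice to label each bucket with the square the player picked are all assumptions made for this example, not the actual worksheet code.

```python
from collections import defaultdict

# Assumed encoding: nine cells, each "X", "O", or "." for empty.
EMPTY = "."


def record_move(buckets, board, move):
    """Add the board, as it looked before the move, to the bucket
    labelled with the square (0-8) that the player chose."""
    if board[move] != EMPTY:
        raise ValueError(f"square {move} is already taken")
    buckets[move].append("".join(board))


def play(board, move, player):
    """Return a new board with the player's mark placed on it."""
    new_board = list(board)
    new_board[move] = player
    return new_board


# One bucket of training examples per possible move.
buckets = defaultdict(list)

board = [EMPTY] * 9
record_move(buckets, board, 4)   # the student takes the centre
board = play(board, 4, "X")
board = play(board, 0, "O")      # the opponent replies in a corner
record_move(buckets, board, 8)   # the student takes the opposite corner
board = play(board, 8, "X")

for move, examples in sorted(buckets.items()):
    print(move, examples)
```

Every move the student makes becomes one labelled example, so simply playing the game fills the buckets without any typing.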

Students can do this individually, with each of them training their own custom machine learning model, and seeing how it behaves in response to their own training.

I remember running one class where the students realised how much quicker and easier this would be if they worked together.

Instead of each student setting up their own training buckets, they asked if they could have one set of training buckets for the whole class that they could all add training examples to.

This was inspired. (At the time, I had to let them all share a username and password to do this, but it went so well I ended up modifying the site to support group/class projects.)

In the time available in one lesson, thirty students can create roughly 30 times as much training data as one student working alone.

And (as students normally realise) the more training examples they can create, the better their model will be at playing the game.

(In fact, this tends to improve the quality of the training data as well as quantity, because it tends to result in more varied examples. This is a nuance that students find easier to recognise in more visual projects.)
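To make the point about quantity and variety concrete, here is a small Python sketch of pooling a class's examples. The three students, their data, and the `pool` helper are invented for illustration. Each example is an assumed (board, move) pair, with "." marking an empty square.

```python
def pool(class_examples):
    """Merge every student's (board, move) pairs into one shared list."""
    return [example for student in class_examples for example in student]


# Made-up training examples from three students.
alice = [(".........", 4), ("O...X....", 8)]
bob = [(".........", 4), ("..O.X....", 6)]
chen = [(".........", 0), ("X...O....", 8)]

shared = pool([alice, bob, chen])

total = len(shared)          # how many examples the class collected
distinct = len(set(shared))  # how many of those are different from each other

print(f"one student: {len(alice)} examples")
print(f"whole class: {total} examples, {distinct} distinct")
```

In this toy data the class collects three times as many examples as Alice does alone, and most of them are situations she never encountered herself.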

Students can see for themselves how crowdsourcing the generation of training data makes it much easier to create a large amount of it.

Planning like this is a key part of many successful real-world machine learning projects, whether it’s finding ways to crowdsource the generation of training data, or creating training data as a by-product of something that people are already doing or would enjoy doing anyway.

Students understand the fundamental reasons why this is done, and can often think of approaches like this themselves, based on their first-hand experiences in the classroom.

