Collecting poorly handled inputs is a common practice when training machine learning models

This post was written for MachineLearningForKids.co.uk/stories: a series of stories I wrote describing student experiences of artificial intelligence and machine learning, drawn from the time I spend volunteering in schools and code clubs.

Digital assistants, such as Amazon’s Alexa or Google Home, are a great basis for student projects, because they are a use case that students are familiar with.

A project I’ve run many times is to help students create their own virtual assistant in Scratch, by training a machine learning model to recognise commands like “turn on a lamp”. They do this by collecting examples of how they would phrase those commands.
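The students build this in Scratch using the Machine Learning for Kids tool, so there is no Python involved. Purely to illustrate the idea, though, here is a minimal sketch of the same kind of text-command classifier written with scikit-learn. The example phrases and the label names (“lamp_on”, “fan_on”) are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Example phrasings the students collect for each command.
training_phrases = [
    ("turn on the lamp", "lamp_on"),
    ("switch the light on", "lamp_on"),
    ("it's too dark in here", "lamp_on"),
    ("turn on the fan", "fan_on"),
    ("it's really hot in here", "fan_on"),
    ("I could use a breeze", "fan_on"),
]
texts, labels = zip(*training_phrases)

# Bag-of-words features and a simple classifier stand in for the hosted
# model that the students' Scratch projects call.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["please switch on the light"]))  # expect: ['lamp_on']
```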


By the time I do this project, my classes will normally have learned that they need to test their machine learning model with examples they didn’t use for training.

Students like trying to break things: they enjoy looking for edge cases that will trip up the machine learning model. In this case, that means unusual ways of phrasing commands that their model won’t recognise.

I remember one student came up with ‘activate the spinny thing!’ as a way of asking to turn on a fan, which I thought was inspired.

But when the model gets something wrong, what should they do about that?

Students will normally suggest by themselves that a good thing to do is to collect examples of what their machine learning model gets wrong, and add those to one of their training buckets.

That means that every time it makes a mistake, they can add that example to their training data and train a new model, and their model will get better at recognising commands like that in future.
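Continuing the earlier scikit-learn sketch (again, just an illustration of the idea, not how the Scratch projects actually work), the whole loop fits in a few lines: test the model, notice a phrase it handles badly, add that phrase to the training examples with the correct label, and train a new model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train(examples):
    """Train a fresh command classifier from (phrase, label) pairs."""
    texts, labels = zip(*examples)
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model

# The same illustrative training examples as in the earlier sketch.
training_phrases = [
    ("turn on the lamp", "lamp_on"),
    ("switch the light on", "lamp_on"),
    ("turn on the fan", "fan_on"),
    ("it's really hot in here", "fan_on"),
]

model = train(training_phrases)
# A tester finds a phrasing the model has never seen, so its guess is unreliable.
print(model.predict(["activate the spinny thing!"]))

# Add the poorly handled input to the training examples with the correct
# label, and train a new model that handles it.
training_phrases.append(("activate the spinny thing!", "fan_on"))
model = train(training_phrases)
print(model.predict(["activate the spinny thing!"]))  # now: ['fan_on']
```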

They typically think of this for themselves, because with a little understanding about how machine learning technology behaves, this is a natural and obvious thing to do.

But when Amazon was found to be doing this, mainstream media was surprised.

Really surprised. Shocked.

Many articles included journalists saying that they had no idea this was something a machine learning project might want to do.

Students who’ve worked through a similar process for themselves can easily understand the motivation.

This is not to say that students automatically think it’s acceptable or appropriate.

The classes I’ve run are typically mixed on this issue: some students think it’s reasonable, while others think it’s inappropriate. And I’ve watched them have some fantastic debates about it.

The crucial thing is that they debate this in the context of an understanding of the technology, and an understanding of the motivation of the tech companies (an understanding that is often lacking in media reporting).

First-hand experience with machine learning technologies gives students a crucial insight into the behaviour and motivations of tech companies.

