This post was written for MachineLearningForKids.co.uk/stories: a series of stories I wrote to describe student experiences of artificial intelligence and machine learning that I’ve seen in the time I spend volunteering in schools and code clubs.
Machine learning models don’t just give an answer: they also typically return a score showing how confident the system is that it has correctly recognized the input.
Knowing how to use this confidence score is an important part of using machine learning.
An example of how a student used this in their project is shown in this video. Their Scratch script says that if the machine learning model has less than 50% confidence that it has correctly recognized a command, it replies “I’m sorry I don’t understand” (instead of taking an action).
The project was trained to understand commands to turn on a lamp or a fan. When they asked it to “Make me a cheese sandwich”, their assistant didn’t try to turn on the lamp or the fan; it said “I don’t understand”.
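Here is a minimal Python sketch of the logic the student built with Scratch blocks. The classify function is a made-up stand-in for their trained model (the real project used the Machine Learning for Kids blocks): it just returns a label and an invented confidence score so the threshold check can be run end to end.

```python
def classify(command: str) -> tuple[str, float]:
    """Stand-in for the trained model: returns (label, confidence between 0 and 1)."""
    command = command.lower()
    if "lamp" in command or "light" in command:
        return "lamp", 0.92
    if "fan" in command:
        return "fan", 0.88
    # Commands unlike anything in the training examples get a low-confidence guess
    return "lamp", 0.12

def handle_command(command: str, threshold: float = 0.5) -> str:
    label, confidence = classify(command)
    if confidence < threshold:
        # Below the 50% threshold: don't take an action, admit we don't know
        return "I'm sorry I don't understand"
    return f"Turning on the {label}"

print(handle_command("Turn on the lamp please"))    # confident, so it acts
print(handle_command("Make me a cheese sandwich"))  # low confidence, so it refuses
```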
This command was unlike any of the example commands that had been used to train the model, so the machine learning model had very little confidence that it had recognised the command, which showed up as a very low confidence score.
The challenge for the students making this project was knowing what confidence score threshold to use. Instead of telling them a good value to use, I let them try out different values and decide for themselves. By playing and experimenting with it, they got a feel for the impact that this threshold has on their project.
What they learned was that there isn’t one “correct” confidence threshold that makes sense for all projects or for all machine learning models.
What I mean is, they noticed that if they set the confidence threshold very, very low, their assistant would almost always take an action. But it would do this even when it hadn’t understood the command correctly, and so it would often take the wrong action.
Conversely, if they set the confidence threshold very very high, their assistant would only take an action when it was very very confident that it had correctly recognized the command. The good thing was that, when it took an action, it rarely took the wrong action.
But the bad thing was that it said “I don’t understand” very often, even when it had actually understood correctly what they’d asked.
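This trade-off can be shown with a toy sketch like the one below. The commands, guesses, and confidence scores are all made up, not taken from the students’ project, but sweeping a few threshold values over them shows the same pattern the students found: a low threshold acts on nearly everything (including wrong guesses), while a high threshold rarely acts wrongly but refuses commands it had actually understood.

```python
# (what was asked, correct intent, model's guess, model's confidence) -- invented data
test_commands = [
    ("turn on the lamp",          "lamp", "lamp", 0.95),
    ("switch the fan on",         "fan",  "fan",  0.80),
    ("it's a bit dark in here",   "lamp", "lamp", 0.55),
    ("I'm feeling warm",          "fan",  "lamp", 0.45),  # misrecognised, low confidence
    ("make me a cheese sandwich", None,   "fan",  0.20),  # should be refused
]

for threshold in (0.1, 0.5, 0.9):
    wrong_actions = 0
    missed_commands = 0
    for _, correct, guess, confidence in test_commands:
        if confidence >= threshold:
            if guess != correct:
                wrong_actions += 1    # acted, but did the wrong thing
        elif correct is not None:
            missed_commands += 1      # said "I don't understand" to a real command
    print(f"threshold {threshold}: {wrong_actions} wrong actions, "
          f"{missed_commands} unnecessary 'I don't understand' replies")
```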
It was fascinating hearing students describe their assistant configured like this as being too “shy” or “timid”, and that it “didn’t have enough self-confidence”. My favourite description came from a student who compared their machine learning-powered assistant to not putting their hand up in class to answer a question, even when they actually know the answer, because they aren’t confident enough that it’s right.
They were able to recognise and describe these two behaviours for themselves – purely through their own playing and experimenting.
This also sparked a great conversation about when those different types of behaviour are appropriate. For some applications, such as a machine learning system used by doctors, they said they would want a cautious system that doesn’t risk getting things wrong. For other applications, such as machine learning systems that recommend what song to listen to next, they felt it was better for the system to try when it has an answer, even if it sometimes makes mistakes.
Students learn how to effectively use confidence scores returned by machine learning models by playing, experimenting, and seeing the difference that different confidence thresholds make.
Tags: machine learning, scratch
I wonder if you can dovetail this with the important concepts around aleatoric uncertainty in a machine learning model in addition to the epistemic uncertainty you describe above? I think it’s important to understand both and teaching them and their differences to kids would be great fun.