Explaining crowd-sourcing in machine learning

I pushed a new feature to Machine Learning for Kids last night: “class projects”. Now a whole class of students can work on a project together – all helping to train a shared, group machine learning model.

I’ll write some proper documentation for it, but in the meantime I thought I’d share a few quick thoughts on how this works and what it’s for.


Until now, students created private projects, independently collecting their own training data and training their own machine learning models. That’s still the default.

But there are good reasons to sometimes have students work together on shared projects.

More ambitious projects

It opens up more ambitious projects.

Say you want to do a project where kids train a text classifier to recognise passages as being one of four different types. If you start with a goal of ten training examples of each type, then you’re asking each student to type in forty passages.

For many kids, that’s not a realistic goal for a lesson. That’s too much typing to do in too little time. So that project would’ve been too ambitious to consider in a code club session or school lesson.

Now think of doing that with a class of 30 kids. If each student can manage typing in just two examples, then they’ve already collected 60 passages – more than enough to try the project.
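The arithmetic behind that is simple enough to sketch in a few lines of Python. The function name here is made up for illustration, and the numbers are just the ones from the example above, not anything taken from the site itself.

```python
import math


def examples_per_student(num_types, examples_per_type, num_students):
    """How many examples each student must contribute, rounded up."""
    total_needed = num_types * examples_per_type
    return math.ceil(total_needed / num_students)


# One student working alone: four types, ten examples of each
print(examples_per_student(4, 10, 1))   # 40

# The same project shared across a class of 30
print(examples_per_student(4, 10, 30))  # 2
```

Two examples each from 30 students gives 60 passages, comfortably over the target of 40.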

Suddenly a variety of more complex and ambitious projects become realistic.

A lesson in crowd-sourcing

This gives us a way to explain how crowd-sourcing is used in machine learning. The small-scale classroom experience I described above works as both a demonstration of, and a metaphor for, big real-world projects.

When kids see how much easier it is to do a machine-learning project with 30 people than with one, it’s not a big leap to realise how much easier it would be with 300 or 3,000 people, and how the complexity and ambition of the projects could continue to grow.

How it works

Teachers can now choose to create “class projects”.

A class project is one that the teacher creates and sets up, but which appears in the projects list of every student in their class when they log in.

Students can’t do anything too destructive to a class project (they can’t delete it or remove training buckets, for example), but they can add training examples to the buckets.

When it’s time to use the examples to train a machine learning model, only the teacher gets the “Train ML model” button. The model that is trained is made available to all the students in the class for them to use in their own Scratch projects.
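As a rough way to picture those rules, here is a minimal Python sketch of the permissions. It is not the site’s actual code: the class, the role names and the action names are all hypothetical, and it assumes a teacher can do everything a student can.

```python
from dataclasses import dataclass, field

# Hypothetical action names, chosen only for this illustration
STUDENT_ACTIONS = {"add_example", "use_model"}
# Assumption: teachers can do everything students can, plus the rest
TEACHER_ACTIONS = STUDENT_ACTIONS | {"delete_project", "remove_bucket", "train_model"}


@dataclass
class ClassProject:
    name: str
    buckets: dict = field(default_factory=dict)  # bucket label -> list of examples

    def allowed(self, role, action):
        """Return True if someone with this role may perform the action."""
        permitted = TEACHER_ACTIONS if role == "teacher" else STUDENT_ACTIONS
        return action in permitted

    def add_example(self, role, bucket, text):
        """Add one training example to an existing bucket."""
        if not self.allowed(role, "add_example"):
            raise PermissionError(f"{role} cannot add examples")
        self.buckets[bucket].append(text)


project = ClassProject("shared project", {"type_a": [], "type_b": []})
project.add_example("student", "type_a", "a passage typed in by a student")

print(project.allowed("student", "train_model"))  # False
print(project.allowed("teacher", "train_model"))  # True
```

The point of the sketch is just that students get a small set of safe actions, while anything destructive, and the training itself, stays with the teacher.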

More detail?

I’ll write new “crowd-sourced” versions of some of the existing worksheets that I think would most benefit from this.

For now, I’ve quickly started with “Sorting Hat”, so you can look at the project worksheet for “Sorting Hat” to get a rough idea of what this might look like in practice.

