I added support for “local projects” (storing projects on your own computer) to Machine Learning for Kids this week. In this post, I want to give a little background.
In the beginning…
Since I started Machine Learning for Kids in 2017, I had the site storing student projects in the cloud. (Originally it was a MySQL database, which I moved to PostgreSQL and Cloud Object Storage a few years later.)
When I say I’m storing “projects”, I’m mostly talking about the training examples that students create and collect to train their machine learning model, plus a little bit of metadata about each project itself.
There were many reasons for storing everything in the cloud:
- I thought students would often not complete a machine learning project in a single sitting. I wanted them to be able to start a project then log on again a week later and pick up where they left off – so I wanted persistence.
- I was training all machine learning models in the cloud at the time – so having the training data there too made things easier
- It kept hardware requirements for school computers to the absolute minimum – so projects could still be created on the low-spec computers I was often seeing in schools
Setting limits
Usage of the site exploded far beyond what I’d originally expected, and a key impact was the amount of storage I needed for the site. I’m storing gigabyte after gigabyte of images that students have created and collected to train image classifiers. Gigabytes and gigabytes of spectrograms from sound recordings that students have taken to train sound classifiers. Even text and numbers projects, which you’d think would compress so well that they’d take up next to nothing… nope, I’m storing huge amounts of that, too.
To try and keep running costs under some sort of control, I introduced some pretty stingy limits, for example:
- Number of projects
I wanted to encourage students to delete a project once they finished it, so I limited this to two. This would allow students a little overlap as they started a new project, whilst still encouraging them to clean up projects they’re not working on any more.
- Amount of training data in a project
For example, for text projects you can store 500 text examples. For image projects, you can store 100 images. And so on. I set limits that I felt were high enough to enable simple student classroom projects that would help school students understand what machine learning was.
(Plus I thought asking a seven year old to write or collect 500 sentences, create 100 images, or take 100 sound recordings would be such a mammoth undertaking that this wouldn’t get in the way.)
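To make those numbers concrete, here is a minimal sketch of how limits like these could be checked. It is not the site's actual code: the names (`MAX_PROJECTS`, `MAX_EXAMPLES`, `canCreateProject`, `canAddExample`) are hypothetical, and the sound limit of 100 is an assumption taken from the aside above.

```typescript
// Illustrative only: these names are hypothetical, not the site's real code.
type ProjectType = "text" | "images" | "sounds";

// A student can keep two projects at a time.
const MAX_PROJECTS = 2;

// Training examples allowed per project, by project type.
// The "sounds" value is an assumption based on the aside above.
const MAX_EXAMPLES: Record<ProjectType, number> = {
  text: 500,
  images: 100,
  sounds: 100,
};

// True if the student is still under the project limit.
function canCreateProject(existingProjects: number): boolean {
  return existingProjects < MAX_PROJECTS;
}

// True if the project has room for one more training example.
function canAddExample(type: ProjectType, currentExamples: number): boolean {
  return currentExamples < MAX_EXAMPLES[type];
}
```

With checks like these, a student with one project can start a second, but has to delete one before starting a third.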
The complaints. So. Many. Complaints.
These limits have long been the source of a lot of emails I get about the site.
Teachers explain their students want to keep all the projects they’ve ever worked on. They don’t want to just be able to keep two projects, they want to store a dozen. Or more.
Teachers are increasingly using the site with older students that aren’t daunted by the prospect of creating hundreds of training examples.
Teachers complain their students have amazing ideas for projects that need more training examples to be successful, and that limits are preventing them from properly exploring their ideas.
Does everything need to be in the cloud?
Over the last few years, I’ve been moving more of the machine learning processing out of the cloud and onto the browser using TensorFlow.js.
Web browsers are powerful today – the capability of a web browser is light years ahead of what it was when I started working on Machine Learning for Kids back in 2016-ish. What I think was the right answer back then isn’t necessarily the right answer today.
Do I even need to store the training data in the cloud?
Enter “local projects”
My answer to “Let me keep a dozen projects with thousands of training examples each!” has been “Sorry, no, I don’t want to have to store all of that”.
Now I’ve got what I hope is a better answer:
“Okay, if you can store all of that yourself”.
With “local projects”, the training examples and all* the project metadata are stored in the student’s web browser using IndexedDB.
Limits begone! 🙂
If I don’t have to pay for the storage, there is no need for me to make students delete their old projects, or be stingy on how big their projects can get.
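For a flavour of what storing training examples in IndexedDB can look like, here is a minimal sketch using the standard browser API. It is not the site's real schema: the database name, store name, index name, and record fields are all hypothetical, and it only covers a text project.

```typescript
// Hypothetical names throughout: the database, store, index, and field
// names are illustrative assumptions, not the site's real schema.
interface TrainingExample {
  projectId: string;
  label: string;
  text: string;
}

const DB_NAME = "local-projects-sketch";
const STORE_NAME = "training";

// Pure helper: build the record that will be written to the object store.
function makeExample(projectId: string, label: string, text: string): TrainingExample {
  return { projectId, label, text: text.trim() };
}

// Open (or create) the database, adding the object store on first use.
function openDatabase(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open(DB_NAME, 1);
    request.onupgradeneeded = () => {
      const store = request.result.createObjectStore(STORE_NAME, {
        keyPath: "id",
        autoIncrement: true,
      });
      // Index by project so one project's examples can be fetched together.
      store.createIndex("byProject", "projectId");
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Store one example; resolves once the transaction has completed.
function addExample(db: IDBDatabase, example: TrainingExample): Promise<void> {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(STORE_NAME, "readwrite");
    tx.objectStore(STORE_NAME).add(example);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

// Fetch every example belonging to one project.
function getExamples(db: IDBDatabase, projectId: string): Promise<TrainingExample[]> {
  return new Promise((resolve, reject) => {
    const request = db
      .transaction(STORE_NAME, "readonly")
      .objectStore(STORE_NAME)
      .index("byProject")
      .getAll(projectId);
    request.onsuccess = () => resolve(request.result as TrainingExample[]);
    request.onerror = () => reject(request.error);
  });
}
```

In a browser, this would be used along the lines of `const db = await openDatabase(); await addExample(db, makeExample("project-1", "happy", "I love this"));`. Everything lives in the browser profile on that one computer, which is why clearing the browser storage removes the project.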
The cloud still has its uses
The previous approach to storing projects (what I’m now describing as “cloud projects”) isn’t going anywhere. None of that has changed or been removed.
Storing projects in my PostgreSQL database and Cloud Object Storage buckets still has value.
IndexedDB support is increasingly widespread, but even so – not all students have computers or devices that can store all of their data in the browser.
In a school or code club setting, not all students can depend on being able to use the same shared computer from one lesson to the next.
In a classroom where a computing lesson means getting one of the tired and clunky old laptops out of the cupboard, it’s not safe to rely on each student being able to get the same computer they had last week (and that no-one has cleared the browser storage in the meantime!).
Plus, for non-Scratch projects (e.g. students using Machine Learning for Kids to make a Python project), storing everything in the web browser makes things harder.
The idea of being able to store your project in the cloud and access it from any computer is still a useful thing, so I’m not going to take that away.
It’s live!
It took a lot longer to implement this than I had hoped… with a variety of painful issues such as how to let the different subdomains in the site all share the same training databases. But I’m finally done – “local projects” went live this week.
I would’ve packed this blog post with screenshots, but not much looks different with this new feature. I put a lot of work into making “local projects” look and feel and behave in the same way as “cloud projects”. I hope most students will hardly notice any difference!
* – Okay, almost all. Very nearly all. There are a few tiny exceptions for text projects to enable coordinating with the Watson Assistant API.