Using pitch estimation to play with music in Scratch

I’ve added a pitch extraction machine learning model to Machine Learning for Kids today. In this post, I want to describe the model a little, and suggest a few ways that students could use it.


I started adding pretrained machine learning models to Machine Learning for Kids last year. Although my main focus is still allowing students to create their own machine learning models and make things with them, there are some fun projects that can be made using models that are too complex for students to train by themselves.

ImageNet (which I added last Christmas) and the question-answering model (which I added in April) are both good examples of that!

I hope this one will be similarly welcomed!


The new model is a pitch estimation model. Given some audio as input, you can use it to recognize the dominant pitch in sung audio (even if there is background music and noise).

The model is called SPICE (“Self-supervised PItch Estimation”). The name reflects one of the particular novelties in how the model was created: the approach to collecting and using training data, which avoids the challenges of trying to label an audio ground truth. This is described more in the paper that introduces the model – a lot of the maths in that paper goes over my head, but the description of the training approach is quite readable and very interesting!

The training set used for the model was MIR-1k, which is a set of 1000 sound recording clips:

The duration of each clip ranges from 4 to 13 seconds, and the total length of the dataset is 133 minutes. These clips are extracted from 110 karaoke songs which contain a mixture track and a music accompaniment track. These songs are freely selected from 5000 Chinese pop songs and sung by our labmates of 8 females and 11 males. Most of the singers are amateur and do not have professional music training.

Amazing. 🙂

Using SPICE in Scratch

I’ve created a new Scratch extension to let students play with this model in their Scratch projects – adding blocks to start/stop listening to the computer microphone and a block that returns the frequency of a note the model has recognized.

I also added a few helper blocks that convert between frequency and note names and MIDI numbers to make it easier to use.
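To give a feel for the kind of conversion those helper blocks do, here is a rough Python sketch. It assumes the usual equal-temperament convention (A4 = 440 Hz = MIDI note 69, and MIDI note 60 named C4). The function names are made up for illustration and are not the extension's own code.

```python
import math

# Note names within one octave, starting from C (sharps only, for simplicity).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_midi(frequency_hz: float) -> int:
    """Nearest MIDI note number, assuming A4 = 440 Hz = MIDI note 69."""
    return round(69 + 12 * math.log2(frequency_hz / 440.0))


def midi_to_frequency(midi_note: int) -> float:
    """Frequency in Hz of a MIDI note number (12 equal semitones per octave)."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def midi_to_note_name(midi_note: int) -> str:
    """Note name with octave, using the convention that MIDI 60 is C4."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"
```

For example, `midi_to_note_name(frequency_to_midi(440.0))` gives `"A4"`.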

I haven’t written any project worksheets using these blocks yet, but I have a few early rough ideas for what could be done.

To give any of these a try, or to try making your own idea, go to

Idea : visual tuner in Scratch

You could make a tuner. Tell it the note you want to sing, and see how close you can get to it.

This actually works quite well – the arrow sprite is a nice responsive visualisation of how flat or sharp you are. (I’ve removed the audio from my testing video, because no-one needs to hear my singing!)

In practice, a class full of children all trying to hold a sung note would probably get chaotic pretty quickly, but it’s a fun and simple make.
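One common way to measure how flat or sharp a sung note is uses cents, where 100 cents is one semitone. The Python sketch below shows that calculation and one possible way to turn it into a sprite position. The function names, the 50-cent limit and the 200-step range are all assumptions for illustration. The Scratch project may well work differently.

```python
import math


def cents_off(sung_hz: float, target_hz: float) -> float:
    """Distance from the target in cents: negative is flat, positive is sharp."""
    return 1200 * math.log2(sung_hz / target_hz)


def arrow_x(sung_hz: float, target_hz: float,
            max_cents: float = 50.0, half_width: float = 200.0) -> float:
    """Map the error onto an x position, clamped to +/- max_cents (hypothetical layout)."""
    cents = cents_off(sung_hz, target_hz)
    clamped = max(-max_cents, min(max_cents, cents))
    return clamped / max_cents * half_width
```

With these defaults, a perfectly sung note puts the arrow at x = 0, and anything half a semitone or more out pins it to one edge.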

Download: voice-tuner.sb3

Idea : record and play back

You could sing a simple melody to Scratch, and let it play back the tune it thought it heard.

Essentially, you make a list and add to it the MIDI note number for every note the project hears. Then when you’re ready, use the Music extension to play the list of notes.

This is a little fiddly, as the play back gets complicated when you want to take different note lengths into account, or sing the same note more than once. But the basic idea sort of works, and it’s kind of interesting.

Just don’t have the recording and playing back running at the same time, because the project hears its own playback and quickly gets into a loop!
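One possible way to handle note lengths is to collapse runs of identical readings into a single note with a count. The Python sketch below does that. It assumes the project takes pitch readings at a regular interval and records `None` when nothing is heard, which is what lets a repeated note show up as two separate notes. Both assumptions, and the function name, are mine rather than anything from the Scratch project.

```python
from itertools import groupby


def collapse_notes(midi_readings):
    """Turn a stream of readings into (midi_note, number_of_readings) pairs.

    Consecutive identical readings are merged into one note. Readings of None
    (assumed to mean "no pitch detected") separate notes and are then dropped.
    """
    notes = []
    for note, run in groupby(midi_readings):
        if note is not None:
            notes.append((note, len(list(run))))
    return notes
```

The count for each note could then be multiplied by the time between readings to get a rough duration for playback.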

Download: play-it-back.sb3

Idea : music graph

Play a song and use Scratch to draw a graph of the vocals.

The idea is to use the Pen tool to draw a graph, with the frequency returned by the machine learning model determining the “y” coordinate.

This is a little scrappy, and needs a little more work. It’s hard to recognize many aspects of a song from the graph it generates.

(I’ve removed the audio from my testing video to avoid copyright claim headaches… imagine you are listening to Michael Bublé’s finest work. If you can recognize the song from the graph, I’d be amazed.)

The range is perhaps more useful, and I think you could probably spot extremes.

For example, if you generated a graph from a song with mostly high notes, and then another from a song with mostly low notes, the difference in frequency ranges should be quite obvious: you could tell which graph is for the low vocals and which is for the high vocals.

Anything more detailed than that would need a bit more thought.
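One thing that might make the graph easier to read is mapping frequency to the “y” coordinate on a logarithmic scale, so that each octave takes up the same height. The Python sketch below shows the idea. The 80 Hz to 1000 Hz range is an assumed vocal range, the function name is made up, and the output range matches the Scratch stage (y from -180 to 180). The Scratch project itself may simply use the frequency more directly.

```python
import math


def frequency_to_y(frequency_hz: float,
                   low_hz: float = 80.0, high_hz: float = 1000.0,
                   y_min: float = -180.0, y_max: float = 180.0) -> float:
    """Map a frequency to a stage y coordinate on a log scale (assumed range)."""
    # Clamp so that out-of-range readings stay on the stage.
    clamped = max(low_hz, min(high_hz, frequency_hz))
    # 0.0 at low_hz, 1.0 at high_hz, with equal spacing per octave.
    position = math.log2(clamped / low_hz) / math.log2(high_hz / low_hz)
    return y_min + position * (y_max - y_min)
```

With this mapping, a note one octave higher always moves the pen up by the same number of steps, wherever it sits in the range.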

Download: voice-graph.sb3

What else?

These are just quick tyre-kicking ideas I did to get the Scratch extension working. I’m sure there’s something more creative that could be done. I’ve got a vague idea at the back of my head about trying to do something auto-tuney but nothing coherent enough to share yet.

For inspiration about other possibilities, check out this Raspberry Pi blog post about a guitar-tuner hack – that is an amazing project.

I think it’d be difficult to get something so detailed working in Scratch, but I love the idea of combining pitch detection with something physical.

Huge thanks to the engineers behind SPICE for sharing their work – and publishing it all to TensorFlow Hub which made it so easy to put the Scratch extension together.

If you can think of other models I should be adding to the collection, please let me know!

