What was IBM Shoebox?
IBM Shoebox was one of the world’s earliest speech-recognition systems, created in 1961. It was a voice-controlled calculator: you input a sum by speaking the digits zero through nine and six command words, including “plus”, “minus”, and “total”.
To calculate 12 + 34 you could say “one two plus three four total” and it would respond with the answer.
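As a rough sketch of that input format (and not a description of how Shoebox itself worked internally), here is how the same spoken sequence could be evaluated in Python. The function name `evaluate` is my own invention, and I have assumed that consecutive digits are joined into one number, as in the “one two” example. Only the three command words named above are handled, since those are the only ones listed here.

```python
# Hypothetical sketch: evaluate a Shoebox-style utterance such as
# "one two plus three four total". Not based on the original hardware.

DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def evaluate(spoken: str) -> int:
    """Return the result of a spoken sum that ends with the word 'total'."""
    total = 0
    sign = 1        # +1 after "plus" (or at the start), -1 after "minus"
    current = None  # the number being spoken, or None if no digit heard yet

    for word in spoken.lower().split():
        if word in DIGITS:
            # Assumption: consecutive digits build up one multi-digit number
            current = (current or 0) * 10 + DIGITS[word]
        elif word in ("plus", "minus", "total"):
            if current is None:
                raise ValueError(f"expected a number before {word!r}")
            total += sign * current
            current = None
            if word == "total":
                return total
            sign = 1 if word == "plus" else -1
        else:
            raise ValueError(f"unrecognised word: {word!r}")

    raise ValueError("utterance did not end with 'total'")


print(evaluate("one two plus three four total"))  # 46
```

The same running-total idea carries over to a block-based language: keep one variable for the number being spoken, one for the sign, and one for the total so far.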
You can see it being used by inventor William Dersch in this two-minute demo video.
There is a load of information about Shoebox on ibm.com/history and it is worth a read.
Some of it is just photos and fun trivia. For example, it was named after its size – it was the size of a shoebox. And it was a successor to an earlier (larger) prototype from the 1950s called “Suitcase”.
But there is also technical information, including hardware specs (Shoebox contained 31 transistors) and, perhaps more interestingly, an explanation of the approach that the system took to recognize words.
This is also covered in a ten-minute film available on the site. It was also recorded in the 1960s, but this one is in colour!
I find it fascinating to compare the coverage that Shoebox received at the time with the way that artificial intelligence is reported today. TIME Magazine’s article from November 1961 is a good example:
Shoebox is not distracted by ordinary room noises—even loud ones—but Dersch talks into its microphone gently and takes pains to pronounce his words completely. Shoebox listens and dutifully prints numbers and symbols on a roll of paper. … It is not disturbed if “six” is pronounced “seex,” but it insists on being obtuse if “five” is pronounced “fi’,” as is common in rapid speech.
[Shoebox’s engineers will] … try to make Shoebox recognize mumbled, slurred, and female voices; at present it can handle only the words of clear-spoken males. Most foreign languages are no problem for Shoebox, but it is baffled by Chinese, Bantu and other tongues that depend on tone for their meaning.
When Shoebox grows up, IBM may set it to work taking down spoken words and numbers for such harried people as airplane pilots or supermarket checkers. Later, it may graduate to recording customers’ orders, controlling machine tools, or solving mathematical problems. Eventually, the day may come when a troubled scientist or businessman can tell his problem by voice to the listening ear of an electronic computer—and get a spoken oracle answer soon after he stops talking.
We’ve been anthropomorphizing AI for over sixty years.
Using Shoebox as an educational project
I think recreating Shoebox today is a great project for children.
It has a simple goal that they can easily understand – adding and subtracting numbers is something they have done since an early age.
Implementing a calculator is simple to code in low-code environments such as Scratch.
A machine learning model able to recognize sixteen words (ten digits and six commands) is small enough to run on (almost?) any computer so it’s accessible even to students with low-powered devices.
A model can be trained to recognize numbers and a few commands with a small number of examples, so students can create this machine learning model for themselves rather than just use an existing model.
It was such a well-documented project that there is a wealth of pictures, videos, documents and news articles about it. Including the history in the project is a great reminder that artificial intelligence is a field with a long history, and that we’re all building on the decades of achievements of engineers that came before us.
Try it for yourself!
One example of how this could be done is using Machine Learning for Kids. You can record examples of yourself saying numbers and words like “plus” and “minus” using your computer’s microphone.
With younger students, it’d be best not to ask them to record all numbers as this is likely a little time-consuming. Just record a few instead. A calculator that can do sums with, for example, the digits 1, 2, and 3 still gets the point across just as well.
The tool shows a spectrogram representation of their recordings.
I always enjoy trying to recognize what I say in these visualisations – such as the two hard “T” sounds in “total” in this screenshot.
These recordings can be used to train a custom machine learning model that can then be used in a Scratch project.
This is what it looks like in action!
If you’d like to give it a go, I’ve written step-by-step instructions that you can download as a free PDF.
It’s Creative Commons-licensed and you can download my original Microsoft Word doc used to make the PDF. I hope that someone will improve on it to make a more compelling activity from this. I am convinced that there are ways to bring some of this history to life that make this more than just another AI project.
What do you think?
I had a lot of fun learning about Shoebox and putting this together. I’d love to hear what you think of it, especially if you give it a try for yourself.