Archive for the ‘code’ Category

Using TensorFlow to make predictions from Kafka events

Sunday, September 6th, 2020

This post is a simple example of how to use a machine learning model to make predictions on a stream of events on a Kafka topic.

It’s more a quick hack than a polished project, with most of this code hacked together from samples and starter code in a single evening. But it’s a fun demo, and could be a jumping-off point for starting a more serious project.

For the purposes of a demo, I wanted to make a simple example of how to implement this pattern, using:

  • sensors that are easily and readily available, and
  • predictions that are easy to understand (and easy to generate labelled training data for)

With that goal in mind, I went with:

  • for the sensors providing the source of events, I used the accelerometer and gyroscope on my iPhone
  • to set up the Kafka broker, I used the Strimzi Kafka Operator
  • for the machine learning model, I used TensorFlow to make a simple bidirectional LSTM
  • the predictions I’m making are a description of what I’m doing with the phone (e.g. is it in my hand, is it in my pocket, etc.)

I’ve got my phone publishing a live stream of raw sensor readings, and passing that stream through an ML model to give me a live stream of events like “phone has been put on a table”, “phone has been picked up and is in my hand”, or “phone has been put in a pocket while I’m sat down”, etc.

Here is it in action. It’s a bit fiddly to demo, and a little awkward to film putting something in your pocket without filming your lap, so bear with me!

The source code is all at
github.com/dalelane/machine-learning-kafka-events.

(more…)

Supporting CI/CD with Kubernetes Operators

Thursday, August 20th, 2020

Operators bring a lot of benefits as a way of managing complex software systems in a Kubernetes cluster. In this post, I want to illustrate one in particular: the way that custom resources (and declarative approaches to managing systems in general) enable easy integration with source control and a CI/CD pipeline.

I’ll be using IBM Event Streams as my example here, but the same principles will be true for many Kubernetes Operators, in particular, the open-source Strimzi Kafka Operator that Event Streams is based on.

(more…)

Pretrained models in Machine Learning for Kids

Monday, May 25th, 2020

I’ve started adding pretrained machine learning models to Machine Learning for Kids. In this post, I wanted to describe what I’m doing.


(more…)

Geo-spatial data in Scratch

Wednesday, November 13th, 2019

In this post, I want to share a random thing I made in Scratch this week, and ask for suggestions of what I could do with it.


Click for larger version

I get a lot of emails from teachers and coding groups asking for help with Scratch projects. They’re normally small or specific questions – asking for help figuring out a bug in a Scratch project or how to get something working.

But this week I got a more challenging email. It asked for a way to show a map in Scratch, and use a Scratch script to plot points on the map, given coordinates in latitude and longitude.

I agreed to give it a try. (Details for how to access it below.)

(more…)

Using TensorFlow with IBM Event Streams
(Kafka + Machine Learning = Awesome)

Thursday, October 31st, 2019

In this post, I want to explain how to get started creating machine learning applications using the data you have on Kafka topics.

I’ve written a sample app, with examples of how you can use Kafka topics as:

  • a source of training data for creating machine learning models
  • a source of test data for evaluating machine learning models
  • an ongoing stream of events to make predictions about using machine learning models

I’ll use this post to explain how it works, and how you can use it as the basis of writing your first ML pipeline using the data on your own Kafka topics.

(more…)

Using Avro schemas from Python apps with IBM Event Streams

Thursday, October 17th, 2019

I’ve written before about how to write a schema for your developers using Kafka. The examples I used before were all in Java, but someone asked me yesterday if I could share some Python equivalents.

The principles are described in the Event Streams documentation, but in short, your Kafka producers use Apache Avro to serialize the message data that you send, and identify the schema that you’ve used in the Kafka message header. In your Kafka consumers, you look at the headers of the messages that you receive to know which schema to retrieve, and use that to deserialize message data.

(more…)

SQL queries on Kafka topics using Apache Hive

Tuesday, August 6th, 2019

Apache Hive is open source data warehouse software built on top of Hadoop. It gives you an SQL-like interface to a wide variety of databases, filesystems, and other systems.

One of the Hive storage handlers is a Kafka storage handler, which lets you create a Hive “external table” based on a Kafka topic.

And once you’ve created a Hive table based on a Kafka topic, you can run SQL queries based on attributes of the messages on that topic.

I was having a play with Hive this evening, as a way of running SQL queries against messages on my Kafka topics. In this post, I’ll share a few queries that I tried.


(more…)

Using OpenWhisk in Machine Learning for Kids

Sunday, July 28th, 2019

I’ve moved a couple of bits of Machine Learning for Kids into OpenWhisk functions. In this post, I’ll describe what I’m trying to solve by doing this, and what I’ve done.

Background

I’ve talked before how I implemented Machine Learning for Kids, but the short version is that most of it is a Node.js app, hosted in Cloud Foundry so I can easily run multiple instances of it.

The most computationally expensive thing the site has to do is for projects that train a machine learning model to recognize images.

In particular, the expensive bit is when a student clicks on the Train new machine learning model button for a project to train the computer to recognize images.

(more…)