Using MirrorMaker 2

July 15th, 2020

I’ve been talking about MirrorMaker 2 this week – the Apache Kafka tool for replicating data between two Kafka clusters. You can use it to copy messages from your Kafka cluster to a remote Kafka cluster running in a different data centre, and keep that copy up to date in the background.
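That background replication is driven by a properties file. Here’s a minimal sketch of one – the cluster aliases (“source”, “target”) and the bootstrap addresses are hypothetical placeholders, not anything from a real setup:

```properties
# mm2.properties - a minimal sketch; aliases and addresses are made up
clusters = source, target
source.bootstrap.servers = source-kafka.example.com:9092
target.bootstrap.servers = target-kafka.example.com:9092

# enable replication from source to target, copying all topics
source->target.enabled = true
source->target.topics = .*
```

You’d run this with the connect-mirror-maker.sh script that ships with Apache Kafka. By default, replicated topics show up on the target cluster prefixed with the source cluster alias (e.g. source.my-topic).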

For the discussion we had, I needed to give examples of how you might use MirrorMaker 2, which essentially meant I spent an afternoon drawing pictures. As some of them came out quite pretty, I thought I’d tidy them up and share them here.

We went through several different use cases, but I’ll just describe two examples here.

IBM Event Streams v10

June 30th, 2020

On Friday, we released the latest version of IBM Event Streams. This means I’ve been doing a variety of demo sessions to show people what we’ve made and how it works.

Here’s a recording of one of them:

In this session, I did a run-through of the new Event Streams Operator on Red Hat OpenShift, with a very quick intro to some of the features:

00m30s – installing the Operator
02m10s – creating custom Kafka clusters in the OpenShift console
05m10s – creating custom Kafka clusters in IBM Cloud Pak for Integration
08m00s – running the sample Kafka application
08m50s – creating topics
10m20s – creating credentials for client applications
11m45s – automating deployment of event-streaming infrastructure
12m30s – using schemas with the schema registry
13m10s – sending messages with HTTP POST requests
13m45s – viewing messages in the message browser
14m00s – command line administration
14m30s – running Kafka Connect
15m10s – geo-replication for disaster recovery
15m50s – monitoring Kafka clusters in the Event Streams UI
17m10s – monitoring with custom Grafana dashboards
17m30s – alerting using Prometheus

This is IBM

June 20th, 2020

The “This is IBM” videos are a nice intro to some of the things that we work on at Hursley.

They’re not too technical, they’re not “sales-y” pitches for IBM products, they’re interesting stories, and each one is only a few minutes long.

I also like them as I’ve worked with all of these awesome people before, so it’s fun to see them being all serious on camera – even if it makes me a little jealous that they’re so much better at it than me!

Pretrained models in Machine Learning for Kids

May 25th, 2020

I’ve started adding pretrained machine learning models to Machine Learning for Kids. In this post, I wanted to describe what I’m doing.


May 23rd, 2020

For a while now, I’ve been playing a different Nintendo Switch game every day, sharing a video clip on Twitter.

In this post I’ll collect together all of the clips, and the answers to the questions I’ve been asked along the way.

Using with Machine Learning for Kids

May 10th, 2020

Students can work on machine learning projects in Python entirely in the browser, without any need for setup, installs, or registration.

Bringing AI into the classroom

February 28th, 2020

IBM and mindSpark are running a series of free webinars for teachers about artificial intelligence.

This evening’s 90 minute webinar was about bringing AI into the classroom, and I helped contribute some of the content.

The session was very interactive, but there were some pre-prepared presentations in there.

I’ve got a recording of one of the segments below, in which I shared some of my experiences of introducing AI and machine learning in schools, and what I’ve found works well.

Why are Kafka messages still on the topic after the retention time has expired?

February 9th, 2020

We had an interesting Kafka question from an Event Streams user. The answer isn’t immediately obvious unless you know a bit about Kafka internals, and after a little searching I couldn’t find an explanation online, so I thought I’d share the answer here (obviously anonymised and heavily simplified).

What is retention?

Retention is a Kafka feature to help you manage the amount of disk space your topics use.

It lets you specify how long you want Kafka to keep messages on a topic. You can set the limit by time (e.g. “I want messages on this topic to be preserved for at least X days”) or by disk usage (e.g. “I want at least the last X GB of messages on this topic to be preserved”).

Once the retention time or disk threshold is exceeded, messages become eligible to be deleted automatically by Kafka.
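Both kinds of limit can be set as topic-level configs (retention.ms and retention.bytes). As a sketch, using the kafka-configs.sh tool that ships with Kafka, and assuming a hypothetical topic called my-topic:

```shell
# 7 days in milliseconds, and 1 GB in bytes, applied to one topic
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name my-topic \
  --add-config retention.ms=604800000,retention.bytes=1073741824
```

If you don’t set them per-topic, the broker-wide defaults (log.retention.hours and friends) apply instead.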

What was wrong in this case?


They had created a topic with a retention time of 7 days.

They had assumed that this meant messages older than 7 days would be deleted.

When they looked at the messages on their topic, they could see some messages older than 7 days were there, and were surprised.

They thought this might mean retention wasn’t working.
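One likely piece of the puzzle is hinted at by the word “eligible” above: Kafka applies retention to whole log segments, not to individual messages, and the segment currently being written to (the active segment) is never deleted. Here’s a minimal Python sketch of that behaviour – the segment layout and timestamps are made up purely for illustration:

```python
SEVEN_DAYS = 7 * 24 * 60 * 60  # retention limit, in seconds

def expired_segments(segments, now, retention=SEVEN_DAYS):
    """Return the segments eligible for deletion: only *closed* segments
    whose newest message is older than the retention limit. The last
    segment is the active one, and is never eligible."""
    closed = segments[:-1]
    return [s for s in closed if now - max(s) > retention]

# message timestamps (in seconds), grouped into log segments
day = 24 * 60 * 60
segments = [
    [1 * day, 2 * day],    # closed; newest message is 18 days old -> eligible
    [3 * day, 19 * day],   # closed; newest message is 1 day old -> kept whole,
                           #   so the 17-day-old message in it survives too
    [4 * day],             # active segment -> kept, however old its messages
]
print(expired_segments(segments, now=20 * day))  # -> [[86400, 172800]]
```

This is only a sketch: in a real broker, segments roll based on segment.bytes / segment.ms, and the retention check itself only runs periodically (log.retention.check.interval.ms) – another reason messages can linger a little past the limit.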
