From Event Streams to Smart Streams : Powering AI / ML with your Kafka topics

October 29th, 2025

In this series of posts, I will outline the most common patterns for how artificial intelligence and machine learning are used in event-driven architectures.

I’m at a Kafka / Flink conference this week.

This morning, I gave a talk about how AI and ML are used with Kafka topics. I had a lot to say, so I’ll write it up over the next few days.

In this first post, I’ll outline the building blocks available when bringing AI into the event-driven world, and discuss some of the choices that are available for each block.

Read the rest of this entry »

Introducing LLM benchmarks using Scratch

October 18th, 2025

In this post, I want to share a recent worksheet I wrote for Machine Learning for Kids. It is perhaps a little on the technical side, but I think there is an interesting idea in here.

The lesson behind this project

The idea for this project was to get students thinking about the differences between different language models.

There isn’t a single “best” model that is the best at every task. Each model can be good at some tasks, and less good at other tasks.

The best model for a specific task isn’t necessarily the largest and most complex model. Smaller and simpler models can be better at some tasks than larger models.

And we can identify how good each model is at a specific task by testing it at that task.
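As a rough illustration of that idea (this is a Python sketch, not the Scratch project from the worksheet), a benchmark can be as simple as running each model against the same set of test questions for a task and counting correct answers. The two "models" below are hypothetical stand-ins with canned answers, and the task names and questions are made up for the example.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is any function that maps a prompt to an answer.
Model = Callable[[str], str]


def score(model: Model, test_cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of test cases that the model answers correctly."""
    correct = sum(
        1
        for prompt, expected in test_cases
        if model(prompt).strip().lower() == expected.lower()
    )
    return correct / len(test_cases)


# Hypothetical stand-ins for real language models (canned answers only).
def small_model(prompt: str) -> str:
    answers = {"2 + 2": "4", "3 x 3": "9"}
    return answers.get(prompt, "unknown")


def large_model(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


# Made-up test sets, one per task.
tasks: Dict[str, List[Tuple[str, str]]] = {
    "arithmetic": [("2 + 2", "4"), ("3 x 3", "9")],
    "geography": [("capital of France", "Paris")],
}
models: Dict[str, Model] = {"small": small_model, "large": large_model}

# Score every model on every task.
results = {
    model_name: {task: score(model, cases) for task, cases in tasks.items()}
    for model_name, model in models.items()
}

for model_name, scores in results.items():
    print(model_name, scores)
```

In this toy setup the smaller model wins at arithmetic and the larger one wins at geography, which is the point: the ranking depends on the task being tested.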

Read the rest of this entry »

Event-driven sessions at IBM TechXchange 2025

September 30th, 2025

Next week, I’ll be at IBM TechXchange: our annual technical learning conference.

Our other big annual event, Think, had a business focus, but TechXchange is for technologists to advance their skills and expertise.

There are thousands of presentations, demos, workshops and hands-on labs to choose from, but naturally the most interesting ones will be about event-driven architectures and event stream processing technologies. 😉

In this post, I’ll share what I’ll be up to – if you’re going to TechXchange next week, I hope to see you at some of these!

Read the rest of this entry »

What are we missing in AsyncAPI?

September 29th, 2025

I gave a presentation, “What are we missing in AsyncAPI?”, in the AsyncAPI track at apidays in London last week. My aim for the talk was to start a discussion on where there are opportunities to enhance and extend AsyncAPI.

title slide: "What are we missing in AsyncAPI?"

The talk wasn’t recorded, so I’ll use this post to describe what I talked about.


Kafka and AsyncAPI

I use AsyncAPI to document and describe Kafka applications. That was the (admittedly narrow!) perspective I brought. I suspect that a lot of the specific examples I raised have equivalents in other protocols. Even where they don’t, I was trying to make a protocol-agnostic point: we need to think about the different types of people who work in event-driven systems and what information they need for their roles.

Read the rest of this entry »

Fun things to do with Strava

August 10th, 2025

Ninety-nine percent of the time, I think of Strava as a running app. But every now and then, I do something a bit different with it.

Here are a few examples!

Record your way through a maze

When we explored the maze at Wildwood, Strava was a fun way to see just how lost we got.

Being able to see how many times we went round the same sections of the maze on an interactive map adds a great layer to the experience.

This is a great use of Strava… you just need to find a large outdoor maze.

link – Strava activity

Run on a moving surface

We were on a ship with a running track last week, so I thought this was another chance to do something weird with Strava.

My plan was to run laps of the ship deck while it was at sea, and use that to draw a cool spiral GPS trace with each lap’s GPS location slightly offset from the last.

I massively overestimated how fast I run compared with the speed of a ship.

My GPS trace was essentially a straight line. The waves in the line reflect the slight difference, on each short lap, between when I was running in the same direction as the ship and when I was running in the opposite direction.


The resulting stats are entertainingly ridiculous. I love that Strava was suspicious of my ability to run sub-3-minute miles.


Draw pictures

This is the most common “something weird with Strava” idea I’ve seen people try: find a field and use GPS as a massive virtual Etch A Sketch.


Our local park is ideal for this.

link – not actually Strava, but similar enough!

Geo-tagged photos

We stumbled upon a ceramic tile mosaic styled to look like 8-bit video game pixel art. It was just on a random wall, with nothing announcing, explaining, or sign-posting it.

After accidentally finding a second different mosaic nearby, I decided to spend an evening finding more!

I took pictures of each mosaic.

With Strava, a random collection of photos becomes an annotated interactive map, showing where each photo was taken and where each mosaic is.

(I’ve since learned that these are the work of a street artist called Invader. Thanks, Paul!)

link – Strava activity

What else?

If I played sports, I’d try recording that – I think a trace of where you’ve run during football sounds like a fun idea.

What other uses are there for a GPS trace?

Using time series models with IBM Event Automation

July 22nd, 2025

Intro

graphic of an e-bike hire park

Imagine you run a city e-bike hire scheme.

Let’s say that you’ve instrumented your bikes so you can track their location and battery level.

When a bike is on the move, it emits periodic updates to a Kafka topic, and you use these events for a range of maintenance, logistics, and operations reasons.

You also have other Kafka topics, such as a stream of events with weather sensor readings covering the area of your bike scheme.

Do you know how to use predictive models to forecast the likely demand for bikes in the next few hours?

Could you compare these forecasts with the actual usage that follows, and use this to identify unusual demand?

Time series models

A time series is how a machine learning engineer or data scientist would describe a dataset that consists of data values, ordered sequentially over time, and labelled with timestamps.

A time series model is a specific type of machine learning model that can analyze this type of sequential time series data. These models are used to predict future values and to identify anomalies.

For those of us used to working with Kafka topics, the machine learning definition of a “time series” sounds exactly like our definition of a Kafka topic. Kafka topics are a sequentially ordered set of data values, each labelled with a timestamp.
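To make that parallel concrete, here is a minimal Python sketch, assuming events have already been read from a topic into memory. The `Record` class, the bike ids, and the simple tolerance check are all hypothetical, invented for illustration; the check is a stand-in for a real time series model, not how such a model works. The sketch turns timestamped events into an hourly time series, then compares a forecast with the actual count to flag unusual demand.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Record:
    """Hypothetical in-memory view of a Kafka record."""
    timestamp_ms: int  # record timestamp, in milliseconds since the epoch
    value: str         # payload (a made-up bike id in this example)


def hourly_counts(records: List[Record]) -> List[Tuple[datetime, int]]:
    """Bucket records by hour, giving a time series of event counts."""
    counts: Counter = Counter()
    for record in records:
        ts = datetime.fromtimestamp(record.timestamp_ms / 1000, tz=timezone.utc)
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return sorted(counts.items())


def is_unusual(forecast: float, actual: int, tolerance: float = 0.5) -> bool:
    """Flag demand that differs from the forecast by more than the tolerance."""
    return abs(actual - forecast) > tolerance * forecast


def to_ms(dt: datetime) -> int:
    """Convert a datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


# Made-up events: two in the 08:00 hour, one in the 09:00 hour.
records = [
    Record(to_ms(datetime(2025, 7, 22, 8, 5, tzinfo=timezone.utc)), "bike-1"),
    Record(to_ms(datetime(2025, 7, 22, 8, 40, tzinfo=timezone.utc)), "bike-2"),
    Record(to_ms(datetime(2025, 7, 22, 9, 10, tzinfo=timezone.utc)), "bike-1"),
]

series = hourly_counts(records)
for hour, count in series:
    print(hour.isoformat(), count)
```

With a forecast of 10 hires for an hour, `is_unusual(10, 25)` would flag the hour as unusual, while `is_unusual(10, 12)` would not.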

Read the rest of this entry »

A mid-year (non-work) checkpoint

July 1st, 2025

We’re half-way through 2025, which means it’s a good time to check how I’m doing against some of the goals I set for the year.

Read the rest of this entry »

How to use kafka-console-consumer.sh to view the contents of Apache Avro-encoded events

June 12th, 2025

kafka-console-consumer.sh is one of the most useful tools in the Kafka user’s toolkit. But if your topic has Avro-encoded events, the output can be a bit hard to read.

You don’t have to put up with that, as the tool has a formatter plugin framework. With the right plugin, you can get nicely formatted output from your Avro-encoded events.

With this in mind, I’ve written a new Avro formatter for a few common Avro situations. You can find it at:

github.com/IBM/kafka-avro-formatters

The README includes instructions on how to add it to your Kafka console command, and how to configure it so that it can find your schema.

Read the rest of this entry »