Maverick Dark Castle

November 17th, 2025

On Saturday, I ran the Maverick Dark Castle: an 11km night time trail run around Corfe Castle.

I’d not done a night run before, and had never done a trail run either (I had to buy my first pair of trail shoes for this event!), so take my uninformed opinions here with a pinch of salt.

Running in the dark

The run started a little after 8pm. It is November, so that means it was dark. Very dark. The route isn’t lit, so head torches were essential to see where you’re going.

An uprooted tree was the biggest obstacle. Most of what we had to navigate were branches and tree roots… oh, and mud. So much mud.


Photo of me (I’m underneath the “ic” in “Maverick”) close to the start of the run. I think I love this photo not just because of the arty light from the flare but because you can’t see whatever stupid face I’m probably pulling…


Introducing Generative AI into Code Clubs

November 14th, 2025

I recently spoke at Clubs Conference about how Generative AI can be introduced into Code Clubs.

The recording is a little fuzzy, but still watchable, I think.


youtu.be/yzdSJ6p0BP8

(The video includes all of the sessions from the first day of the Conference – my bit starts at 5:10:50).

My slides are here:

(I didn’t have a lot in my slides, as most of the talk was based around live demos.)

Flink SQL aggregate functions

November 3rd, 2025

In this post, I want to share a couple of very quick and simple examples for how to use LISTAGG and ARRAY_AGG in Flink SQL.

This started as an answer I gave to a colleague asking about how to output collections of events from Flink SQL. I’ve removed the details and used this post to share a more general version of my answer, so I can point others to it in future.

Windowed aggregations

One of the great things in Flink is that it makes it easy to do time-based aggregations on a stream of events.

Using this is one of the first things that I see people try when they start playing with Flink. The third tutorial we give to new Event Processing users is to take a stream of order events and count the number of orders per hour.
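(For reference, here’s a minimal sketch of how the orders table used in these examples might be defined. The column names are the ones the queries below rely on, but the topic name, watermark interval, and connector options are just illustrative assumptions.)

CREATE TABLE orders (
    description  STRING,
    quantity     INT,
    region       STRING,
    ordertime    TIMESTAMP(3),
    -- the watermark makes ordertime usable as an event-time attribute,
    -- which the TUMBLE windows below depend on
    WATERMARK FOR ordertime AS ordertime - INTERVAL '30' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'ORDERS',                            -- assumed topic name
    'properties.bootstrap.servers' = 'kafka:9092', -- assumed broker address
    'format' = 'json',
    'scan.startup.mode' = 'earliest-offset'
);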

In Flink SQL, that looks like:

SELECT
    COUNT (*) AS `number of orders`,
    window_start,
    window_end,
    window_time
FROM
    TABLE (
        TUMBLE (
            TABLE orders,
            DESCRIPTOR (ordertime),
            INTERVAL '1' HOUR
        )
    )
GROUP BY
    window_start,
    window_end,
    window_time

In our low-code UI, it looks like this:

However you do it, the result is a flow that emits an event at the end of every hour, with a count of how many order events were observed during that hour.

But what if you don’t just want a count of the orders?

What if you want the collection of the actual order events emitted at the end of every hour?

To dream up a scenario using this stream of order events:

At the end of each hour, emit a list of all products ordered during the last hour, so the warehouse pickers can prepare those items for delivery.

This is where some of the other aggregate functions in Flink SQL can help.

LISTAGG

If you just want a single property (e.g. the name / description of the product that was ordered) from all of the events that you collect within each hourly window, then LISTAGG can help.

For example:

SELECT
    LISTAGG (description) AS `products to pick`,
    window_start,
    window_end,
    window_time
FROM
    TABLE (
        TUMBLE (
            TABLE orders,
            DESCRIPTOR (ordertime),
            INTERVAL '1' HOUR
        )
    )
GROUP BY
    window_start,
    window_end,
    window_time

That gives you a single concatenated string: a comma-separated list of the product descriptions from all of the events within each hour.

You can use a different separator, but it’s a comma by default.
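For example, passing the separator as a second argument to LISTAGG gives you a pipe-separated list instead (the rest of the query is unchanged):

SELECT
    LISTAGG (description, ' | ') AS `products to pick`,
    window_start,
    window_end,
    window_time
FROM
    TABLE (
        TUMBLE (
            TABLE orders,
            DESCRIPTOR (ordertime),
            INTERVAL '1' HOUR
        )
    )
GROUP BY
    window_start,
    window_end,
    window_time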

ARRAY_AGG

If you want to output objects with some or even all of the properties (e.g. the name and quantity of the products that were ordered) from each of the events that you collect within each hourly window, then ARRAY_AGG can help.

For example:

SELECT
    ARRAY_AGG (
        CAST (
            ROW (description, quantity)
                AS
            ROW <description STRING, quantity INT>
        )
    ) AS `products to pick`,
    window_start,
    window_end,
    window_time
FROM
    TABLE (
        TUMBLE (
            TABLE orders,
            DESCRIPTOR (ordertime),
            INTERVAL '1' HOUR
        )
    )
GROUP BY
    window_start,
    window_end,
    window_time

The CAST isn’t necessary, but it lets you give names to the properties instead of the default names you get such as EXPR$0, so downstream processing is easier.

In each hour, an event is emitted that contains an array of objects, made up of properties from the events in that hour.
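As a rough sketch of the sort of downstream processing this enables, here’s how a later job might flatten the pick list events back out into one row per product, assuming the results of the query above have been written to a (hypothetical) table called pick_lists:

-- one row per ordered product, taken from the array built by ARRAY_AGG
SELECT
    window_end AS `pick by`,
    item.description,
    item.quantity
FROM
    pick_lists
    CROSS JOIN UNNEST (`products to pick`) AS item (description, quantity)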

And naturally you could add additional GROUP BY columns – for example, if you wanted a separate pick list event for each region:

SELECT
    region,
    ARRAY_AGG (
        CAST (
            ROW (description, quantity)
                AS
            ROW <description STRING, quantity INT>
        )
    ) AS `products to pick`,
    window_start,
    window_end,
    window_time
FROM
    TABLE (
        TUMBLE (
            TABLE orders,
            DESCRIPTOR (ordertime),
            INTERVAL '1' HOUR
        )
    )
GROUP BY
    window_start,
    window_end,
    window_time,
    region

This outputs five events at the end of every hour: one for each of the NA, SA, EMEA, APAC and ANZ regions, each containing the list of orders for that region.

Try it yourself

I’ve used the “Loosehanger” orders data generator in these examples, so if you want to try it for yourself, you can find it at
github.com/IBM/kafka-connect-loosehangerjeans-source.

If you want to try it in Event Processing, the instructions for setting up the tutorial environment can be found at
ibm.github.io/event-automation/tutorials.

AI patterns in event driven architectures

November 3rd, 2025

I gave a talk at Current last week about how artificial intelligence and machine learning are used with Kafka topics. I had a lot of examples to share, so I wrote up my slides across several posts.

I’ll use this post to recap and link to my write-ups of each bit of the talk:

I started by talking about the different building blocks that are needed, and the sorts of choices that teams make.

Next, I talked about how projects to introduce AI into event driven architectures typically fall into one or more of these common patterns:

The most common, and the simplest: using AI to improve and augment the sorts of processing we can do with events. This can be as simple as using off-the-shelf pre-trained models to enrich a stream of events, and using the results to filter or route events as part of processing.

Perhaps the newest (and the pattern that is currently getting the most interest and attention) is to use streams of events to trigger agents, so that they can autonomously take actions in response.

Perhaps the least obvious approach is to collect and store a projection of recent events, and use it to enhance agentic AI by making the projection available as a queryable or searchable form of real-time context.

And finally, the longest established pattern is simply to use the retained history of Kafka topics as a source of historical training data for training new, bespoke models.

Using streams of events to train machine learning models

November 2nd, 2025

In this post, I describe how event streams can be used as a source of training data for machine learning models.

I spoke at Current last week. I gave a talk about how artificial intelligence and machine learning are most commonly used with Kafka topics. I had a lot to say, so I didn’t manage to finish writing up my slides – but this post covers the last section of the talk.

It follows on from my earlier posts in this series.

The talk covered the four main patterns for using AI/ML with events.

This pattern was where I talked about using events as a source of training data for models. This is perhaps the simplest and longest established approach – I’ve been writing about this for years, long pre-dating the current generative AI-inspired interest.


Using event streams to provide real-time context for agentic AI

November 1st, 2025

In this post, I describe how event stream projections can be used to make agentic AI more effective.

I spoke at a Kafka / Flink conference on Wednesday. I gave a talk about how AI and ML are used with Kafka topics. I had a lot to say, so this is the fourth post I’ve needed to write to cover my slides (and I’ve still got more to go!).

The talk was a whistlestop tour through the four main patterns for using artificial intelligence and machine learning with event streams.

This pattern was where I talked about using events as a source of context data for agents.


Triggering agentic AI from event streams

October 31st, 2025

In this post, I describe how agentic AI can respond autonomously to event streams.

I spoke at Current on Wednesday, about the most common patterns for how AI and ML are used with Kafka topics. I had a lot of content I wanted to cover in the session, so it’s taking me a while to write it all down.

The premise of the talk was to describe the four main patterns for using AI/ML with events. This pattern was where I started focusing on agents.


Using AI to augment event stream processing

October 30th, 2025

In this post, I describe how artificial intelligence and machine learning are used to augment event stream processing.

I gave a talk at a Kafka / Flink conference yesterday about the four main patterns for using AI/ML with events. I had a lot to say, so it is taking me a few days to write up my slides.

The most common pattern for introducing AI into an event driven architecture is to use it to enhance event processing.

As part of event processing, you can have events, collections of events, or changes in events – and any of these can be sent to an AI service. The results can inform the processing or downstream workflows.
