Archive for the ‘code’ Category

Exploring Language Models in Scratch with Machine Learning for Kids

Sunday, March 2nd, 2025

In this post, I want to share the most recent section I’ve added to Machine Learning for Kids: support for generating text and an explanation of some of the ideas behind large language models.


youtu.be/Duw83OYcBik

After launching the feature, I recorded a video using it. It turned into a 45-minute end-to-end walkthrough… longer than I planned! A lot of people won’t have time to watch that, so I’ve typed up a shortened, easier-to-skim version of what I said. It’s not a transcript – it’s a condensed version of what I was trying to get across in the demo! I’ll include timestamped links as I go, in case you want to see the full explanation for any particular bit.

The goal was to be able to use language models (the sort of technology behind tools like ChatGPT) in Scratch.

youtu.be/Duw83OYcBik – jump to 00:19

For example, this means I can ask the Scratch cat:

Who were the Tudor Kings of England?

Or I can ask:

Should white chocolate really be called chocolate?

Although that is fun, I think the more interesting bit is the journey: how you get there.

(more…)

Using MirrorMaker 2 for simple stream processing

Thursday, February 13th, 2025

Kafka Connect Single Message Transformations (SMTs) and MirrorMaker can be a simple way of doing stateless transformations on a stream of events.

There are many options available for processing a stream of events – the two I work with most frequently are Flink and Kafka Streams, both of which offer a range of ways to do powerful stateful processing on an event stream. In this post, I’ll share a third, perhaps overlooked, option: Kafka Connect.
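
The attraction is that the built-in Single Message Transformations are configured rather than coded, but it helps to see the shape of the thing being configured. Under the covers, an SMT is just a class that maps one Connect record to another, one record at a time, with no state in between. Here is a minimal sketch of a hypothetical custom SMT (the class is made up for illustration; the Transformation interface is the real Kafka Connect API) that blanks out the value of every record it mirrors:

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

// Hypothetical SMT: drops the value of every record, keeping the key,
// so that sensitive payloads are not copied to the target cluster
public class DropValue<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        // Stateless: the output depends only on this one record
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                null, null,    // no value schema, no value
                record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();    // no configuration options in this sketch
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}

In practice you would more often configure one of the SMTs that ship with Kafka Connect (such as Filter or RegexRouter) on the MirrorMaker connector, rather than writing your own.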

(more…)

Running OpenMessaging benchmarks on your Kafka cluster

Monday, February 3rd, 2025

The OpenMessaging Benchmark Framework is typically used to benchmark messaging systems in the cloud, but in this post I want to show how useful it can also be for Kafka clusters that you run yourself in Kubernetes (whether you run them with the open source Strimzi operator or with IBM’s Event Streams).

From openmessaging.cloud:

The OpenMessaging Benchmark Framework is a suite of tools that make it easy to benchmark distributed messaging systems.

As I’ve written about before (when illustrating the impact of setting quotas at the Kafka cluster level, and when adding quotas at the event gateway level), Apache Kafka comes with a good performance test tool. That is still my go-to option if I just want an easy way to push data through a Kafka cluster in bulk.

But – OpenMessaging’s benchmark has some interesting features that make it a useful complement and worth considering.

The benefit that OpenMessaging talks about the most is that it can be used with a variety of messaging systems, such as RocketMQ, Pulsar, RabbitMQ, NATS, Redis and more – although in this post I’m only interested in using it to benchmark an Apache Kafka cluster.

More interesting for me was their focus on realistic workloads rather than relying on static data.

Quoting again from openmessaging.cloud:

Benchmarks should be largely oriented toward standard use cases rather than bizarre edge cases

(more…)

Understanding event processing behaviour with OpenTelemetry

Friday, January 31st, 2025

When using Apache Kafka, timely processing of events is an important consideration.

Understanding the throughput of your event processing solution is typically straightforward: you count how many events you can process per second.

Understanding latency (how long it takes from when an event is first emitted to when it has finished being processed and an action has been taken in response) requires more coordination to measure.

OpenTelemetry helps with this by collecting and correlating information from the different components that are producing and consuming events.

From opentelemetry.io:

OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.
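
In Kafka terms, the correlation relies on carrying trace context along with each event. OpenTelemetry’s instrumentation libraries can do this for you automatically, but as a rough sketch of the underlying mechanism (the topic name is made up for illustration), a producer can inject the current trace context into the Kafka record headers:

import java.nio.charset.StandardCharsets;

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.propagation.TextMapSetter;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TracedProducer {

    // Writes each propagation field (e.g. "traceparent") into a record header
    private static final TextMapSetter<ProducerRecord<String, String>> SETTER =
            (record, key, value) ->
                    record.headers().add(key, value.getBytes(StandardCharsets.UTF_8));

    public static void send(KafkaProducer<String, String> producer,
                            String sensorId, String reading) {
        ProducerRecord<String, String> record =
                new ProducerRecord<>("sensor.readings", sensorId, reading);

        // Copy the current trace context into the record headers, so that
        // the consumer can continue the same trace when it processes the event
        GlobalOpenTelemetry.getPropagators()
                .getTextMapPropagator()
                .inject(Context.current(), record, SETTER);

        producer.send(record);
    }
}

The consumer side does the reverse with a TextMapGetter: it extracts the context from the headers and starts its processing span as a child of the producer’s span. That is what lets a tracing backend stitch the two together and measure the end-to-end latency of an individual event.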

A distributed event processing solution

To understand what is possible, I’ll use a simple (if contrived!) example of an event processing architecture.

(more…)

Turning noise into actionable alerts using Flink

Tuesday, January 28th, 2025

In this post, I want to share two examples of how you can use pattern recognition in Apache Flink to turn a noisy stream of output into something useful and actionable.

Imagine that you have a Kafka topic with a stream of events that you sometimes need to respond to.

You can think of this conceptually as being a stream of values that could be plotted against time.

To make this less abstract, we can use the events from the sensor readings topic that the “Loosehanger” data generator produces to. Those events are a stream of temperature and humidity readings.

Imagine that these events represent something that you might need to respond to whenever a sensor’s reading exceeds some given threshold.
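
One way to express this kind of pattern matching in Flink is with its CEP (complex event processing) library. As a rough sketch (the class and field names are made up for illustration): rather than alerting on every reading over the threshold, the pattern below only fires after three consecutive readings over it, which is one simple way of turning a noisy stream into fewer, more actionable alerts.

import java.util.List;
import java.util.Map;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;

public class TemperatureAlerts {

    private static final double THRESHOLD = 40.0;

    public static DataStream<String> alerts(DataStream<SensorReading> readings) {
        // Match three consecutive over-threshold readings from one sensor
        Pattern<SensorReading, ?> sustained = Pattern.<SensorReading>begin("over")
                .where(new SimpleCondition<SensorReading>() {
                    @Override
                    public boolean filter(SensorReading reading) {
                        return reading.temperature > THRESHOLD;
                    }
                })
                .times(3).consecutive();

        PatternStream<SensorReading> matches =
                CEP.pattern(readings.keyBy(reading -> reading.sensorId), sustained);

        // Emit one actionable alert per matched pattern, not one per reading
        return matches.select(new PatternSelectFunction<SensorReading, String>() {
            @Override
            public String select(Map<String, List<SensorReading>> match) {
                return "ALERT: sustained high temperature on " +
                        match.get("over").get(0).sensorId;
            }
        });
    }

    // Simple POJO standing in for the deserialised sensor events
    public static class SensorReading {
        public String sensorId;
        public double temperature;
    }
}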

You can think of it visually like this:

(more…)

“Shoebox”: an artificial intelligence history project

Saturday, January 11th, 2025

What was IBM Shoebox?

IBM Shoebox was one of the world’s first speech-recognition systems, created in 1961. It was a voice-controlled calculator: you entered a sum by speaking the digits zero through nine and six command words, including “plus”, “minus”, and “total”.

To calculate 12 + 34 you could say “one two plus three four total” and it would respond with the answer.

You can see it being used by inventor William Dersch in this two-minute demo video.


youtu.be/rQco1sa9AwU

(more…)

Using Kafka Streams for a Kafka Event Projection

Monday, December 2nd, 2024

In this post, I’ll walk through a sample implementation that uses Kafka Streams to maintain an Event Projection, and use it to illustrate when this is a suitable approach.

I’ve written similar Event Projection posts about sample implementations that use an in-memory lookup table, and a PostgreSQL database.

The objective for this demo

I introduced the pattern of Event Projections in Comparing approaches to maintaining an Event Projection from Kafka topics.

I also explained the scenario that I’ve been using for each of my Event Projections demos. If you haven’t seen my other posts, it may help to go back and see the scenario detail and motivation first.

In short, I’m showing how to maintain a projection of two Kafka topics (one based on the event key, the other based on an attribute in the event payload). And I’m showing how an application could make an HTTP/REST call to retrieve the data from the most recent event that matches some query.

At a high level, the goal for this demo is to:

  • use Kafka Streams to maintain a projection of the Kafka topics
  • provide an HTTP/REST API for querying the projection
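
As a rough sketch of how those two steps fit together (the topic, store, and key names here are illustrative, not the ones from the demo): Kafka Streams can materialise the topic into a local state store that keeps the latest event per key, and the HTTP/REST layer is then a thin wrapper around store lookups.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class ProjectionDemo {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "projection-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Consuming the topic as a table keeps only the latest event
        // per key, materialised into a local, queryable state store
        builder.table("customer.updates",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("customers-store")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // (a real application would wait until the instance is RUNNING)

        // The HTTP/REST layer wraps lookups like this one
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "customers-store", QueryableStoreTypes.keyValueStore()));
        System.out.println(store.get("some-customer-id"));
    }
}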

(more…)

Using a database for a Kafka Event Projection

Friday, November 29th, 2024

In this post, I’ll walk through a sample implementation that uses a database to maintain an Event Projection, and use it to illustrate when this is a suitable approach.

The objective for this demo

In Comparing approaches to maintaining an Event Projection from Kafka topics, I introduced the pattern of Event Projections.

I also introduced the scenario that I’ll be using in these demos. Please see that post for the detail and motivation, but to recap, I want to maintain a projection of two Kafka topics (one based on the event key, the other based on an attribute in the event payload).

In both cases, I want to be able to make an HTTP/REST call to retrieve the data from the most recent event that matches my query.

At a high level, the goal was to:

  • use Kafka Connect JDBC sink connectors to maintain a database projection of the Kafka topics
  • provide an HTTP/REST API for querying the projection
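
As a rough sketch of the query side (the table and column names are hypothetical, standing in for whatever schema the sink connector maintains), the HTTP/REST layer boils down to a lookup for the newest matching row:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ProjectionQuery {

    // Fetch the most recent event for one sensor from the projected table
    public static String latestReading(String jdbcUrl, String sensorId)
            throws SQLException {
        String sql = "SELECT payload FROM sensor_readings " +
                     "WHERE sensor_id = ? " +
                     "ORDER BY event_time DESC LIMIT 1";
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, sensorId);
            try (ResultSet results = stmt.executeQuery()) {
                return results.next() ? results.getString("payload") : null;
            }
        }
    }
}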

(more…)