Archive for the ‘code’ Category

Embedding Tiny Language Models in Flink SQL

Wednesday, May 20th, 2026

I gave a talk at Current yesterday about how to embed a tiny language model inside your Flink SQL pipeline.

I used a fun mix of demos to show what I think are the main approaches available for using generative AI with Kafka events from a Flink SQL job. Some demos were definitely more sensible than others!

These are the slides I used, and what I’d planned to say.

In this session, I’ll be talking about your options for running language models for Flink SQL jobs.

I’ll cover:

  • your options for where you run them, in relation to Flink
  • what sorts of choices you have for the models you run
  • how to use them – the sorts of prompts and settings we’d want for Flink
  • how to keep an eye on them, to check that they’re working well
  • and finally, some thoughts on when it’s a good idea to do any of this

(more…)

Instrumenting a Kafka Connect connector with metrics

Saturday, May 2nd, 2026

Metrics can provide operational insight into Kafka Connect connectors, helping users to configure them better. With simple updates, a Kafka Connect connector can be instrumented to emit useful metrics that make this possible.

A couple of years ago, I created a simple skeleton Connect connector project to help developers at a hackathon create their first Kafka connector.

I’ve updated the source connector from that sample to emit metrics. In this post, I’ll walk through what I did, as an example of how to add metrics to your own Kafka connector.

(more…)

How to create a Scratch extension

Monday, April 27th, 2026

A few years ago, I ran a workshop about how to create custom Scratch blocks.

I made a template repository, based on the Scratch Team repos, but with a skeleton extension and some extra scripts and automation to handle building and publishing it. I included step-by-step instructions for building different types of Scratch extensions, including Scratch blocks based on web APIs, and Scratch blocks based on JavaScript modules from npm.

(more…)

“How many Kafka events will Flink process per second?”

Saturday, April 11th, 2026

I’m often asked this. The specific question varies, but it’s typically some variation of asking how quickly a single CPU of Flink processes events from a Kafka topic.

Why “per CPU”? Maybe because enterprise software is typically charged per CPU? Maybe because I tend to talk to people who run everything in Kubernetes, who think of running software in terms of requests / limits? Not sure, but the question tends to be framed from the perspective of asking how much processing they can expect to get from a CPU.

I try to avoid doing the engineer thing of answering “it depends”… but… it really does depend!

That is the motivation behind this post: to give me something I can point at as an illustration of the degree to which Flink’s performance varies (and a taste of the range of interrelated factors that influence it).

(more…)

Extending Flink SQL

Sunday, March 29th, 2026

In this post, I’ll share examples of how writing user-defined functions (UDFs) extends what is possible using built-in Flink SQL functions alone.

I’ll share examples of how UDFs can:

(more…)

Processing JSON with Kafka Connect

Wednesday, February 18th, 2026

In this post, I’ll share examples of how to process JSON data in a Kafka Connect pipeline, and explain the schema format that Kafka uses to describe JSON events. 

Using sink connectors

Kafka Connect sink connectors let you send the events on your Kafka topics to external systems. I’ve talked about this before, but to recap, the structure looks a bit like this:

Imagine that you have this JSON event on a Kafka topic. 

{
    "id": 12345678,
    "message": "Hello World",
    "isDemo": true
}

How should you configure Kafka Connect to send that somewhere? 

It depends…
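
One thing it can depend on is whether each event carries its own schema. As a rough sketch, assuming the standard JsonConverter with schemas.enable=true, the converter expects every event to be wrapped in an envelope with a "schema" part and a "payload" part. The Python below builds that envelope for the example event above. The helper names (connect_type, wrap_with_schema) are hypothetical ones I made up for illustration, and the type mapping only covers a small subset of the Connect types.

```python
import json

# The example event from above.
event = {"id": 12345678, "message": "Hello World", "isDemo": True}


def connect_type(value):
    """Map a Python value to a Kafka Connect schema type name.

    Hypothetical helper: covers only a few primitive types.
    """
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"no mapping for {type(value).__name__}")


def wrap_with_schema(payload):
    """Wrap a flat JSON object in a schema/payload envelope.

    Assumption: every field is treated as required (optional = False).
    """
    fields = [
        {"field": name, "type": connect_type(value), "optional": False}
        for name, value in payload.items()
    ]
    return {
        "schema": {"type": "struct", "fields": fields, "optional": False},
        "payload": payload,
    }


print(json.dumps(wrap_with_schema(event), indent=4))
```

If the events on the topic are plain JSON like the example, without this envelope, then the converter would need schemas.enable=false instead, and the sink connector receives the data without a schema.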

(more…)

Improving support for older computers and mobile devices on Machine Learning for Kids

Friday, January 16th, 2026

In this post, I want to share some changes I’ve been making to how I train models in Machine Learning for Kids.

(more…)

Flink SQL examples with click tracking events

Monday, January 12th, 2026

In this post, I introduce a few core Flink SQL functions using worked examples of processing a stream of click tracking events from a retail website.

I find that a practical, real-world (ish) example can help to explain how to use Flink SQL in a way that abstract descriptions, such as processing coloured blocks, sometimes don’t quite achieve.

I’ll use this post to give examples of my most-used Flink SQL functions, in the context of a retail scenario: a stream of events from customers on the website for a clothing retailer.

Note: I used Event Processing to create the flows, as the assistants in the canvas helped me create examples quickly. Everything I’ve created is standard Apache Flink SQL, so you don’t need to have Event Processing to try these examples.

(more…)