Social media updates with Kafka Connect

November 19th, 2024

In this post, I’ll show how to bring posts from open social media networks (Bluesky and Mastodon) into Kafka using Kafka Connect source connectors.

My goal is to be able to populate a Kafka topic with status updates posted to social media.

Rather than trying to do this with the full firehose of all status updates, I do it with status updates that match a search term or hashtag.

For example, the screenshot above shows a Kafka topic with posts from Bluesky that mention the term “xbox”.
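To give a sense of what this looks like in practice, here is a sketch of a source connector configuration for this kind of search. The connector class and the search/topic property names below are hypothetical placeholders for illustration, not the actual connector's configuration:

```properties
# hypothetical sketch - connector class and custom property names are illustrative only
name=bluesky-xbox-source
connector.class=com.example.kafka.connect.BlueskySourceConnector
tasks.max=1
# search term used to select which status updates to fetch
search.term=xbox
# Kafka topic to produce the matching posts to
topic=bluesky.xbox.posts
# produce the posts as JSON
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```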

Read the rest of this entry »

Using IBM Event Processing with rules engines

November 12th, 2024

In this post, I’ll demonstrate how Event Processing can use parameters from an external source (such as a rules engine) in event processing flows.

A simple flow to demonstrate the idea

To illustrate the idea, I created a simple demo event processing flow. The flow takes a stream of order events, filters it to keep only orders for high-value items, and then modifies the description property in some of the events:

The filter node is comparing the price with “40”, so only order events for items with a value above $40 are kept.

The transform node is modifying the description property of order events – any description that contains the string “Cargo Jeans” is replaced with “Combat Trousers”.
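Event Processing flows run as Apache Flink jobs, so the combination of these two nodes is roughly equivalent to the Flink SQL below. The table and column names are illustrative assumptions rather than the demo's actual schema:

```sql
-- illustrative sketch: keep high-value orders, then rewrite the description
SELECT
    id,
    price,
    REGEXP_REPLACE(description, 'Cargo Jeans', 'Combat Trousers') AS description
FROM orders
WHERE price > 40;
```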

Hard-coded parameters

What if you wanted to modify the threshold for the filter, to change that $40 minimum value for an order to be considered “large”?

Or what if you wanted to modify the transformation, so that different strings would be used in the regular expression replacement?

With the values hard-coded in the flow as shown above, you would need to:

  • create a savepoint for the job
  • stop the job
  • modify the parameters in the job
  • resume the job from the savepoint

This is a workable approach, although it does require a little downtime and some administrative effort.
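If you were managing the Flink job directly with the Flink CLI, that sequence would look roughly like the following sketch (the job ID, savepoint path, and jar name are placeholders):

```sh
# trigger a savepoint for the running job
flink savepoint <job-id> /savepoints

# stop the job
flink cancel <job-id>

# ...modify the hard-coded parameters in the flow...

# resubmit the job, restoring its state from the savepoint
flink run -s /savepoints/savepoint-<job-id>-xxxx updated-job.jar
```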

The aim of this post is to highlight an alternative approach.

Read the rest of this entry »

Creating custom record builders for the Kafka Connect MQ Source Connector

October 28th, 2024

In this post, I want to share an example of handling bespoke structured messages with the Kafka Connect MQ Source Connector.

The MQ Source Connector gets data from MQ messages and produces it as events on Kafka topics. The default record builder makes a copy of the data as-is. For example, this can mean taking a JMS TextMessage from MQ and producing a string to Kafka. Or it can mean taking a JMS BytesMessage from MQ and producing a byte array to Kafka.

In my last post, I showed an example of using the XML record builder to read XML documents from MQ and turn them into structured Kafka Connect records. From there, I could choose the format the data is produced to Kafka in (e.g. JSON or Avro) by picking an appropriate value converter (e.g. org.apache.kafka.connect.json.JsonConverter or io.apicurio.registry.utils.converter.AvroConverter).
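For reference, the relevant parts of that configuration look something like the sketch below. The XML record builder class name comes from a separate plugin and may differ between versions; queue manager and connection details are omitted:

```properties
connector.class=com.ibm.eventstreams.connect.mqsource.MQSourceConnector
mq.queue=COMMANDS.QUEUE
# record builder: parses the MQ message body into a structured Connect record
mq.record.builder=com.ibm.eventstreams.kafkaconnect.plugins.xml.XmlMQRecordBuilder
# value converter: chooses the format the data is produced to Kafka in
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```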

What if your MQ messages have a custom structure, but you still want Kafka Connect to be able to parse them and output them to Kafka in whatever format you choose?

In that case, you need to use a record builder that can correctly parse your MQ messages. In this post, I’ll explain what that means, show you how to create one, and share a sample you can use to get started.
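As a preview, here is a minimal sketch of what such a record builder can look like. It assumes the open-source MQ source connector's BaseRecordBuilder base class with a getValue method to override (check the connector source for the exact signature in the version you are using), and the message parsing is a stand-in for whatever your bespoke format needs:

```java
// Minimal sketch of a custom record builder. Method signatures are based on the
// open-source MQ source connector's BaseRecordBuilder and may differ between
// connector versions (newer releases use jakarta.jms rather than javax.jms).
package com.example.mq;

import javax.jms.JMSContext;
import javax.jms.JMSException;
import javax.jms.Message;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

import com.ibm.eventstreams.connect.mqsource.builders.BaseRecordBuilder;

public class MyCustomRecordBuilder extends BaseRecordBuilder {

    // structured schema for the Kafka Connect record value
    private static final Schema VALUE_SCHEMA = SchemaBuilder.struct()
        .name("com.example.Order")
        .field("id", Schema.STRING_SCHEMA)
        .field("description", Schema.STRING_SCHEMA)
        .build();

    @Override
    public SchemaAndValue getValue(JMSContext context, String topic,
                                   boolean messageBodyJms, Message message)
            throws JMSException {
        // stand-in for parsing your bespoke message format:
        // here, a text body of the form "<id>|<description>"
        String body = message.getBody(String.class);
        String[] parts = body.split("\\|", 2);

        Struct value = new Struct(VALUE_SCHEMA)
            .put("id", parts[0])
            .put("description", parts.length > 1 ? parts[1] : "");

        return new SchemaAndValue(VALUE_SCHEMA, value);
    }
}
```

The connector is then pointed at the custom class with the mq.record.builder configuration property, and you remain free to pick the output format with a value converter as before.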

Read the rest of this entry »

Analysing IBM MQ messages in IBM Event Processing

October 27th, 2024

In this post, I’ll walk through a demo of using IBM Event Processing to create an Apache Flink job that calculates summaries of messages from IBM MQ queues.

This is a high-level overview of the demo:

  • A JMS/Jakarta application puts XML messages onto an MQ queue
  • A JSON version of these messages is copied onto a Kafka topic
  • The messages are processed by a Flink job, which outputs JSON results onto a Kafka topic
  • An XML version of the results is copied onto an MQ queue
  • The results are received by a JMS/Jakarta application

I’ve added instructions to my demos repo on GitHub for how you can create a demo like this for yourself.

The rest of this post is a walkthrough and explanation of how it all works.
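To give a flavour of the kind of summary the Flink job produces, a windowed aggregation in Flink SQL looks something like the sketch below. The table and column names are illustrative assumptions rather than the demo's actual schema:

```sql
-- illustrative sketch: count the messages received in each one-minute window
SELECT
    window_start,
    window_end,
    COUNT(*) AS messageCount
FROM TABLE(
    TUMBLE(TABLE mq_messages, DESCRIPTOR(eventTime), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end;
```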

Read the rest of this entry »

Event-driven tech at IBM TechXchange

October 19th, 2024

This week, I’m at IBM TechXchange: our annual technical learning conference.

Our other big annual event, Think, has a business focus, but TechXchange is for technologists to advance their skills and expertise.

There are thousands of presentations, demos, workshops and hands-on labs to choose from, but naturally the most interesting ones will be about event-driven architectures and event stream processing technologies. 😉

In this post, I’ll share a few of our sessions from each day – if you’re at TechXchange this week, I hope to see you at some of these!

Read the rest of this entry »

Analysing Wikipedia edits with IBM Event Processing

October 14th, 2024

In this post, I’ll share a demo I gave today to explain some of the processing nodes in the palette of IBM Event Processing.

I’ve found that demonstrations of Event Processing are easier to understand when I don’t need to explain the stream of events I’m processing in the first place. This means I’m always looking for interesting real-world event streams that are widely understood, as they can make for the most effective demos.

With this in mind, today I tried explaining a few of the Event Processing nodes by using them with a live stream of events representing pages that are being created and edited in the English Wikipedia.



Each event contains:

  • title of the page
  • who made the edit (user ID if logged in, or IP address if anonymous)
  • was this the creation of a new page, or an edit of an existing page?
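
For example, an individual event on the topic looks something like this (the field names are simplified for illustration and may not match the topic's actual schema):

```json
{
  "title": "Apache Kafka",
  "user": "ExampleEditor",
  "newpage": false
}
```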

Every edit on Wikipedia results in an event on the Kafka topic, so there are typically a few events a second. It’s not a super-high-throughput topic in Kafka terms, but there are enough events to try out interesting ideas.



Here are a few of the demos I gave today.

This is by no means an exhaustive list of what you could do with this data, but it was enough to let me show what the most commonly used tools in the palette can do.

Read the rest of this entry »

Analysing social media sentiment with IBM Event Processing

October 10th, 2024

aka “Who wants a Mario alarm clock?”

In this post, I want to share a quick demo of using Event Processing to process social media posts.


Background

A fun surprise from Nintendo today: they’ve introduced a new product! “Alarmo” is a game-themed alarm clock, with some interesting gesture recognition features.

I was (unsurprisingly!) tempted…

But that got me wondering how the rest of the Internet was reacting.

In this post, I want to share a (super-simple!) demo of how to look at this – using IBM Event Processing to create an Apache Flink job that examines the sentiment of social media posts about this unusual new product.
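As a taste of the idea, the very simplest form of this is a keyword-based classification. The sketch below is illustrative only – the table and column names are assumptions, and it is not necessarily the approach the full demo takes:

```sql
-- illustrative sketch: crude keyword-based sentiment labelling of posts
SELECT
    text,
    CASE
        WHEN LOWER(text) LIKE '%want%' OR LOWER(text) LIKE '%love%' THEN 'positive'
        WHEN LOWER(text) LIKE '%hate%' OR LOWER(text) LIKE '%expensive%' THEN 'negative'
        ELSE 'neutral'
    END AS sentiment
FROM social_media_posts
WHERE LOWER(text) LIKE '%alarmo%';
```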

Read the rest of this entry »

Taming the Kafka topics Wild West

September 17th, 2024

aka Approaches to managing Kafka topic creation with IBM Event Streams

How can you best operate central Kafka clusters that are shared by multiple different development teams?

Administrators talk about wanting to enable teams to create Kafka topics when they need them, but worry about it resulting in their Kafka clusters turning into a sprawling “Wild West”. At best, they talk about the mess of anonymous topics that are named and configured inconsistently. At worst, they talk about topics being created or configured in ways that negatively affect their Kafka cluster and impact their other users.

With that in mind, I wanted to share a few ideas for how to control the topics that are created in your Event Streams cluster:
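One possible approach, as a flavour of what follows: defining topics declaratively as Kubernetes custom resources, so topic creation can go through review and automation rather than ad-hoc requests. The sketch below assumes the Strimzi-style KafkaTopic resource that Event Streams provides; the apiVersion and cluster label can vary between Event Streams versions:

```yaml
# illustrative sketch - apiVersion and cluster label vary between Event Streams versions
apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaTopic
metadata:
  name: orders.online.v1
  labels:
    eventstreams.ibm.com/cluster: my-event-streams
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 604800000
```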

Read the rest of this entry »