{"id":5699,"date":"2025-11-02T08:37:18","date_gmt":"2025-11-02T08:37:18","guid":{"rendered":"https:\/\/dalelane.co.uk\/blog\/?p=5699"},"modified":"2026-03-14T21:21:33","modified_gmt":"2026-03-14T21:21:33","slug":"using-streams-of-events-to-train-machine-learning-models","status":"publish","type":"post","link":"https:\/\/dalelane.co.uk\/blog\/?p=5699","title":{"rendered":"Using streams of events to train machine learning models"},"content":{"rendered":"<p><strong>In this post, I describe how event streams can be used as a source of training data for machine learning models.<\/strong><\/p>\n<p>I spoke at <a href=\"https:\/\/current.confluent.io\/new-orleans\">Current<\/a> last week. I gave a talk about how artificial intelligence and machine learning are most commonly used with Kafka topics. I had a lot to say, so I didn&#8217;t manage to finish writing up my slides &#8211; but this post covers the last section of the talk.<\/p>\n<p>It follows:<\/p>\n<ul>\n<li>the <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=5678\">building blocks used in AI\/ML Kafka projects<\/a><\/li>\n<li>how <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=5682\">AI \/ ML is used to augment event stream processing<\/a><\/li>\n<li>how <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=5686\">agentic AI is used to respond autonomously to events<\/a><\/li>\n<li>how <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=5692\">events can provide real-time context to agents<\/a><\/li>\n<li>how <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=5699\">events can be used as a source of training data<\/a> for models (<em>this post<\/em>)<\/li>\n<\/ul>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide76.png?raw=true\"\/><\/p>\n<p>The talk covered the four main patterns for using AI\/ML with events.<\/p>\n<p>This pattern was where I talked about using events as a source of training data for 
models. This is perhaps the simplest and longest-established approach &#8211; I&#8217;ve been <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=3924\">writing about this<\/a> <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=4124\">for years<\/a>, long pre-dating the current generative AI-inspired interest.<\/p>\n<p><!--more--><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide77.png?raw=true\"\/><\/p>\n<p>Our Kafka topics hold records of what has happened. By training a model on this historical record, we can create custom models that can recognise when something interesting happens in future.<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide78.png?raw=true\"\/><\/p>\n<p>This can be as simple as reading the appropriate topics when creating the machine learning model \u2013 replaying the events as often as needed to tune the model.<\/p>\n<p>This is easier to explain with an example.<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide79.png?raw=true\"\/><\/p>\n<p><strong>Imagine a stream of events with resolved customer support issues<\/strong> \u2013 events that contain the original customer support issue description and how the issue was resolved.<\/p>\n<p>This would be a great source of training data to create custom machine learning models that learn from those experiences to reduce the time to resolve future issues.<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide80.png?raw=true\"\/><\/p>\n<p>For example, training a text 
classifier with the name of the department that resolved the issue would be a simple way to create a classifier that can suggest the most appropriate department for new support issues.<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide81.png?raw=true\"\/><\/p>\n<p>This assumes that you have that sort of Kafka topic with a complete set of training data: the record of what happened, together with the label that you want the model to learn for it.<\/p>\n<p>Often this won&#8217;t be the case, but event stream processing can help: it can correlate across multiple streams of events to turn raw events into usable training data.<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide82.png?raw=true\"\/><\/p>\n<p>That would look something like this: event stream processing pre-processes the raw events before they are used to train a model.<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide83.png?raw=true\"\/><\/p>\n<p>Sticking with the same use case as before, imagine if this started as two topics:<\/p>\n<ul>\n<li>one topic with initial customer support issues as they are submitted<\/li>\n<li>a separate topic that later records how each issue gets resolved<\/li>\n<\/ul>\n<p>Event stream processing could correlate these two separate streams of events to create a single set of usable training data (the initial support issue and how it was resolved).<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" 
src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide84.png?raw=true\"\/><\/p>\n<p>This could be implemented in Flink SQL with a simple interval join. The output from this would be a clean set of labelled training data ready for creating a custom model.<\/p>\n<p>If you use a data platform that supports Kafka topics as an input source, such as <a href=\"https:\/\/www.ibm.com\/products\/cloud-pak-for-data\">Cloud Pak for Data<\/a> (amongst many others) that is an easy pipeline to get ready.<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide85.png?raw=true\"\/><\/p>\n<p>This pattern doesn&#8217;t need to be exclusive from the others.<\/p>\n<p>In the earlier examples I went through, where <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=5682\">event processing was enhanced by using custom models to recognise or predict something from new events<\/a> \u2013 these custom models will likely have been trained using historical events from the same topics. This pattern pairs nicely with those earlier use cases.<\/p>\n<p><img decoding=\"async\" style=\"border: thin black solid; width: 100%; max-width: 600px; aspect-ratio: 1350 \/ 759;\" src=\"https:\/\/images.dalelane.co.uk\/2025-10-29-eda-ai\/Slide76.png?raw=true\"\/><\/p>\n<p>This was the last of the four patterns that I covered, showing that Kafka topics can be useful in the creation of AI \/ ML services, not just the usage of them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, I describe how event streams can be used as a source of training data for machine learning models. I spoke at Current last week. I gave a talk about how artificial intelligence and machine learning are most commonly used with Kafka topics. 
I had a lot to say, so I didn&#8217;t manage [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5700,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[593,584,580],"class_list":["post-5699","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-apachekafka","tag-kafka","tag-machine-learning"],"_links":{"self":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5699","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5699"}],"version-history":[{"count":1,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5699\/revisions"}],"predecessor-version":[{"id":5887,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5699\/revisions\/5887"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/media\/5700"}],"wp:attachment":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5699"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5699"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5699"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}