{"id":5477,"date":"2025-01-28T22:36:14","date_gmt":"2025-01-28T22:36:14","guid":{"rendered":"https:\/\/dalelane.co.uk\/blog\/?p=5477"},"modified":"2026-03-14T21:29:02","modified_gmt":"2026-03-14T21:29:02","slug":"turning-noise-into-actionable-alerts-using-flink","status":"publish","type":"post","link":"https:\/\/dalelane.co.uk\/blog\/?p=5477","title":{"rendered":"Turning noise into actionable alerts using Flink"},"content":{"rendered":"<p><strong>In this post, I want to share two examples of how you can use pattern recognition in Apache Flink to turn a noisy stream of output into something useful and actionable.<\/strong><\/p>\n<p>Imagine that you have a Kafka topic with a stream of events that you sometimes need to respond to.<\/p>\n<p>You can think of this conceptually as being a stream of values that could be plotted against time.<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-1.jpg?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-1.jpg?raw=true\"\/><\/a><\/p>\n<p>To make this less abstract, we can use the events from the sensor readings topic that <a href=\"https:\/\/github.com\/IBM\/kafka-connect-loosehangerjeans-source\">the \u201cLoosehanger\u201d data generator<\/a> produces to. Those events are a stream of temperature and humidity readings.<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-1.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-1.png?raw=true\"\/><\/a><\/p>\n<p>Imagine that these events represent something that you might need to respond to when the event for a sensor exceeds some given threshold.<\/p>\n<p>You can think of it visually like this:<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-2.jpg?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-2.jpg?raw=true\"\/><\/a><\/p>\n<p><!--more-->When the events contain values that are over the threshold, this is something that you care about. This is something you need to respond to.<\/p>\n<p>You might just think of that as a filtering use case.<\/p>\n<p>For the \u201cLoosehanger\u201d sensor readings, that could be filtering on sensor readings with temperature or humidity readings that exceed a threshold.<\/p>\n<p>In Flink SQL, that would be something like this:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 520px;\">SELECT\n    *\nFROM\n    `ALL SENSORS`\nWHERE\n    temperature &gt; 23.5\n        OR\n    humidity &gt; 58;<\/pre>\n<p>If you use the <a href=\"https:\/\/www.ibm.com\/products\/event-automation\/event-processing\">Event Processing UI<\/a> to create that, it would look like this:<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-2.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-2.png?raw=true\"\/><\/a><\/p>\n<p>A Flink job like this will output all events with values that indicate something needs attention.<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-3.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-3.png?raw=true\"\/><\/a><\/p>\n<p>The result of a processing flow like this is likely a large and \u201cnoisy\u201d burst of output throughout the time while something needs attention.<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-3.jpg?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-3.jpg?raw=true\"\/><\/a><\/p>\n<p>Is that helpful output?<\/p>\n<p>After the first event, are the next hundred events you get still useful?<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-4.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-4.png?raw=true\"\/><\/a><\/p>\n<p>Probably not.<\/p>\n<p>You probably want output that looks more like this:<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-4.jpg?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-4.jpg?raw=true\"\/><\/a><\/p>\n<p>You want to know when the events indicate that something has happened that needs your attention. While you\u2019re responding to that, you don\u2019t want to be disturbed with more notifications until the next time it happens.<\/p>\n<p>There are several different ways you can implement this. Here are two easy ways to approach it.<\/p>\n<h3>approach 1 : time-based<\/h3>\n<p>After emitting an actionable alert, you could ignore any subsequent events (that contain values over the threshold) for a fixed time interval.<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-5.jpg?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-5.jpg?raw=true\"\/><\/a><\/p>\n<p>After an actionable alert, you can assume that the issue will be given attention. You can assume someone will look at it and fix it. This means there is no need to trigger additional alerts for a short time afterwards.<\/p>\n<p>For example, doing this with the \u201cLoosehanger\u201d sensor events using a time interval of 10 minutes:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 540px;\">CREATE TEMPORARY VIEW MATCHREC AS\n  SELECT\n    *\n  FROM\n    `HIGH VALUE READINGS`\n\n    MATCH_RECOGNIZE (\n      <strong>PARTITION BY<\/strong>\n        sensorid\n      <strong>ORDER BY<\/strong>\n        event_time\n      <strong>MEASURES<\/strong>\n        LAST(AFTER_TIME_INTERVAL.event_time)  AS event_time,\n        LAST(AFTER_TIME_INTERVAL.temperature) AS temperature,\n        LAST(AFTER_TIME_INTERVAL.humidity)    AS humidity\n      ONE ROW PER MATCH\n      AFTER MATCH SKIP TO LAST AFTER_TIME_INTERVAL\n      <strong>PATTERN<\/strong> (\n        LAST_ALERT   DURING_TIME_INTERVAL*   AFTER_TIME_INTERVAL\n      )\n      <strong>DEFINE<\/strong>\n        DURING_TIME_INTERVAL AS\n          event_time &lt;= TIMESTAMPADD(MINUTE, 10, LAST_ALERT.event_time),\n        AFTER_TIME_INTERVAL  AS\n          event_time &gt;  TIMESTAMPADD(MINUTE, 10, LAST_ALERT.event_time)\n);<\/pre>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-5.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-5.png?raw=true\"\/><\/a><\/p>\n<p>This is a pattern that you want Flink to recognize:<\/p>\n<ul>\n<li>the last alert that was emitted (for a given sensor id)<\/li>\n<li>followed by zero-or-more events from the same sensor &#8211; (which have timestamps within 10 minutes of the alert)<\/li>\n<li>followed by an event from the same sensor &#8211; (which has a timestamp that is more than 10 minutes after the alert)<\/li>\n<\/ul>\n<p>This results in a much cleaner output that will be easier to consume:<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-6.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-6.png?raw=true\"\/><\/a><\/p>\n<h3>option 2 : threshold-based<\/h3>\n<p>After emitting an actionable alert, you can ignore any subsequent events until the next time the value goes below the threshold again.<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-6.jpg?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-6.jpg?raw=true\"\/><\/a><\/p>\n<p>After the first actionable alert, you can assume that if the value returns under the threshold again, this means the issue has now been resolved.<\/p>\n<p>For example, I would infer that the issue in the screenshot below was resolved at 15:30:21 :<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-7.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-7.png?raw=true\"\/><\/a><\/p>\n<p>If it increases again after 15:30:21, you want another notification.<\/p>\n<p>For example, doing this with the \u201cLoosehanger\u201d sensor events:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 540px;\">CREATE TEMPORARY VIEW MATCHREC AS\n  SELECT\n    *\n  FROM\n    `ALL SENSORS`\n\n    MATCH_RECOGNIZE (\n      <strong>PARTITION BY<\/strong>\n        sensorid\n      <strong>ORDER BY<\/strong>\n        event_time\n      <strong>MEASURES<\/strong>\n        event_time  AS event_time,\n        temperature AS temperature,\n        humidity    AS humidity\n      ONE ROW PER MATCH\n      AFTER MATCH SKIP PAST LAST ROW\n      <strong>PATTERN<\/strong> (\n        BELOW_THRESHOLD ABOVE_THRESHOLD\n      )\n      <strong>DEFINE<\/strong>\n        BELOW_THRESHOLD AS temperature &lt;= 23.5 AND humidity &lt;= 58,\n        ABOVE_THRESHOLD AS temperature &gt; 23.5  OR  humidity &gt; 58\n);<\/pre>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-8.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-8.png?raw=true\"\/><\/a><\/p>\n<p>This is a pattern that you want Flink to recognize: the value (for a given sensor id) has exceeded a threshold (where the last event for the same sensor id was previously below the threshold).<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-9.png?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px; border: thin black solid;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/screenshot-9.png?raw=true\"\/><\/a><\/p>\n<h3>Choosing an approach<\/h3>\n<p>The best approach for your use case will depend on the nature of the events you are processing, and the way that you are responding to them.<\/p>\n<p>For example, is it reasonable to assume that if the value drops below the threshold, this means that the issue has been resolved? If the values in your stream of events are very spiky during a period that needs attention then the threshold approach may be less effective.<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-7.jpg?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-7.jpg?raw=true\"\/><\/a><\/p>\n<p>Alternatively, if the time to resolution for the actionable alert can be long, or even just difficult to predict, then the time interval approach may be less effective.<\/p>\n<p><a href=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-8.jpg?raw=true\"><img decoding=\"async\" style=\"width: 100%; max-width: 450px;\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-28-actionable\/diagram-8.jpg?raw=true\"\/><\/a><\/p>\n<p>You will want to customize your event processing job to fit your specific requirements. <\/p>\n<p>The key is that the better job you can do to describe the output you want, the more useful and actionable the output from your Flink jobs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, I want to share two examples of how you can use pattern recognition in Apache Flink to turn a noisy stream of output into something useful and actionable. Imagine that you have a Kafka topic with a stream of events that you sometimes need to respond to. You can think of this [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5478,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[593,610,584],"class_list":["post-5477","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-code","tag-apachekafka","tag-flink","tag-kafka"],"_links":{"self":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5477","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5477"}],"version-history":[{"count":1,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5477\/revisions"}],"predecessor-version":[{"id":5902,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5477\/revisions\/5902"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/media\/5478"}],"wp:attachment":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5477"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5477"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5477"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}