{"id":5483,"date":"2025-01-31T14:01:15","date_gmt":"2025-01-31T14:01:15","guid":{"rendered":"https:\/\/dalelane.co.uk\/blog\/?p=5483"},"modified":"2026-03-14T21:28:42","modified_gmt":"2026-03-14T21:28:42","slug":"understanding-event-processing-behaviour-with-opentelemetry","status":"publish","type":"post","link":"https:\/\/dalelane.co.uk\/blog\/?p=5483","title":{"rendered":"Understanding event processing behaviour with OpenTelemetry"},"content":{"rendered":"<p>When using Apache Kafka, timely processing of events is an important consideration.<\/p>\n<p>Understanding the <strong>throughput<\/strong> of your event processing solution is typically straightforward: count how many events you can process per second.<\/p>\n<p>Understanding <strong>latency<\/strong> (how long it takes from when an event is first emitted, to when it has finished being processed and an action has been taken in response) requires more coordination to measure.<\/p>\n<p><strong>OpenTelemetry<\/strong> helps with this, by collecting and correlating information from the different components that are producing and consuming events.<\/p>\n<p>From <a href=\"https:\/\/opentelemetry.io\">opentelemetry.io<\/a>:<\/p>\n<blockquote><p>OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software\u2019s performance and behavior.<\/p><\/blockquote>\n<h3>A distributed event processing solution<\/h3>\n<p>To understand what is possible, I\u2019ll use a simple (if contrived!)
example of an event processing architecture.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/diagram-demo.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<p><!--more-->In this demo:<\/p>\n<ol>\n<li><code style=\"font-weight: bold; color: #770000;\">producer<\/code> <br \/> A <strong>Java application<\/strong> reads JSON files in a folder, and produces the contents of each file as an event to a Kafka topic called <code style=\"font-weight: bold;\">OTEL.INPUT<\/code><\/li>\n<li><code style=\"font-weight: bold; color: #770000;\">enrich<\/code> <br \/> A <strong>Flink SQL job<\/strong> consumes the events from <code style=\"font-weight: bold;\">OTEL.INPUT<\/code>, enriches each event with additional data it retrieves from a REST API, and produces the output to a Kafka topic called <code style=\"font-weight: bold;\">OTEL.PROCESSING<\/code><\/li>\n<li><code style=\"font-weight: bold; color: #770000;\">filter<\/code> <br \/> A <strong>Kafka Streams<\/strong> application consumes events from the <code style=\"font-weight: bold;\">OTEL.PROCESSING<\/code> topic, applies a filter and produces matching events to a Kafka topic called <code style=\"font-weight: bold;\">OTEL.OUTPUT<\/code><\/li>\n<li><code style=\"font-weight: bold; color: #770000;\">consumer<\/code> <br \/> A <strong>Java application<\/strong> consumes events from the <code style=\"font-weight: bold;\">OTEL.OUTPUT<\/code> Kafka topic and prints the contents to the console<\/li>\n<\/ol>\n<h3>Understanding the behaviour<\/h3>\n<p>A simple understanding of the latency for this solution means understanding how long it takes from when <code style=\"font-weight: bold; color: #770000;\">producer<\/code> sends the file contents to the <code style=\"font-weight: bold;\">OTEL.INPUT<\/code> topic, to when <code style=\"font-weight: bold; color: #770000;\">consumer<\/code> prints out the results from processing it.<\/p>\n<p>A detailed 
insight into the performance of this solution means understanding the time spent by each of the parts of the solution.<\/p>\n<p>Instrumenting each of the components lets them independently submit \u201cspans\u201d (recordings of what they did and when) to an OpenTelemetry collector. The collector correlates the spans it receives into end-to-end traces.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/diagram-demo-with-otel.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<h3>Deploying the OpenTelemetry collector<\/h3>\n<p>OpenTelemetry is the specification and protocol for how to do all of this, and there are several implementations available.<\/p>\n<p>For my demo, I used Grafana Tempo, because there is a Kubernetes Operator that made it very easy to create a quick demo.<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">apiVersion: tempo.grafana.com\/v1alpha1\nkind: TempoMonolithic\nmetadata:\n  name: otel\n  namespace: monitoring\nspec:\n  jaegerui:\n    enabled: true\n    resources:\n      limits:\n        cpu: '2'\n        memory: 2Gi\n    route:\n      enabled: true\n  resources:\n    limits:\n      cpu: '2'\n      memory: 2Gi\n  storage:\n    traces:\n      backend: memory<\/pre>\n<div style=\"margin-top: 3px; text-align: right;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/deploy\/01-otel-collector.yaml\"><code style=\"font-weight: bold; color: #770000; font-size: 0.8em;\">01-otel-collector.yaml<\/code><\/a><\/div>\n<p>There are many alternative OpenTelemetry implementations that I could have used instead. 
You can see a list on the OpenTelemetry website at <a href=\"https:\/\/opentelemetry.io\/ecosystem\/vendors\/\">opentelemetry.io\/ecosystem\/vendors<\/a><\/p>\n<h3>Instrumenting the producer<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/diagram-producer.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<p>The <code style=\"font-weight: bold; color: #770000;\">producer<\/code> application uses the Kafka Java client to send events to Kafka.<\/p>\n<p>No code changes were needed to get it to submit trace spans to the OpenTelemetry collector, but a few config changes were needed. You can see them in <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52\">this commit<\/a>, but I&#8217;ll walk through them here.<\/p>\n<p>The simplest way to add the <strong>additional Java libraries<\/strong> needed was to get Maven to fetch them for me, by adding this to the dependencies list in my pom.xml<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\n&lt;dependency&gt;\n    &lt;groupId&gt;io.opentelemetry.instrumentation&lt;\/groupId&gt;\n    &lt;artifactId&gt;opentelemetry-kafka-clients-2.6&lt;\/artifactId&gt;\n    &lt;version&gt;2.11.0-alpha&lt;\/version&gt;\n&lt;\/dependency&gt;\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/pom.xml#L76-L81\"><code style=\"font-weight: bold; color: #770000;\">pom.xml<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R75-R81\">(commit)<\/a><\/div>\n<p>I updated the 
<strong>properties file<\/strong> that is used to configure the Kafka Producer by adding this line:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\ninterceptor.classes=io.opentelemetry.instrumentation.kafkaclients.v2_6.TracingProducerInterceptor\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/testdata\/producer.properties#L7-L8\"><code style=\"font-weight: bold; color: #770000;\">producer.properties<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-5e2ed5d0988e855e6ceeafad8ad38b708978a3615389500700dfbe490dc104e1R7-R9\">(commit)<\/a><\/div>\n<p>Finally I updated the script used to run the application to add a Java agent and <strong>environment variables<\/strong> that configure:<\/p>\n<ul>\n<li>how the producer should identify itself in the traces<\/li>\n<li>what should be submitted to OpenTelemetry (OTel can be used to collect metrics and logs as well as traces &#8211; for this demo, I only want traces)<\/li>\n<li>where to submit the traces<\/li>\n<\/ul>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\nexport OTEL_SERVICE_NAME=demo-producer\n\nexport OTEL_TRACES_EXPORTER=otlp\nexport OTEL_METRICS_EXPORTER=none\nexport OTEL_LOGS_EXPORTER=none\n\nexport OTEL_EXPORTER_OTLP_ENDPOINT=http:\/\/$(oc get route -nmonitoring tempo-otel-http -ojsonpath='{.spec.host}')\n\njava \\\n    -javaagent:$(pwd)\/opentelemetry-javaagent.jar \\\n    -cp .\/target\/sample-kafka-apps-0.0.5-jar-with-dependencies.jar \\\n    
com.ibm.eventautomation.demos.producers.JsonProducer\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/scripts\/produce-json.sh#L7-L16\"><code style=\"font-weight: bold; color: #770000;\">produce-json.sh<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-f18e2aab2501a745c9984453cda0404378922a2a571f66c50bbd19020920f511R7-R16\">(commit)<\/a><\/div>\n<h3>Instrumenting Apache Flink<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/diagram-enrich.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<p>The <code style=\"font-weight: bold; color: #770000;\">enrich<\/code> component is an Apache Flink SQL job, running in a Flink Session cluster managed by a Kubernetes Operator.<\/p>\n<p>I made a few small changes to the SQL, and updated the way the job is deployed and run.
<\/p>\n<p>The simplest way to add the <strong>additional Java libraries<\/strong> needed was to download them (<a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/deploy\/flink\/download-jars.sh\"><code style=\"font-weight: bold; color: #770000; font-size: 0.9em;\">download-jars.sh<\/code><\/a>) and then build them (<a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/deploy\/flink\/build-and-push.sh\"><code style=\"font-weight: bold; color: #770000; font-size: 0.9em;\">build-and-push.sh<\/code><\/a>) into a custom container image:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\nFROM &lt;my-standard-flink-image&gt;:&lt;my-flink-version&gt;\n\nCOPY opentelemetry-javaagent.jar \/opt\/flink\/lib\/\nCOPY target\/dependencies\/*.jar   \/opt\/flink\/lib\/\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/deploy\/flink\/Dockerfile\"><code style=\"font-weight: bold; color: #770000;\">Dockerfile<\/code><\/a><\/div>\n<p>The <strong>properties<\/strong> for the Kafka consumer and producer used by Flink are provided as part of the SQL table definitions for the source and sink tables, so I updated them by adding these lines:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\n-- source node\n\nCREATE TABLE INPUT__TABLE\n(\n    ...\n    -- get access to the headers in the consumed Kafka messages\n    INPUT___KAFKA_HEADERS MAP&lt;STRING, STRING&gt; METADATA FROM 'headers'\n)\nWITH\n(\n    ...\n    -- add the OTel support to the Kafka consumer used by Flink\n    'properties.interceptor.classes' = 'io.opentelemetry.instrumentation.kafkaclients.v2_6.TracingConsumerInterceptor'\n);\n\nCREATE TEMPORARY VIEW INPUT AS\nSELECT\n    ...\n    -- extracting trace context identifier from the headers\n    INPUT___KAFKA_HEADERS ['traceparent'] AS `traceparent`\nFROM\n    INPUT__TABLE;\n<\/pre>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\n-- sink node\n\nCREATE TEMPORARY VIEW TRANSFORM AS\nSELECT\n    ...\n    -- prepare the trace context for the output Kafka events\n    MAP[ 'traceparent', traceparent ] AS `headers`\nFROM\n    ...\n\nCREATE TABLE OUTPUT\n(\n    ...\n    -- add the trace context to the headers for the output Kafka events\n    headers MAP&lt;STRING, STRING&gt; METADATA\n)\nWITH\n(\n    ...\n    -- add the OTel support to the Kafka producer used by Flink\n    'properties.interceptor.classes' = 'io.opentelemetry.instrumentation.kafkaclients.v2_6.TracingProducerInterceptor'\n);\n<\/pre>\n<p>I used the Event Processing UI to create the Flink job, but I could equally well have written the Flink SQL for myself.
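<\/p>\n<p>The <code style=\"font-weight: bold;\">traceparent<\/code> header that this SQL carries from input to output follows the W3C Trace Context format: a version, a 32-hex-character trace ID, a 16-hex-character parent span ID, and trace flags, separated by hyphens. As a rough sketch (using a hypothetical header value, not one taken from this demo), the trace ID that correlates the spans can be pulled out like this:<\/p>

```shell
# Hypothetical traceparent value, in the W3C Trace Context format:
#   <version>-<trace-id>-<parent-span-id>-<trace-flags>
TRACEPARENT="00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"

# Split the value on "-" into its four fields
IFS=- read -r VERSION TRACE_ID PARENT_ID FLAGS <<< "$TRACEPARENT"

# The trace ID is what the collector uses to stitch spans into one trace
echo "trace ID: $TRACE_ID"
```

<p>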
The changes above are the overrides I needed to add to the default SQL that the low-code UI generated.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/screenshot-eventprocessing.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<p>Finally I updated the Kubernetes custom resource used to deploy the Flink session cluster to configure:<\/p>\n<ul>\n<li>how Flink should identify itself in the traces<\/li>\n<li>what should be submitted to OpenTelemetry (OpenTelemetry can be used to collect metrics and logs as well as traces &#8211; for this demo, I only want traces)<\/li>\n<li>where to submit the traces<\/li>\n<li>the custom container image to use, with the Java agent<\/li>\n<\/ul>\n<p>These are the changes I made to my Flink deployment to add the OpenTelemetry support:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\napiVersion: flink.apache.org\/v1beta1\nkind: FlinkDeployment\nmetadata:\n  name: my-flink\n  namespace: event-automation\nspec:\n  # custom container image with the additional OpenTelemetry dependencies\n  image: 'image-registry.openshift-image-registry.svc:5000\/event-automation\/flink-opentelemetry:1'\n\n  # use the OpenTelemetry agent\n  flinkConfiguration:\n    env.java.opts.taskmanager: '-javaagent:lib\/opentelemetry-javaagent.jar'\n    env.java.opts.jobmanager: '-javaagent:lib\/opentelemetry-javaagent.jar'\n\n  # environment variables to configure what to submit to OTel and where\n  podTemplate:\n    spec:\n      containers:\n        - env:\n            - name: OTEL_SERVICE_NAME\n              value: flink\n            - name: OTEL_TRACES_EXPORTER\n              value: otlp\n            - name: OTEL_METRICS_EXPORTER\n              value: none\n            - name: OTEL_LOGS_EXPORTER\n              value: 
none\n            - name: OTEL_EXPORTER_OTLP_ENDPOINT\n              value: http:\/\/tempo-otel.monitoring:4318\n          name: flink-main-container\n<\/pre>\n<h3>Instrumenting the REST API<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/diagram-lookup.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<p>The <code style=\"font-weight: bold; color: #770000;\">lookup<\/code> REST API that the Flink job uses to enrich the events is running in an OpenLiberty server.<\/p>\n<p>No code changes were needed to get it to submit trace spans to the OpenTelemetry collector, but a few config changes were needed. You can see them in <a href=\"https:\/\/github.com\/dalelane\/kafka-demos\/commit\/5941b963499167c5d972d188de9bb3389d4ca2b1\">this commit<\/a>, but I&#8217;ll walk through them here.<\/p>\n<p>The simplest way to instrument this was to add the telemetry feature to the server.xml, and configure it to specify what to submit to OpenTelemetry. 
(OpenTelemetry can be used to collect metrics and logs as well as traces &#8211; for this demo, I only want traces):<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\n&lt;server&gt;\n    &lt;featureManager&gt;\n        ...\n        &lt;!-- enable OpenTelemetry support --&gt;\n        &lt;feature&gt;mpTelemetry-2.0&lt;\/feature&gt;\n    &lt;\/featureManager&gt;\n\n    &lt;!-- submit traces to OpenTelemetry collector --&gt;\n    &lt;mpTelemetry\n        source=\"trace\"\/&gt;\n&lt;\/server&gt;\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/kafka-demos\/blob\/opentelemetry\/apps\/loosehangerapi\/src\/main\/liberty\/config\/server.xml#L12-L27\"><code style=\"font-weight: bold; color: #770000;\">server.xml<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/kafka-demos\/commit\/5941b963499167c5d972d188de9bb3389d4ca2b1#diff-545446b0c681fdb40c01948d6b9562e5ba706e86d624edec86f38140596a5d20R11-R27\">(commit)<\/a><\/div>\n<p>I configured OpenTelemetry with details of where to send the traces by adding this to <strong>bootstrap.properties<\/strong>. 
As I\u2019m running the Liberty server in Kubernetes, the simplest way to do this was by putting it in a Secret and mounting it into the Liberty deployment.<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\nkind: Secret\napiVersion: v1\nmetadata:\n  name: loosehangerapi-otel\ntype: Opaque\nstringData:\n  bootstrap.properties: |+\n    otel.sdk.disabled=false\n    otel.service.name=loosehanger-api\n    otel.exporter.otlp.endpoint=http:\/\/tempo-otel.monitoring:4317\n    otel.traces.exporter=otlp\n    otel.metrics.exporter=none\n    otel.logs.exporter=none\n---\napiVersion: apps\/v1\nkind: Deployment\nmetadata:\n  name: loosehanger-api\n  ...\nspec:\n  template:\n    spec:\n      containers:\n      - name: apiserver\n        ...\n        volumeMounts:\n          - name: loosehangerapi-otel\n            mountPath: \/opt\/ol\/wlp\/usr\/servers\/defaultServer\/bootstrap.properties\n            subPath: bootstrap.properties\n            readOnly: true\n      volumes:\n        - name: loosehangerapi-otel\n          secret:\n            secretName: loosehangerapi-otel\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/kafka-demos\/blob\/opentelemetry\/apps\/loosehangerapi\/ocp-deploy.yaml#L52-L60\"><code style=\"font-weight: bold; color: #770000;\">ocp-deploy.yaml<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/kafka-demos\/commit\/5941b963499167c5d972d188de9bb3389d4ca2b1#diff-57f3aaee230692e346fffa180d292b9c4fef3b89894d07b291e5411f59a9b855R52-R60\">(commit)<\/a><\/div>\n<h3>Instrumenting Kafka Streams<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/diagram-filter.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<p>The <code style=\"font-weight: 
bold; color: #770000;\">filter<\/code> application uses Kafka Streams to produce a subset of the input events.<\/p>\n<p>No code changes were needed to get it to submit trace spans to the OpenTelemetry collector, but a few config changes were needed. You can see them in <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52\">this commit<\/a>, but I&#8217;ll walk through them here.<\/p>\n<p>The simplest way to add the <strong>additional Java libraries<\/strong> needed was to get Maven to fetch them for me, by adding this to the dependencies list in my pom.xml<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\n&lt;dependency&gt;\n    &lt;groupId&gt;io.opentelemetry.instrumentation&lt;\/groupId&gt;\n    &lt;artifactId&gt;opentelemetry-kafka-clients-2.6&lt;\/artifactId&gt;\n    &lt;version&gt;2.11.0-alpha&lt;\/version&gt;\n&lt;\/dependency&gt;\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/pom.xml#L76-L81\"><code style=\"font-weight: bold; color: #770000;\">pom.xml<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R75-R81\">(commit)<\/a><\/div>\n<p>I updated the <strong>properties file<\/strong> that is used to configure the Kafka Streams application by adding these lines:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 
0;\">\nconsumer.interceptor.classes=io.opentelemetry.instrumentation.kafkaclients.v2_6.TracingConsumerInterceptor\nproducer.interceptor.classes=io.opentelemetry.instrumentation.kafkaclients.v2_6.TracingProducerInterceptor\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/testdata\/streams.properties#L13-L15\"><code style=\"font-weight: bold; color: #770000;\">streams.properties<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-e02d207effe017087c87c864a18b6147d03d78fb1782816d546f961b6e7b0253R13-R16\">(commit)<\/a><\/div>\n<p>Finally I updated the script used to run the application to add a Java agent and <strong>environment variables<\/strong> that configure:<\/p>\n<ul>\n<li>how the application should identify itself in the traces<\/li>\n<li>what should be submitted to OpenTelemetry (OTel can be used to collect metrics and logs as well as traces &#8211; for this demo, I only want traces)<\/li>\n<li>where to submit the traces<\/li>\n<\/ul>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\nexport OTEL_SERVICE_NAME=demo-processor\n\nexport OTEL_TRACES_EXPORTER=otlp\nexport OTEL_METRICS_EXPORTER=none\nexport OTEL_LOGS_EXPORTER=none\n\nexport OTEL_EXPORTER_OTLP_ENDPOINT=http:\/\/$(oc get route -nmonitoring tempo-otel-http -ojsonpath='{.spec.host}')\n\njava \\\n    -javaagent:$(pwd)\/opentelemetry-javaagent.jar \\\n    -cp .\/target\/sample-kafka-apps-0.0.5-jar-with-dependencies.jar \\\n    com.ibm.eventautomation.demos.streamprocessors.JsonProcessor\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a 
href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/scripts\/process-json.sh#L7-L16\"><code style=\"font-weight: bold; color: #770000;\">process-json.sh<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-52d1ca4f552a4b2a1ac101e8121ca1b56813ac21f67035bcab52bcbd3dca3a3bR7-R16\">(commit)<\/a><\/div>\n<h3>Instrumenting the consumer<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/diagram-consumer.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<p>The <code style=\"font-weight: bold; color: #770000;\">consumer<\/code> application uses the Kafka Java client to consume events from the output Kafka topic.<\/p>\n<p>No code changes were needed to get it to submit trace spans to the OpenTelemetry collector, but a few config changes were needed. You can see them in <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52\">this commit<\/a>, but I&#8217;ll walk through them here.<\/p>\n<p>The simplest way to add the <strong>additional Java libraries<\/strong> needed was to get Maven to fetch them for me, by adding this to the dependencies list in my pom.xml<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\n&lt;dependency&gt;\n    &lt;groupId&gt;io.opentelemetry.instrumentation&lt;\/groupId&gt;\n    &lt;artifactId&gt;opentelemetry-kafka-clients-2.6&lt;\/artifactId&gt;\n    &lt;version&gt;2.11.0-alpha&lt;\/version&gt;\n&lt;\/dependency&gt;\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/pom.xml#L76-L81\"><code style=\"font-weight: bold; color: 
#770000;\">pom.xml<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R75-R81\">(commit)<\/a><\/div>\n<p>I updated the <strong>properties file<\/strong> that is used to configure the Kafka Consumer by adding this line:<\/p>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\ninterceptor.classes=io.opentelemetry.instrumentation.kafkaclients.v2_6.TracingConsumerInterceptor\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/testdata\/consumer.properties#L15-L16\"><code style=\"font-weight: bold; color: #770000;\">consumer.properties<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-d9a016a17914aedbc401d912c3e0c58cfcc5aa03de3ecf38ab8c7a42354f2df7R15-R17\">(commit)<\/a><\/div>\n<p>Finally I updated the script used to run the application to add a Java agent and <strong>environment variables<\/strong> that configure:<\/p>\n<ul>\n<li>how the consumer should identify itself in the traces<\/li>\n<li>what should be submitted to OpenTelemetry (OTel can be used to collect metrics and logs as well as traces &#8211; for this demo, I only want traces)<\/li>\n<li>where to submit the traces<\/li>\n<\/ul>\n<pre style=\"border: thin #AA0000 solid; color: #770000; background-color: #ffffc0; padding: 1em; overflow-y: scroll; overflow-x: scroll; font-size: 1em; white-space: pre; max-height: 600px; margin-bottom: 0;\">\nexport OTEL_SERVICE_NAME=demo-consumer\n\nexport OTEL_TRACES_EXPORTER=otlp\nexport OTEL_METRICS_EXPORTER=none\nexport OTEL_LOGS_EXPORTER=none\n\nexport 
OTEL_EXPORTER_OTLP_ENDPOINT=http:\/\/$(oc get route -nmonitoring tempo-otel-http -ojsonpath='{.spec.host}')\n\njava \\\n    -javaagent:$(pwd)\/opentelemetry-javaagent.jar \\\n    -cp .\/target\/sample-kafka-apps-0.0.5-jar-with-dependencies.jar \\\n    com.ibm.eventautomation.demos.consumers.JsonConsumer\n<\/pre>\n<div style=\"margin-top: 3px; text-align: right; font-size: 0.8em;\"><a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/blob\/opentelemetry\/scripts\/consume-json.sh#L7-L16\"><code style=\"font-weight: bold; color: #770000;\">consume-json.sh<\/code><\/a> <a href=\"https:\/\/github.com\/dalelane\/sample-kafka-java-apps\/commit\/72ae5b80612ea0166ecc3eac2f0b39d4a876be52#diff-ddc287a2f52cc2f6ceb495a91974f011ade7e70d0dd613d213960d8a14776c65R7-R16\">(commit)<\/a><\/div>\n<h3>Viewing the traces<\/h3>\n<p>OpenTelemetry implementations offer different visualisations of the end-to-end traces. In general, they&#8217;ll show an end-to-end trace consisting of a series of spans, something like this:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/diagram-trace.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<p>When events are produced to a Kafka topic, this is described as a &#8220;publish&#8221;. When an event is consumed from a Kafka topic and processed, this is described as a &#8220;process&#8221;. 
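<\/p>\n<p>The end-to-end latency is then the gap between the start of the first publish span and the end of the final process span. As a toy illustration of the arithmetic (with made-up timestamps, not values from a real trace):<\/p>

```shell
# Made-up span timestamps, in milliseconds since the epoch
PRODUCER_PUBLISH_START=1738330875000   # producer publishes to the input topic
CONSUMER_PROCESS_END=1738330875420     # consumer finishes processing the event

# End-to-end latency is the difference between the two
echo "end-to-end latency: $(( CONSUMER_PROCESS_END - PRODUCER_PUBLISH_START )) ms"
# prints "end-to-end latency: 420 ms"
```

<p>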
And the relationship and dependencies between each of these steps can be displayed.<\/p>\n<p>In the Jaeger UI, traces for this demo looked like this:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/images.dalelane.co.uk\/2025-01-31-otel\/screenshot-trace.png?raw=true\" style=\"width: 100%; max-width: 450px; border: thin black solid;\"\/><\/p>\n<h3>Using end-to-end traces<\/h3>\n<p>The observability offered by the spans collected by OpenTelemetry shows you what is functionally happening in your infrastructure, even if the processing of events is asynchronous and distributed across multiple applications, servers, and systems &#8211; which is typical when working with Kafka events.<\/p>\n<p>The timing for the trace shows you the end-to-end latency of the overall processing. The timing for each individual span gives you an insight into what is contributing to that latency. These timings are an ideal starting point for identifying where to optimize or improve performance.<\/p>\n<p>In this post, I&#8217;ve demonstrated how to instrument an existing application. This illustrates that it can be done with only minor configuration updates. However, it still leaves the question of when best to do it.<\/p>\n<p>The earlier you can prepare for this, the better. But there is a cost to collecting all of these trace spans, and it is unlikely that you will constantly need to review end-to-end traces for all events flowing through your applications.<\/p>\n<p>This still leaves some useful approaches.<\/p>\n<p>One approach is to have the OTel dependencies and Java agent in place, but not activate them &#8211; and enable trace collection when there is a need to investigate what the application is doing. This can be as simple as switching <code style=\"font-weight: bold;\">OTEL_TRACES_EXPORTER<\/code> from &#8220;none&#8221; to &#8220;otlp&#8221; or adding the Java agent property.
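<\/p>\n<p>As a sketch of that approach (assuming the same environment variables used in the run scripts above), the run script could default to:<\/p>

```shell
# Tracing disabled by default: the Java agent stays in place, but exports nothing
export OTEL_TRACES_EXPORTER=none
export OTEL_METRICS_EXPORTER=none
export OTEL_LOGS_EXPORTER=none

# When investigating, switch the traces exporter back on:
# export OTEL_TRACES_EXPORTER=otlp
```

<p>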
Depending on the way your application is configured this might introduce a need to restart the app.<\/p>\n<p>An alternative approach is to leave the agent collecting trace spans all of the time, configured to collect traces for a sample of Kafka events, rather than every single event. Sampling means that some traces can be available for review whenever needed, and if the sampling rate is low enough this can minimise the overhead of tracing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When using Apache Kafka, timely processing of events is an important consideration. Understanding the throughput of your event processing solution is typically straightforward : by counting how many events you can process a second. Understanding latency (how long it takes from when an event is first emitted, to when it has finished being processed and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5484,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[593,610,584],"class_list":["post-5483","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-code","tag-apachekafka","tag-flink","tag-kafka"],"_links":{"self":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5483","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5483"}],"version-history":[{"count":1,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5483\/revisions"}],"predecessor-version":[{"id":5901,"href":"https:\/\/dalel
ane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5483\/revisions\/5901"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/media\/5484"}],"wp:attachment":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5483"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5483"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5483"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}