Posts Tagged ‘avro’

You need two schemas to deserialize an Avro message… but which two?

Friday, November 17th, 2023

In this post, I want to talk about what happens when you use Avro to deserialize messages on a Kafka topic, why the deserializer actually needs two schemas, and what those two schemas need to be.

I should start by pointing out that if you’re using a schema registry, you probably don’t need to worry about any of this. In fact, a TL;DR for this whole post could be “You should be using a good schema registry and SerDes client”.

But there are times when this may be difficult to do, so knowing how to set a deserializer up correctly is helpful. (Even if you’re doing the right thing and using a schema registry, it is still interesting to poke at some of the details and know what is happening.)
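For comparison, here is a minimal sketch of what that registry-based happy path can look like from Python, using the confluent-kafka package (the broker address, registry URL, and topic name are all placeholders to substitute):

  # a sketch of consuming with a schema registry and a SerDes client,
  # using the confluent-kafka Python package
  from confluent_kafka import Consumer
  from confluent_kafka.schema_registry import SchemaRegistryClient
  from confluent_kafka.schema_registry.avro import AvroDeserializer
  from confluent_kafka.serialization import SerializationContext, MessageField

  registry = SchemaRegistryClient({"url": "https://my-registry:8081"})
  deserializer = AvroDeserializer(registry)

  consumer = Consumer({"bootstrap.servers": "my-kafka:9092",
                       "group.id": "demo",
                       "auto.offset.reset": "earliest"})
  consumer.subscribe(["MY.TOPIC"])

  msg = consumer.poll(10.0)
  if msg is not None and msg.error() is None:
      # the deserializer retrieves the schema that each message was
      # written with, identified by the schema id in the message bytes
      event = deserializer(msg.value(),
                           SerializationContext(msg.topic(), MessageField.VALUE))
      print(event)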

The key thing to understand is that to deserialize binary-encoded Avro data, you need a copy of the schema that was used to serialize the data in the first place [1].


This gets interesting after your topic has been around for a while, and holds messages written with a mixture of schema versions. Maybe over the lifetime of your app, you’ve needed to add new fields to your messages a couple of times.

If you want a consumer application to be able to consume all of the messages on this topic, what does that mean?
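To make the question concrete, here is a sketch using the fastavro Python package, with made-up v1 and v2 versions of a schema. Standard Avro terminology calls these the writer’s schema (the version a message was serialized with) and the reader’s schema (the version your consumer codes against):

  import io
  import fastavro

  # made-up v1 of a schema, as used by the producer at the time
  writer_schema = {
      "type": "record", "name": "Order",
      "fields": [{"name": "id", "type": "string"}],
  }

  # made-up v2, which the consumer codes against: it adds a field,
  # with a default so that older messages can still be read
  reader_schema = {
      "type": "record", "name": "Order",
      "fields": [
          {"name": "id", "type": "string"},
          {"name": "region", "type": "string", "default": "unknown"},
      ],
  }

  # serialize a message the way an old producer would have
  buf = io.BytesIO()
  fastavro.schemaless_writer(buf, writer_schema, {"id": "order-123"})
  buf.seek(0)

  # deserialize with BOTH schemas: Avro resolves the differences,
  # filling in the default for the field the old message lacks
  order = fastavro.schemaless_reader(buf, writer_schema, reader_schema)
  print(order)  # {'id': 'order-123', 'region': 'unknown'}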


Processing Apache Avro-serialized Kafka messages with IBM App Connect Enterprise

Monday, October 25th, 2021

IBM App Connect Enterprise (ACE) is a broker for developing and hosting high-throughput, high-scale integrations between a large number of applications and systems, including Apache Kafka.

In this post, I’ll describe how to use App Connect Enterprise to process Kafka messages that were serialized to a stream of bytes using Apache Avro schemas.


Background

Best practice when using Apache Kafka is to define Apache Avro schemas that describe the structure of your Kafka messages.

(For more detail about this, see my last post, From bytes to objects: describing Kafka events, or the intro to Avro that I wrote a couple of years ago.)

In this post, I’m assuming that you have embraced Avro, and you have Kafka topics with messages that were serialized using Avro schemas.

Perhaps you used a Java producer with an Avro SerDe that handled the serialization automatically for you.

Or your messages are coming from a Kafka Connect source connector, with an Avro converter that is handling the serialization for you.

Or you are doing the serialization yourself, for example because you’re producing Avro-serialized messages from a Python app.
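That do-it-yourself option can be as small as this sketch, using the fastavro and confluent-kafka Python packages (the schema, broker address, and topic name are all placeholders):

  import io
  import fastavro
  from confluent_kafka import Producer

  # a made-up example schema
  schema = {
      "type": "record", "name": "Order",
      "fields": [{"name": "id", "type": "string"}],
  }

  # encode the record to Avro binary - note that this adds no framing
  # or schema id, so consumers need to know the schema some other way
  buf = io.BytesIO()
  fastavro.schemaless_writer(buf, schema, {"id": "order-123"})

  producer = Producer({"bootstrap.servers": "my-kafka:9092"})
  producer.produce("MY.TOPIC", value=buf.getvalue())
  producer.flush()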

Now you want to use IBM App Connect Enterprise to develop and host integrations for processing those Kafka messages. But you need App Connect to know how to:

  • retrieve the Avro schemas it needs
  • use the schemas to turn the binary stream of bytes on your Kafka topics into structured objects that are easy for ACE to manipulate and process


From bytes to objects: describing Kafka events

Saturday, October 23rd, 2021

The recording of the talk that Kate Stanley and I gave at Kafka Summit Americas is now available.

Events stored in Kafka are just bytes; this is one of the reasons Kafka is so flexible. But when developing a producer or consumer you want objects, not bytes. Documenting and defining events provides a common way to discuss and agree on an approach to using Kafka. It also shows developers how to consume events without needing access to the developers responsible for producing them.

In our talk, we introduced the most popular formats for documenting events that flow through Kafka, such as AsyncAPI, Avro, CloudEvents, JSON Schema, and Protobuf.

We discussed the differences between the approaches and how to decide on a documentation strategy. Alongside the formats, we also touched on the tooling available for the different approaches. Tools for testing and code generation can make a big difference to your day-to-day developer experience.

The talk was aimed at developers who aren’t already documenting their Kafka events, or who want to see other approaches.


watch the recording on the Kafka Summit website


Describing Kafka with AsyncAPI

Friday, November 27th, 2020

In this post, I want to describe how to use AsyncAPI to document how you’re using Apache Kafka. There are already great AsyncAPI “Getting Started” guides, but AsyncAPI supports a variety of protocols, and I haven’t found an introduction written specifically from the perspective of a Kafka user.

I’ll start with a description of what AsyncAPI is.

“an open source initiative … goal is to make working with Event-Driven Architectures as easy as it is to work with REST APIs … from documentation to code generation, from discovery to event management”

asyncapi.com/docs

The most obvious benefit is that it gives you a way to document how you’re using Kafka topics, but the impact is broader than that: a consistent approach to documentation enables an ecosystem that includes things like automated code generation and discovery.
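To make that concrete, here is a minimal sketch of the kind of AsyncAPI (2.x) document I mean, describing a single made-up Kafka topic (all of the names here are placeholders):

  asyncapi: '2.0.0'
  info:
    title: Orders events
    version: 1.0.0
  servers:
    production:
      url: my-kafka:9092
      protocol: kafka
  channels:
    MY.TOPIC:
      description: One event per new order
      subscribe:
        message:
          name: Order
          contentType: application/json
          payload:
            type: object
            properties:
              id:
                type: string

Roughly speaking, each Kafka topic is a channel, and the subscribe operation describes what an app consuming from that channel should expect to receive.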


Using Avro schemas from Python apps with IBM Event Streams

Thursday, October 17th, 2019

I’ve written before about how to write a schema for developers using Kafka. The examples I used before were all in Java, but someone asked me yesterday if I could share some Python equivalents.

The principles are described in the Event Streams documentation, but in short: your Kafka producers use Apache Avro to serialize the message data that they send, and identify the schema they used in the Kafka message headers. Your Kafka consumers look at the headers of the messages they receive to know which schema to retrieve, and use that schema to deserialize the message data.
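Here is a sketch of the consumer half of that pattern, using the fastavro and confluent-kafka Python packages. Note that the “schema-id” header name and the fetch_schema() lookup below are placeholders I have made up, not the real Event Streams property names:

  import io
  import fastavro
  from confluent_kafka import Consumer

  def fetch_schema(schema_id):
      # placeholder: a real app would look the schema up in the
      # schema registry; hardcoded here to keep the sketch runnable
      return {"type": "record", "name": "Order",
              "fields": [{"name": "id", "type": "string"}]}

  consumer = Consumer({"bootstrap.servers": "my-kafka:9092",
                       "group.id": "demo"})
  consumer.subscribe(["MY.TOPIC"])

  msg = consumer.poll(10.0)
  if msg is not None and msg.error() is None:
      # the headers identify which schema the message was written with
      headers = dict(msg.headers() or [])
      schema = fetch_schema(headers["schema-id"].decode())
      event = fastavro.schemaless_reader(io.BytesIO(msg.value()), schema)
      print(event)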


How to write your first Avro schema

Saturday, July 20th, 2019

Any time there is more than one developer using a Kafka topic, they will need a way to agree on the shape of the data that will go into messages. The most common way to document the schema of messages in Kafka is to use the Apache Avro serialization system.

This post is a beginner’s guide to writing your first Avro schema, and a few tips for how to use it in your Kafka apps.
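As a taste of what is involved, a first Avro schema can be as small as this (a made-up record describing an order, with one optional field):

  {
      "namespace": "com.example.shop",
      "type": "record",
      "name": "Order",
      "fields": [
          { "name": "id",       "type": "string" },
          { "name": "quantity", "type": "int" },
          { "name": "comment",  "type": ["null", "string"], "default": null }
      ]
  }

The union type for comment, together with its default, is the standard Avro way of marking a field as optional.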
