Describing Kafka with AsyncAPI

In this post, I want to describe how to use AsyncAPI to document how you’re using Apache Kafka. There are already great AsyncAPI “Getting Started” guides, but AsyncAPI supports a variety of protocols, and I haven’t found an introduction written specifically from the perspective of a Kafka user.

I’ll start with a description of what AsyncAPI is.

“an open source initiative … goal is to make working with Event-Driven Architectures as easy as it is to work with REST APIs … from documentation to code generation, from discovery to event management”

asyncapi.com/docs

The most obvious aspect is that it is a way to document how you’re using Kafka topics, but the impact is broader than that: a consistent approach to documentation enables an ecosystem that includes things like automated code generation and discovery.


| what Kafka calls         | AsyncAPI describes as |
|--------------------------|-----------------------|
| broker                   | message broker        |
| producer                 | publisher             |
| consumer                 | subscriber            |
| event / record / message | message               |
| topic                    | channel               |

Most terminology in AsyncAPI will be recognizable to Kafka users. Describing Kafka Producers as “publishers” and Kafka Consumers as “subscribers” isn’t that unusual. It’s something I’ve heard people do before, particularly people who’ve come from messaging worlds like JMS.

Perhaps more unexpected is that, in AsyncAPI, Kafka topics are described as “channels”.

An AsyncAPI spec is a YAML file (or JSON, if you prefer) that formally documents how to connect to the Kafka cluster, the details of the Kafka topic(s), and the type of data in the messages on the topics. It includes both formal schema definitions and space for free-text descriptions.

[Figure: the structure of an AsyncAPI spec, from the AsyncAPI web site]

This graphic is used on the AsyncAPI web site to describe the structure of an AsyncAPI spec.

It’s a bit taller than I’m showing here, but these are the important sections.

The bottom bits in “Components” are where you can put definitions that can be reused by reference throughout the rest of your spec.
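
As a rough sketch of that structure, the top level of a spec might be laid out like this. It assumes version 2.0.0 of the AsyncAPI spec, and every value is a placeholder made up for illustration.

```yaml
# Top-level layout of an AsyncAPI document (all values are illustrative placeholders)
asyncapi: '2.0.0'               # version of the AsyncAPI spec in use (assumed here)
id: 'urn:com:example:orders'    # unique identifier for this spec
info:                           # title, version, description, contact details
  title: Example orders events
  version: '1.0.0'
servers: {}                     # how to connect to the Kafka brokers
channels: {}                    # the Kafka topics
components: {}                  # reusable definitions, referenced from the sections above
```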

For this post, I want to go through the different sections in this structure, and describe what they mean through the lens of documenting Kafka.

I’ll start with the simplest bit: uniquely identifying your spec.

AsyncAPI recommends using URNs for this, although I have seen some examples that use URLs.

The important thing is that you have an id with a unique string.
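
A minimal sketch of the two styles, using made-up identifiers:

```yaml
# Option 1: a URN, the recommended form ("com:example:orders-events" is hypothetical)
id: 'urn:com:example:orders-events'

# Option 2: a URL, as seen in some examples (hypothetical address)
# id: 'https://example.com/asyncapi/orders-events'
```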

Next, you can provide high-level info about what you’re documenting in this spec.

Aside from the obvious fields like a title, version number and contact details, the interesting bit here is a description. This is a place to capture some context for the use of Kafka being defined. And you can use markdown to include rich-text formatting.

The result can look something like this.
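
As an illustration, an info section along these lines would cover the fields mentioned above. The title, contact details and description text are invented for the example.

```yaml
info:
  title: Example orders events          # hypothetical title
  version: '1.0.0'
  contact:
    name: Example Platform Team         # hypothetical contact
    email: platform-team@example.com
  description: |
    Events emitted when **orders** are created or updated.

    Markdown is allowed here, so you can include:
    * background on why these topics exist
    * links to further documentation
```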

Next, you provide details of your Kafka cluster, so people know how to connect their client applications.

The servers section of the spec is made up of server objects, each defining a Kafka broker and each identified by a unique name.

The most important bit is the URL field, where you provide the connection address for the Kafka broker.

The other bit you need to specify is the protocol. AsyncAPI can be used to document a variety of systems, so here is where you identify that you’re using Kafka.

Notice that there are two flavours of the protocol for Kafka.

If your Kafka cluster doesn’t have auth enabled, then you use the protocol kafka.

If client applications are required to provide credentials, then you identify this by using the protocol kafka-secure.

The result looks something like this.
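
A minimal sketch of a single server object, assuming a hypothetical broker address and server name:

```yaml
servers:
  broker0:                              # unique name for this server object (hypothetical)
    url: 'broker0.example.com:9092'     # connection address of the broker (hypothetical)
    protocol: kafka                     # use kafka-secure if clients must provide credentials
    description: Example broker with no auth enabled
```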

As you’ll have more than one Kafka broker in your cluster, you’ll probably want to identify each of them.

This means that the bootstrap address for Kafka clients wanting to connect should be formed by combining the URLs for each of the server objects. (And this is what AsyncAPI’s Java Spring code generator does.)
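
For example, a three-broker cluster could be sketched like this. The broker names and addresses are hypothetical.

```yaml
servers:
  broker0:
    url: 'broker0.example.com:9092'
    protocol: kafka
  broker1:
    url: 'broker1.example.com:9092'
    protocol: kafka
  broker2:
    url: 'broker2.example.com:9092'
    protocol: kafka

# A client reading this spec would combine the three urls into one bootstrap address:
#   broker0.example.com:9092,broker1.example.com:9092,broker2.example.com:9092
```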

This is a limitation, as it prevents you from including multiple Kafka clusters (such as a production cluster and dev/test/staging clusters) in a single AsyncAPI spec. I think it would help to extend the spec to enable identifying multiple clusters, which is a suggestion I’ve raised with the AsyncAPI community.

In the meantime, you could just list all brokers from all your clusters, and rely on using the description or extension fields to explain which ones are in which cluster. (You would have to be careful of code generators or other parts of the AsyncAPI ecosystem that will misinterpret them as all being members of one large cluster.)

For those of us who are running Kafka in Kubernetes, and fronting it with a single bootstrap service or route that round-robins across the brokers in the cluster, we can just use that address in a single server object.
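
In that case the servers section can shrink to something like the following sketch, where the bootstrap hostname is a made-up example:

```yaml
servers:
  bootstrap:
    url: 'my-kafka-bootstrap.example.com:9092'   # hypothetical bootstrap service or route
    protocol: kafka
    description: Single bootstrap address in front of every broker in the cluster
```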

The same limitation about multiple clusters will apply though.

You could identify multiple Kafka clusters, each as a separate server object, and rely on naming or description fields to make it clear that these are actually separate clusters. But this wouldn’t be consistent with some existing use of AsyncAPI, or the assumptions in supporting tools in the ecosystem, like the Java Spring code generator.

For now, I think it is better to avoid doing this.

As I mentioned above, if your Kafka cluster is secured, you identify this by specifying kafka-secure as the protocol.

You identify the type of credentials by adding a security section to the server object. The value you put in there is the name of a securityScheme object you define in the components section.

Notice that the value of the security entry in the server object is just []. The important bit is the name, which matches up with the details down in components.securitySchemes.

The types of security scheme that you can specify aren’t Kafka-specific, so the best option is to choose the value that describes your type of approach to security.

For example, if you’re using SASL/SCRAM, that is a username/password-based approach to auth, so you could describe this as userPassword.
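
Put together, a secured broker might be described along these lines. This is a sketch: the scheme name scramCreds and the broker address are hypothetical, and the only requirement is that the name under security matches the name under components.securitySchemes.

```yaml
servers:
  broker0:
    url: 'broker0.example.com:9093'     # hypothetical address
    protocol: kafka-secure              # clients must provide credentials
    security:
      - scramCreds: []                  # the name is what matters; the value is just []

components:
  securitySchemes:
    scramCreds:                         # must match the name used in the server object
      type: userPassword                # SASL/SCRAM is username/password-based
      description: Credentials for connecting with SASL/SCRAM
```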

If you want to be more specific about the security options that Kafka clients need to use, then you could explain that in the description field, or you could use extensions to document it.

As with OpenAPI, you can add additional attributes to the spec by prefixing them with x- to identify them as your own extensions to AsyncAPI.
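
As a sketch of the extension approach, the x- attribute names below are invented for this example and are not part of AsyncAPI, so tools will not recognize them unless they are written to look for them.

```yaml
components:
  securitySchemes:
    scramCreds:                              # hypothetical scheme name
      type: userPassword
      description: Connect using SASL/SCRAM over TLS
      x-sasl-mechanism: SCRAM-SHA-512        # made-up extension: SASL mechanism clients should use
      x-security-protocol: SASL_SSL          # made-up extension: Kafka security protocol setting
```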

The problem with either of these approaches is that it won’t lead to people documenting these standard aspects of configuring Kafka clients in a consistent way, and the result will be harder to exploit in things like code generators.

I think it would help to extend the spec to include Kafka-specific security config options, which is a suggestion I’ve raised with the AsyncAPI community.

The next thing to do is to identify your Kafka topics.

As I mentioned above, in AsyncAPI you describe your topics as channels.

The channels section is made up of channel objects, each named using the name of your topic.

For each topic, you need to identify the operations that you want to describe in the spec.

As I mentioned above, AsyncAPI describes producing and consuming as publish and subscribe operations.

You can start by describing the operation – giving it a unique id, a short one-line text summary, and a more detailed description (which can include markdown formatting).
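
A minimal sketch of a channel with one operation, assuming a hypothetical topic called orders.created and following the mapping used in this post (consuming is described as subscribe, producing as publish):

```yaml
channels:
  orders.created:                        # the name of the Kafka topic (hypothetical)
    description: One message for every order that is created
    subscribe:                           # describes consuming from the topic
      operationId: consumeOrderCreated   # unique id for this operation (hypothetical)
      summary: Receive notifications about new orders
      description: |
        Consume from this topic to be told about **new orders**.

        Markdown formatting is allowed in this field.
```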

AsyncAPI puts protocol-specific values in sections called bindings.

Next, in a bindings section, you can specify the values that Kafka clients should use to perform this operation.

The values you can describe are the consumer group id and the client id.

If there are expectations about the format of these values, then you can describe them here, such as by using regular expressions.

Alternatively, if there is a discrete set of valid values, then you can enumerate all of them here instead.
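
Both styles can be sketched in one operation binding. The topic name, the pattern and the enum values here are made up, and the bindingVersion value is an assumption about which version of the Kafka bindings is in use.

```yaml
channels:
  orders.created:                      # hypothetical topic
    subscribe:
      bindings:
        kafka:
          groupId:                     # consumer group id, constrained by a regular expression
            type: string
            pattern: '^orders-[a-z]+$'
          clientId:                    # client id, restricted to a discrete set of values
            type: string
            enum:
              - billing-service
              - shipping-service
          bindingVersion: '0.1.0'      # assumed version of the Kafka bindings
```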

Note that these are the only Kafka-specific attributes that are included in the bindings for Kafka operations.

Next, you describe the messages on the topic.

As with all the other levels of the spec, you can provide background and narrative in a description field.

Again, Kafka-specific values go into a bindings section. For messages, the value you can describe is how keys are used in messages on this topic.

You can describe the type – such as identifying whether you are using numeric or string-based keys. You can provide a regex if there is a pattern to how keys are defined. Or if you’re using a predefined, discrete set of keys, you can list them all in an enum.
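
As a sketch, a message binding for string keys that follow a pattern might look like this. The topic name and the key format are invented, and the bindingVersion value is an assumption.

```yaml
channels:
  orders.created:                        # hypothetical topic
    subscribe:
      message:
        description: Details of a single new order
        bindings:
          kafka:
            key:                         # how keys are used in messages on this topic
              type: string
              pattern: '^ORD-[0-9]{6}$'  # made-up key format
              # or, for a predefined set of keys, list them instead of using a pattern:
              # enum: ['north', 'south', 'east', 'west']
            bindingVersion: '0.1.0'      # assumed version of the Kafka bindings
```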

As before, note that this is the only Kafka-specific attribute that is included in the bindings for Kafka messages.

Next, you document the headers on the messages.

You can list each header as a property of the headers object, and for each header provide a description for what it is for, and the type of data in the value.

There is also space to include a set of examples of what the headers can look like.
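
A sketch of what that could look like, with two invented headers and one example. The layout of the examples entry follows my reading of the 2.0.0 spec, so treat it as an assumption and check it against the version you are using.

```yaml
channels:
  orders.created:                        # hypothetical topic
    subscribe:
      message:
        headers:
          type: object
          properties:
            correlation-id:              # hypothetical header
              description: Identifier used to trace a request across services
              type: string
            schema-version:              # hypothetical header
              description: Version of the payload schema used by the producer
              type: integer
        examples:
          - headers:
              correlation-id: 'a1b2c3d4'
              schema-version: 1
```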

Finally, you describe the message body.

By default, you do this using AsyncAPI’s own schema format.

This means identifying the type of data, any restrictions that apply, and providing some examples.

If messages contain multiple fields, you can identify all of these, and specify which ones are required and which are optional.
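
For instance, a payload with two required fields and one optional field could be sketched as follows. The field names, restrictions and example values are all made up for illustration.

```yaml
channels:
  orders.created:                        # hypothetical topic
    subscribe:
      message:
        payload:
          type: object
          required:                      # fields that must be present
            - orderId
            - amount
          properties:
            orderId:
              type: string
              description: Unique identifier for the order
              pattern: '^ORD-[0-9]{6}$'  # restriction on the value
              examples:
                - 'ORD-000123'
            amount:
              type: number
              description: Total value of the order
              minimum: 0
              examples:
                - 19.99
            note:                        # not listed under required, so optional
              type: string
              maxLength: 200
```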

But you don’t have to use AsyncAPI’s own schema format – a few other approaches are supported.

For Kafka users, the most useful of these is likely to be Apache Avro. If you’re using Avro to serialize and deserialize your messages (and you really should), then you can include a reference to your Avro schema.

For example, if your AsyncAPI spec is in a file on a filesystem, you can provide the relative location of the Avro schema file.

Alternatively, you can provide an absolute URL for where the schema is hosted, such as in a schema registry.
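
A sketch of both options, assuming a hypothetical schema file name and registry address. The schemaFormat value is the one I understand the 2.0.0 spec to list for Avro, and how a $ref to a remote schema is resolved will depend on the tool reading the spec.

```yaml
channels:
  orders.created:                        # hypothetical topic
    subscribe:
      message:
        schemaFormat: 'application/vnd.apache.avro;version=1.9.0'
        payload:
          # relative location of the Avro schema file, next to this spec (hypothetical path)
          $ref: './schemas/order-created.avsc'
          # or an absolute URL where the schema is hosted (hypothetical address):
          # $ref: 'https://schema-registry.example.com/schemas/order-created.avsc'
```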

Reusing your existing Avro schemas means you don’t need to define the data types for your Kafka payloads multiple times, and the AsyncAPI spec supplements what Avro captures with information about the topics and Kafka clusters.

And that’s it.

I think there is a lot of value in capturing this detail about how you’re using Kafka in a consistent way.

I’ve tried playing with a couple of examples of supporting tools in the ecosystem.

I’ve already mentioned the Java Spring code generation – point it at an AsyncAPI spec, and it generates the source code for a Java app ready to start running against your Kafka cluster, with the connection information already set up.

Microcks is another interesting tool. It deploys its own Kafka broker, and if you upload your AsyncAPI spec, it creates Kafka topics based on the description in the spec and starts generating and producing mock data to them at an interval that you specify.

It means you can define and document the data you plan to produce to your Kafka topics, and let Microcks set up a topic with a live stream of mock data matching that spec. And that lets you start developing your Kafka consumers against the mock topic without needing to wait until you have a real topic with real data.

I’m sure there is more that we can do with this, but this is an intro to starting to use AsyncAPI to describe your Kafka topics.

For more information, look at asyncapi.com.
