This is the first in a series of posts exploring different approaches to implementing the Event Projections pattern with Apache Kafka.
In this first post, I’ll introduce what Event Projections are, and outline some of the benefits of the Event Projections pattern.
Finally, I’ll introduce the scenario that I’ll be using to illustrate the pros and cons of different approaches in later posts.
What are Event Projections?
Event Projections are a common pattern in event-driven architectures. In “Event-driven architecture usage patterns for the Kafka era”, the pattern is described as using an event stream of state changes to make a local copy of data available to an application.
… use data change events to re-create a local copy of the data in the back-end system, and to then keep it current.
This local copy is known as a projection.
Let’s walk through a simple example:
- The Core system represents some critical system in your architecture that maintains data which will be valuable to applications.
- The Core system publishes an event to a Kafka topic every time some of that data changes.
- As part of creating an application that wants access to this data, an event processor subscribes to the stream of events.
- The event processor maintains a local store of the useful data that it receives in the events (see the sketch after this list).
- The application has local, synchronous access to query the local data store, without needing direct access to the Core system.
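As a rough sketch of that event processor, the loop below consumes change events from a Kafka topic and keeps the latest value for each key in a local store. This is a minimal illustration rather than any of the approaches from the later posts: an in-memory map stands in for the data store, and the connection details and topic name are placeholders.

import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ProjectionProcessor {

    // in-memory stand-in for the application's local data store
    private static final Map<String, String> projection = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "projections-demo");         // placeholder
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("CORE.SYSTEM.EVENTS"));  // placeholder topic

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // keep only the latest event for each key
                    projection.put(record.key(), record.value());
                }
            }
        }
    }
}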
The benefits of Event Projections
“Event stream projections for performant and agile applications” discusses some of the common benefits that the Event Projections pattern brings.
Some benefits come from the control that maintaining a local projection within the application's domain boundary gives the application owner.
Sometimes this comes down to pragmatic considerations: if direct access to the Core system isn't available, it can be easier, quicker, or more practical to maintain a private projection.
Application owners can store just the data they need to enable their application. This can include filtering, transforming, or enriching the data, so that the projection contains a custom representation of the data that the application needs.
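For example, the transform step in an event processor might look something like this sketch. The event structure here (the "type", "product", and "price" fields) is hypothetical, purely to illustrate filtering out unwanted events and keeping only the fields the application queries:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EventTransformer {

    private static final ObjectMapper mapper = new ObjectMapper();

    // Filter and reshape an incoming event into the representation that the
    // application needs; returns null for events that should be skipped.
    // The event fields used here are hypothetical.
    static String transform(String eventJson) throws Exception {
        JsonNode event = mapper.readTree(eventJson);
        if (!"PRICE_CHANGE".equals(event.path("type").asText())) {
            return null;  // filtering: ignore event types the app doesn't need
        }
        // transforming: keep just the fields the application will query
        return mapper.createObjectNode()
                     .put("product", event.path("product").asText())
                     .put("price", event.path("price").asDouble())
                     .toString();
    }
}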
Productivity: Rapid access to new datasets.
Once the back-end teams expose the data as events, everything the application needs is already there in the event streams; the application team just needs to decide what they want to store. They own the local data store, so they can create new stores or amend existing ones without affecting anyone else.
Furthermore, access to the event stream is self-service. There is no provisioning or implementation lag while you wait for another team to provide access to a data store or build an integration – you can just do it yourself.
Indeed, when applying this pattern at enterprise scale, application ownership of the projection is critical to agility.
Data Synergy: Application-specific data model.
Since the application team are populating their own local data store, they can ensure it matches the data model of their application and change it whenever they want.
Other benefits relate to efficiency and performance, resulting from the low-latency access to data that a local store brings.
Robustness: Low latency, highly available responses.
The application team can place their data store wherever they want (regionally, and from a network perspective), and store the data in the most efficient form for the type of queries they will want to do. Performance and availability are in their hands.
Scalability: Elastically scalable performance.
The choice of data store implementation is in the hands of the application development team rather than being dependent on the back-end system’s scalability. The team can choose a data store topology suited to their scalability needs, and update that topology and rebuild the data store from the event stream if required.
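Rebuilding is possible because the history is retained in the event stream: a consumer can rewind to the start of the topic and repopulate the new store. A minimal sketch of the rewind, assuming a KafkaConsumer ("consumer") with manually assigned partitions, and showing a single partition for brevity:

import java.util.List;
import org.apache.kafka.common.TopicPartition;

// rewind to the start of the topic to rebuild the projection from scratch
// (a single partition shown for brevity; a real rebuild covers them all)
List<TopicPartition> partitions = List.of(new TopicPartition("SENSOR.READINGS", 0));
consumer.assign(partitions);
consumer.seekToBeginning(partitions);
// ...then poll() as normal to repopulate the rebuilt data store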
In hybrid cloud and multi-cloud scenarios, this can be complemented with mirroring. If the Core System is running in one environment, and applications that need to use data from the Core System are running in another environment, there are cost benefits in minimizing data transfer between environments.
Mirroring the topic between the environments means that the data is only transferred once between environments. The mirrored topic can then be consumed multiple times by multiple consumers, including for the purposes of maintaining projections. This local access avoids the ingress / egress costs of repeated transfers between environments.
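As a rough sketch of the topology (not a complete setup), a dedicated MirrorMaker 2 deployment replicating one of these topics between environments is driven by a properties file along these lines; the cluster aliases and addresses here are placeholders:

# mm2.properties - minimal sketch; cluster aliases and addresses are placeholders
clusters = core, apps
core.bootstrap.servers = core-kafka:9092
apps.bootstrap.servers = apps-kafka:9092

# replicate once from the core environment to the applications' environment
core->apps.enabled = true
core->apps.topics = SENSOR.READINGS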
(Setting this up is beyond the scope of this post, but for a tutorial on how to do this with Mirror Maker 2, please see “Using Mirror Maker 2 with IBM Event Streams to broadcast events to multiple regions”.)
A simplified Event Projection scenario
When demonstrating IBM Event Automation, we commonly use the fictional retailer “Loosehanger Jeans”. A synthetic event generator produces random Kafka events for this clothing retailer, with topics covering sales, stock management, and employee activities.
I’ll use this as the basis for demonstrating a few different Event Projection techniques. This means you can use the Event Automation demo Ansible playbook to create an environment to try these Event Projection demos for yourself.
So that I can demonstrate different approaches, I'll show how to maintain projections for two of the topics.
Example one: Store the latest event for each key
The SENSOR.READINGS topic is a stream of events from sensors in locations where Loosehanger Jeans keeps stock.
Each event includes a temperature and humidity reading. The events are keyed by an ID that identifies the sensor's location.
For this example, my application owner wants to be able to query for the latest temperature and humidity for any given location.
In the following posts, I'll illustrate this with REST API calls like this one:
curl --silent \
  http://projectionsdemo.apps.dale-lane.demo.ibm.com/sensorreadings/G-0-15 | jq

{
  "humidity": 43,
  "sensorId": "G-0-15",
  "sensorTime": "Thu Nov 28 13:47:09 GMT 2024",
  "temperature": 22.3
}
(In this example, I’m assuming that our application owner doesn’t need the history of previous temperatures / humidity readings in each location – they just need to be able to query for the current/latest value. That isn’t always the case for Event Projections, but it’s a helpful simplification to illustrate with demo apps.)
Example two: Store the latest event keyed by a payload value
The DOOR.BADGEIN topic is a stream of events, each recording a Loosehanger Jeans employee using their ID badge to unlock and open a door.
Each event includes the employee's username and the identifier of the door that they went through. The events are keyed by a unique event ID for the access record.
For this example, my application owner wants to be able to query for the latest employee to have gone through any particular door.
curl --silent \
  http://projectionsdemo.apps.dale-lane.demo.ibm.com/badgeins/DE-1-16 | jq

{
  "badgeTime": "2024-11-28 13:54:48.702",
  "doorId": "DE-1-16",
  "employee": "geralyn.littel",
  "recordId": "0fc23607-4527-4b6a-bb19-9e6aac5c22ba"
}
The previous example lends itself to an approach based on Kafka log compaction, because the record key is already what the application wants to query by. I'm using this additional example to show that the application owner has the flexibility to maintain projections that organise the data in different ways. The application owner can structure the projection to suit their application, without being restricted by decisions made by the owner of the event stream (such as the choice of record keys).
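As a sketch of what that flexibility can look like in an event processor, the fragment below ignores the record key (the unique event ID) and instead indexes the projection by the doorId field from the event payload; an in-memory map again stands in for the application's data store:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.fasterxml.jackson.databind.ObjectMapper;

public class BadgeInProcessor {

    // latest badge-in event for each door, keyed by a value from the payload
    private static final Map<String, String> latestByDoor = new ConcurrentHashMap<>();
    private static final ObjectMapper mapper = new ObjectMapper();

    // The record key (a unique event ID) isn't useful for this projection,
    // so extract the doorId from the event payload to use as the key instead.
    static void process(String recordValue) throws Exception {
        String doorId = mapper.readTree(recordValue).path("doorId").asText();
        latestByDoor.put(doorId, recordValue);
    }
}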
In my next posts…
Over the next few posts, I'll walk through a few different approaches to satisfying those two examples.
For each approach, I’ll use a sample implementation to illustrate the pros and cons.
More information
I’ve quoted from both of these already, but for more theory behind the Event Projection pattern, I highly recommend:
Event-driven architecture usage patterns for the Kafka era
This gives a high-level overview of common Event-driven architecture patterns. Event Projections is pattern number 3 there.
Event stream projections for performant and agile applications
This is a deep-dive into the considerations of when and how to apply the Event Projections pattern.
Tags: apachekafka, kafka