Using quotas with Event Endpoint Management

In this post, we share examples of using quotas with IBM Event Endpoint Management, give you some pointers to help you try them for yourself, and most importantly get you thinking about where this might be useful for your own catalog.

Event Endpoint Management makes it easy for you to share your Kafka topics. Put some of your Kafka topics in the catalog, and allow colleagues and partners to discover the topics, so they can use the self-service catalog page to get started with them immediately.

Increasing reuse of your streams of events makes it possible for your business to unlock even more value from them. The more widely you share them, the more you enable innovative new uses that you might not even have thought of.

But before you invite colleagues and partners to start using your topics, you want to make sure that you’re ready. Event Endpoint Management offers a range of tools to make sure that you remain in control. Quotas are just one of these, and we dig into what they offer in this post.

Co-authored with Chris Patmore


Quotas – for producers

Your Kafka cluster’s disk and network bandwidth aren’t unlimited resources.

Quotas are a useful tool when you start sharing your Kafka topics widely: they let you set an upper limit on how fast each application that uses your Kafka cluster is allowed to produce data to it.

We’re starting by looking at sharing a topic to allow other teams to produce messages to it. (Starting with this means we’ll have some messages on the topic to consume from in a moment!)

Our aim is to enable something like this:

Before adding any quotas, we ran a single producer to see how fast it could put messages on our topic.

We ran a simple test app to produce 5,000,000 messages (each message containing randomly generated data).

Our app produced over 95,000 messages per second.

(This is only a small development cluster that we’re sharing with other people, so this is by no means a scientific performance test. Don’t read too much into the absolute numbers; we’ll just use them to get an idea of the relative impact of adding controls on a busy cluster.)

Maybe you’d be concerned about lots of applications all producing that much data, that quickly, at a sustained rate.

Quotas let you control this. You can specify an upper limit of how fast each producer is allowed to produce messages to your topic. This makes sure that no individual application is allowed to flood your topic, or use a disproportionate amount of your cluster bandwidth.

With five apps producing to your topic, a quota means that each application will have a limit applied to it.

Trying this for yourself

Let’s take a step back and walk through how you can set this up and see it in action.

Step 1: You need a Kafka cluster and an instance of IBM Event Endpoint Management.

You can use the Event Automation demo for this. It has an Ansible playbook that you can point at a Red Hat OpenShift cluster. It sets up a small development Event Streams (Kafka) cluster, an Event Endpoint Management catalog, and an Event Gateway, all set up, connected, and ready to use.

Step 2: You need a Kafka topic.

You can apply this KafkaTopic spec to create a topic called “quotatest”.

oc apply -f topic.yaml
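
If you are creating topic.yaml from scratch, a minimal sketch might look something like this (it assumes an Event Streams instance named my-kafka-cluster – use the name and namespace of your own instance):

apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaTopic
metadata:
  name: quotatest
  labels:
    # must match the name of your Event Streams instance
    eventstreams.ibm.com/cluster: my-kafka-cluster
spec:
  # a single partition keeps the quota behaviour easy to reason about
  # (see the section on multiple partitions later in this post)
  partitions: 1
  replicas: 3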

Step 3: Add the topic to Event Endpoint Management.

We’re going to assume you’ve done this before, but if not there are instructions to walk you through it.

We didn’t document the topic in any detail, as we’re just going to be using it to illustrate this post.

Step 4: Add options with quota controls.

To publish the topic in the Event Endpoint Management catalog you need to create an option. Again, if you’re new to this, there are instructions available that you can follow.

The important thing is that when you’re defining the option, you need to add a Quota enforcement control.

Quota controls can be defined in megabytes per second, or in messages per second, or a combination of both.

As every (random data) message your test application produces is the same size, it won’t make much difference which you choose. We went with messages-per-second as it is a little easier to understand.

To illustrate this post, we created a range of different options – each with a quota control defined slightly differently so we could compare.

We gave each option a different topic alias, so the topic names would make it easy for us to remember what quota we had given to each.

You don’t need to add as many options as we did. We recommend adding at least two:

  • one with a quota control
  • one without any quota control so you have something to compare with

(You might notice in the screenshot that we also added Approval controls. That is because we were doing this in a shared Catalog and we didn’t want our colleagues to play around with this topic while we were running our app. This meant no-one else could use this topic without us approving it first. You probably won’t need to do this.)

Step 5: Create credentials to access the topic through the Event Gateway.

Now that the topic is in the Catalog, you can start generating access credentials for your test applications to use.

You will need to create credentials for each option.

Step 6: Configure a producer application.

There is a simple test application included with IBM Event Streams, so we used that.

It generates messages containing random data, and records how long it took to produce them.

To set this up, we needed to:

  • create a Secret containing the CA certificate used to make TLS connections to the Event Gateway
  • create a ConfigMap with the Kafka connection properties (Event Gateway address and credentials) for each option

You can see how we did this in configs.yaml.

dalelane@dales-mbp eem-quotas % oc apply -f configs.yaml
secret/eem-ca-cert created
configmap/quotas-produce-unlimited created
configmap/quotas-produce-25000 created
configmap/quotas-produce-50000 created
configmap/quotas-produce-75000 created

You can copy our config – just change the bootstrap.servers property to match your Event Gateway and put the username and password you created in the Catalog in the sasl.jaas.config property.
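
To give you an idea of the shape, here is a sketch of one of the ConfigMaps. The gateway address, credentials, and truststore details are all placeholders, and exactly how you reference the CA certificate will depend on how you mount the eem-ca-cert Secret into your Job:

apiVersion: v1
kind: ConfigMap
metadata:
  name: quotas-produce-unlimited
data:
  producer.properties: |
    # address of the Event Gateway (not the Kafka cluster itself)
    bootstrap.servers=my-event-gateway-route:443
    # credentials generated for this option in the Catalog
    security.protocol=SASL_SSL
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="eem-username" password="eem-password";
    # CA certificate for the Event Gateway, mounted from the eem-ca-cert Secret
    ssl.truststore.type=PEM
    ssl.truststore.location=/mnt/eem-ca/ca.pem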

Step 7: Run a producer application – with no quota.

We defined an application as a Kubernetes Job – so it would start up, produce 5 million messages, and then stop.

You can see the configuration for this in produce-unlimited.yaml.

The bits that you might want to modify are the topic name (--topic), the number of messages to produce (--num-records), and the size of each message (--record-size). (It doesn’t matter what you pick, as long as you are consistent.)
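
As a sketch of the shape of produce-unlimited.yaml, here is a Job that runs the standard kafka-producer-perf-test tool that ships with Kafka. The container image, mount paths, and topic alias (quotatest-unlimited) are assumptions that you would adjust to match your own setup:

apiVersion: batch/v1
kind: Job
metadata:
  name: producer-unlimited
spec:
  backoffLimit: 0
  template:
    metadata:
      labels:
        app: producer-unlimited
    spec:
      restartPolicy: Never
      containers:
        - name: producer
          # any image with the Kafka command line tools will do
          image: quay.io/strimzi/kafka:latest-kafka-3.8.0
          command:
            - /opt/kafka/bin/kafka-producer-perf-test.sh   # path depends on the image you use
            - --topic
            - quotatest-unlimited          # topic alias from the Catalog option
            - --num-records
            - "5000000"                    # number of messages to produce
            - --record-size
            - "128"                        # size of each random message, in bytes
            - --throughput
            - "-1"                         # no client-side throttling
            - --producer.config
            - /mnt/config/producer.properties
          volumeMounts:
            - name: config
              mountPath: /mnt/config
            - name: eem-ca
              mountPath: /mnt/eem-ca
      volumes:
        - name: config
          configMap:
            name: quotas-produce-unlimited
        - name: eem-ca
          secret:
            secretName: eem-ca-cert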

To run the application, apply the job spec.

oc apply -f produce-unlimited.yaml

To see the output, you can tail the log.

oc logs -f --selector app=producer-unlimited

The application incrementally outputs information as it goes, so you can see how it progresses.

For our purposes, we just ignored everything except the final line with the details of how quickly it produced the overall 5,000,000 messages.

324846 records sent, 64969.2 records/sec (7.93 MB/sec), 1325.4 ms avg latency, 2179.0 ms max latency.
480187 records sent, 96037.4 records/sec (11.72 MB/sec), 3121.5 ms avg latency, 3849.0 ms max latency.
467674 records sent, 93516.1 records/sec (11.42 MB/sec), 3779.6 ms avg latency, 3897.0 ms max latency.
594235 records sent, 118847.0 records/sec (14.51 MB/sec), 3266.6 ms avg latency, 3951.0 ms max latency.
477995 records sent, 95599.0 records/sec (11.67 MB/sec), 3235.2 ms avg latency, 3936.0 ms max latency.
455858 records sent, 91116.9 records/sec (11.12 MB/sec), 4118.0 ms avg latency, 4391.0 ms max latency.
484633 records sent, 96887.8 records/sec (11.83 MB/sec), 3567.3 ms avg latency, 3796.0 ms max latency.
452487 records sent, 90479.3 records/sec (11.04 MB/sec), 4172.5 ms avg latency, 4688.0 ms max latency.
523942 records sent, 104767.4 records/sec (12.79 MB/sec), 3104.9 ms avg latency, 3720.0 ms max latency.
505591 records sent, 101098.0 records/sec (12.34 MB/sec), 3594.4 ms avg latency, 3750.0 ms max latency.
5000000 records sent, 95472.685265 records/sec (11.65 MB/sec), 3387.08 ms avg latency, 4688.00 ms max latency, 3559 ms 50th, 4353 ms 95th, 4596 ms 99th, 4686 ms 99.9th.

(Again, it’s important to be clear that we were not doing scientific performance testing here – as this is a small dev cluster that is used by multiple people. To prove this point, we ran the same application again, and got a slightly different result:)

dalelane@dales-mbp eem-quotas % oc delete -f produce-unlimited.yaml
job.batch "producer-unlimited" deleted

dalelane@dales-mbp eem-quotas % oc apply -f produce-unlimited.yaml
job.batch/producer-unlimited created

dalelane@dales-mbp eem-quotas % oc logs -f --selector app=producer-unlimited
423800 records sent, 84760.0 records/sec (10.35 MB/sec), 932.4 ms avg latency, 1815.0 ms max latency.
588410 records sent, 117682.0 records/sec (14.37 MB/sec), 2655.5 ms avg latency, 3121.0 ms max latency.
578210 records sent, 115503.4 records/sec (14.10 MB/sec), 2958.5 ms avg latency, 3159.0 ms max latency.
441696 records sent, 88145.3 records/sec (10.76 MB/sec), 3790.4 ms avg latency, 4193.0 ms max latency.
485335 records sent, 97067.0 records/sec (11.85 MB/sec), 3639.6 ms avg latency, 3977.0 ms max latency.
407379 records sent, 81459.5 records/sec (9.94 MB/sec), 4074.2 ms avg latency, 4318.0 ms max latency.
506870 records sent, 101374.0 records/sec (12.37 MB/sec), 3919.6 ms avg latency, 4410.0 ms max latency.
558956 records sent, 111791.2 records/sec (13.65 MB/sec), 3061.4 ms avg latency, 3315.0 ms max latency.
505741 records sent, 101128.0 records/sec (12.34 MB/sec), 3394.8 ms avg latency, 3664.0 ms max latency.
461381 records sent, 92257.7 records/sec (11.26 MB/sec), 3680.8 ms avg latency, 3936.0 ms max latency.
5000000 records sent, 99064.828023 records/sec (12.09 MB/sec), 3209.45 ms avg latency, 4410.00 ms max latency, 3308 ms 50th, 4253 ms 95th, 4351 ms 99th, 4397 ms 99.9th.

For our purposes, that is fine. The point is that running this let us see the sort of rate that our application could produce at.

Step 8: Run a producer application – with a quota.

You can see in our configs.yaml that we created separate ConfigMaps, each with the username and password we had created for a different option.

And we have variations of our producer job for each quota option. If you compare them, you’ll see that the main difference is to specify the correct topic alias and the correct credentials.

You can see the output that we got from running these below.

The results are what you would expect.

With a quota of 25,000 messages per second:

dalelane@dales-mbp eem-quotas % oc apply -f produce-25000.yaml
job.batch/producer-25000 created

dalelane@dales-mbp eem-quotas % oc logs -f --selector app=producer-25000
...
5000000 records sent, 24893.827824 records/sec (3.04 MB/sec), 13505.16 ms avg latency, 14421.00 ms max latency, 14100 ms 50th, 14137 ms 95th, 14159 ms 99th, 14342 ms 99.9th.

With a quota of 50,000 messages per second:

dalelane@dales-mbp eem-quotas % oc apply -f produce-50000.yaml
job.batch/producer-50000 created

dalelane@dales-mbp eem-quotas % oc logs -f --selector app=producer-50000
...
5000000 records sent, 49593.334656 records/sec (6.05 MB/sec), 6685.01 ms avg latency, 7708.00 ms max latency, 7053 ms 50th, 7162 ms 95th, 7571 ms 99th, 7665 ms 99.9th.

With a quota of 75,000 messages per second:

dalelane@dales-mbp eem-quotas % oc apply -f produce-75000.yaml
job.batch/producer-75000 created

dalelane@dales-mbp eem-quotas % oc logs -f --selector app=producer-75000
...
5000000 records sent, 74056.519936 records/sec (9.04 MB/sec), 4394.07 ms avg latency, 5149.00 ms max latency, 4702 ms 50th, 4836 ms 95th, 5025 ms 99th, 5140 ms 99.9th.

The same application each time.

The same code was attempting to produce the same number of equally sized random messages, and these were all going to the same topic.

But it ran at different speeds each time.

You don’t need to rely on the different application developers using your topics from the Catalog to be well-behaved. You don’t have to ask them to change how their application behaves. Quota controls applied by the Event Gateway can limit how quickly the applications can each produce data to the topic – controlling the impact of each application using topics from Event Endpoint Management.

Quotas – for consumers

Let’s see the same for sharing a topic for other teams to consume messages from.

(Now that we had repeatedly produced 5 million messages to the test topic, we had plenty of messages for applications to consume!)

The principle is similar to before.

Quotas let you set an upper limit on how fast each consuming application is allowed to consume data from your topic, so you can stay in control of the impact of each application on your cluster.

The aim this time is to enable something like this:

Setting a limit on how fast each application is able to consume from the Kafka cluster is a useful part in controlling the bandwidth used for the topics that you share.

We ran a similar application to before, but this time we configured it to consume 5,000,000 messages and report how quickly it was able to do that.

Step 9: Add the topic to Event Endpoint Management (for consuming this time).

Step 10: Add options with quota controls.

As before, the important thing is to include the Quota enforcement control.

As with produce, you can define this in megabytes per second, or in messages per second, or a combination of both.

We went with messages per second, to mirror what we did with the producers.

We created a range of different options – each with a quota control defined slightly differently so we could compare.

We gave each option a different topic alias, so the topic names would make it easy for us to remember what quota we had given to each.

You don’t need to add as many options as we did. We recommend adding at least two:

  • one with a quota control
  • one without any quota control so you have something to compare with

Step 11: Create credentials to access the topic through the Event Gateway.

Time to go to the Catalog and generate access credentials for each topic option.

Step 12: Configure a consumer application.

As before, you can use the simple test application included with IBM Event Streams.

It consumes a predefined number of messages from the topic, and then outputs how long it took to consume them.

To set this up, we needed to:

  • reuse the Secret containing the CA certificate for the Event Gateway
  • create a ConfigMap with the Kafka connection properties (Event Gateway address and credentials) for each consume option

You can see how we did this in configs.yaml.

dalelane@dales-mbp eem-quotas % oc apply -f configs.yaml
secret/eem-ca-cert unchanged
configmap/quotas-produce-unlimited unchanged
configmap/quotas-produce-25000 unchanged
configmap/quotas-produce-50000 unchanged
configmap/quotas-produce-75000 unchanged
configmap/quotas-consume-unlimited created
configmap/quotas-consume-25000 created
configmap/quotas-consume-50000 created
configmap/quotas-consume-75000 created

You can copy our config – just change the bootstrap.servers property to match your Event Gateway and put the username and password you created in the Catalog in the sasl.jaas.config property.

Step 13: Run a consumer application – with no quota.

We defined an application as a Kubernetes Job – so it would start up, consume 5 million messages, and then stop.

You can see the configuration for this in consume-unlimited.yaml.

The bits that you might want to modify are the topic name (--topic) and the number of messages to consume (--messages). It doesn’t matter what you pick, as long as you are consistent.
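
consume-unlimited.yaml follows the same shape as the producer Job sketched earlier; as a rough sketch, the part that changes is the command (the topic alias and gateway address are again placeholders):

          command:
            - /opt/kafka/bin/kafka-consumer-perf-test.sh   # path depends on the image you use
            - --topic
            - quotatest-consume-unlimited    # topic alias from the Catalog option
            - --messages
            - "5000000"                      # number of messages to consume
            - --bootstrap-server
            - my-event-gateway-route:443     # your Event Gateway address
            - --consumer.config
            - /mnt/config/consumer.properties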

To run the application, apply the job spec.

oc apply -f consume-unlimited.yaml

To see the output, you can tail the log.

dalelane@dales-mbp eem-quotas % oc logs -f --selector app=consumer-unlimited
start.time,              end.time,                data.consumed.in.MB, MB.sec,  data.consumed.in.nMsg, nMsg.sec,    rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2024-09-06 16:16:51:674, 2024-09-06 16:17:15:679, 610.3760,            25.6374, 5000200,               210021.8414, 3759,              20049,         30.4442,      249398.9725

You can see the output from when we ran this a couple of times at consume-unlimited.txt.

The interesting value is fetch.nMsg.sec (on the far right), which reported that the application fetched approximately 250,000 messages per second.

Step 14: Run a consumer application – with a quota.

You can see in our configs.yaml that we created separate ConfigMaps, each with the username and password we had created for a different option.

And we have variations of our consumer job for each quota option. If you compare them, you’ll see that the main difference is to specify the correct topic alias and the correct credentials.

You can see the full output that we got from running these here:

The same code consuming the same messages from the same topic, but running at very different speeds each time.

You don’t need to depend on application developers who find your topics in the Catalog to write their applications in a way that shares the cluster evenly. Quota controls applied by the Event Gateway can control how quickly applications are able to fetch messages from the topic.

Working with multiple partitions

All of the results we’ve shown so far have been from a topic with one partition.

For example, consuming from a Kafka topic using an option with a 50,000 messages per second quota control, gives us results like this:

dalelane@dales-mbp eem-quotas % oc apply -f consume-50000.yaml
job.batch/consumer-50000 created

dalelane@dales-mbp eem-quotas % oc logs -f --selector app=consumer-50000
start.time,              end.time,                data.consumed.in.MB, MB.sec,  data.consumed.in.nMsg, nMsg.sec,    rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2024-09-09 14:15:16:400, 2024-09-09 14:17:00:349, 610.3516,            5.8716,  5000000,               48100.5108,  4036,              99913,         6.1088,       50043.5379

The fetch.nMsg.sec value (on the far right) is approximately 50,000 messages per second, which is what you would expect with a quota of 50,000 messages per second.

But if you repeat this with a topic that has two partitions:

dalelane@dales-mbp eem-quotas % oc apply -f consume-50000.yaml
job.batch/consumer-50000 created

dalelane@dales-mbp eem-quotas % oc logs -f --selector app=consumer-50000
start.time,              end.time,                data.consumed.in.MB, MB.sec,  data.consumed.in.nMsg, nMsg.sec,    rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2024-09-09 14:23:11:727, 2024-09-09 14:24:05:793, 610.3962,            11.2898, 5000366,               92486.3315,  3860,              50206,         12.1578,      99596.9804

The application fetches messages twice as quickly as it did before. It was able to fetch approximately 99,600 messages per second.

This is because when an application uses a topic with two partitions, what happens is more like this:

The two topic partitions will typically each be hosted on different Kafka brokers, and the application will consume from each broker in parallel. This is why Kafka has topic partitions – to enable parallel processing!

The consumer application makes two connections to the Event Gateway, one for each partition, and the quota enforcement control is applied to each of those connections independently. Each connection is subject to its own 50,000 messages per second limit.

That is why cumulatively the application was able to fetch nearly 100,000 messages per second.

This would be equally true if we had run two separate consumers as part of a consumer group, such as this:

Each member of the consumer group would separately have the quota control applied to its connection. It is perhaps less surprising when this happens, compared to when you are running a single application that is implicitly making multiple connections.

The key to remember is that the Event Gateway will apply a quota control to every connection that is made. If an application makes multiple connections, such as to produce or consume to multiple topic partitions, the quota is applied to each connection independently.

How the Event Gateway does all of this

Because the Event Gateway can front multiple Kafka clusters and expose multiple options over a single Kafka topic, it allows much finer-grained control over what a client can do.

For example, a single topic could be exposed over the Event Gateway as an open-to-anyone-to-consume option but with its rate tightly controlled. Simultaneously, the topic could be exposed to a more select number of clients with a more generous quota (or even have no limit at all) by applying an Approval control.

This is because the quotas are applied per-connection per-option. Each unique connection to a particular option is tracked and controlled separately to the others.

The Event Gateway is able to enforce quotas on Kafka clients like this through its deep understanding of the Kafka protocol. It uses this knowledge to limit client applications by using a mechanism which is already baked into most popular Kafka client libraries.

Kafka clients make requests and process responses using the Kafka protocol. Responses from Kafka can define a throttle time, which is a request for the client to pause before making its next request. It is expected that clients should honor these requests, and not make any further requests until that throttle time has expired. This then effectively limits the rate at which they are able to make requests and thus consume or produce data.

The Event Gateway utilizes this mechanism to bring client rates down under their quota. By looking at how many messages or how much data has been produced or consumed by a client connection for a particular option, the Event Gateway can make decisions on what should be done. If it observes that a connection has exceeded the configured quota, it uses the throttle time value in the response metadata to tell the client to slow down so that its rate goes back below the quota.

Because the Event Gateway uses this standard client mechanism, client applications do not need to be changed to make use of quotas. Most Kafka clients will behave appropriately and wait for the throttle time. This benefits the client and the server as there is less network traffic and the client doesn’t have to deal with the server ignoring it (such as handling connection timeouts or other such errors associated with being throttled by the server).

It is important to note however, that in the event of a client ignoring the Event Gateway’s instructions to slow down, the client will still find itself unable to produce or consume more data. The Event Gateway will ignore clients that misbehave to ensure the quota is not exceeded. This is a necessary protection against malicious clients.

Adding quotas to the back-end

Quotas applied at the Event Gateway are an effective way to stop individual applications using a disproportionate amount of your disk or network bandwidth.

But what about the cumulative effect of a large number of applications all using your topic? Each individual application might be keeping to its quota, but if there are enough of them, they might have a collective impact that is more than you would like.

For example, perhaps you have a quota being enforced by the Event Gateway that keeps each consumer application to 25 MB per second.

We created this using a new catalog option, with a new quota enforcement control.

We used the catalog to create a new set of credentials for this new option.

We used these credentials to run five instances of the consumer application at once.

dalelane@dales-mbp eem-quotas % oc apply -f consume-25mbs-five-instances.yaml
job.batch/consumer-25mbs created

dalelane@dales-mbp eem-quotas % oc get pods -oname --selector app=consumer-25mbs | xargs -I {} oc logs {}
start.time,              end.time,                data.consumed.in.MB, MB.sec,  data.consumed.in.nMsg, nMsg.sec,    rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2024-09-07 19:34:36:552, 2024-09-07 19:35:00:349, 610.3619,            25.6487, 5000085,               210114.0900, 3906,              19891,         30.6853,      251374.2396
2024-09-07 19:34:36:588, 2024-09-07 19:35:00:464, 610.3619,            25.5638, 5000085,               209418.8725, 3825,              20051,         30.4405,      249368.3607
2024-09-07 19:34:36:469, 2024-09-07 19:35:00:941, 610.3619,            24.9412, 5000085,               204318.6090, 3897,              20575,         29.6652,      243017.4970
2024-09-07 19:34:36:539, 2024-09-07 19:35:00:580, 610.3619,            25.3884, 5000085,               207981.5731, 3916,              20125,         30.3285,      248451.4286
2024-09-07 19:34:36:602, 2024-09-07 19:35:00:121, 610.3619,            25.9519, 5000085,               212597.6870, 3866,              19653,         31.0569,      254418.4094

The interesting values are in the MB.sec column. Each of our five consumer applications consumed messages from the Kafka topic at approximately 25 MB per second.

Cumulatively, these applications are consuming approximately 125 MB per second from the Kafka cluster through the Event Gateway.

What if this is higher than you wanted? Perhaps you want to limit the cumulative impact to your Kafka cluster (of sharing your topic in the Catalog) to something such as 75 MB per second.

Adding a quota to the connection between the back-end Kafka cluster and the Event Gateway is a way to control that.

With quotas defined for both sides of the Event Gateway, you can:

  • protect against any individual application using a disproportionate level of cluster resources so that all applications using topics from the Catalog get their fair share
  • protect against a high number of applications resulting in a negative combined impact

To apply the quota, we modified the Kafka credentials used to add the topic to the Event Endpoint Management catalog, from unlimited to this limited (quota) version.

The difference between them is:

spec:
  quotas:
    consumerByteRate: 75000000
    producerByteRate: 75000000
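
For context, here is a sketch of what the limited KafkaUser might look like as a whole, assuming you used an Event Streams KafkaUser to provide the credentials when adding the topic to Event Endpoint Management (the user name, cluster label, and authentication type are illustrative):

apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaUser
metadata:
  name: eem-topic-credentials            # illustrative name
  labels:
    eventstreams.ibm.com/cluster: my-kafka-cluster
spec:
  authentication:
    type: scram-sha-512
  # ...authorization rules unchanged from the unlimited version...
  quotas:
    # limits applied by the Kafka cluster to this user's connections,
    # in bytes per second
    consumerByteRate: 75000000
    producerByteRate: 75000000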

With the quota applied, we re-ran the five consumer applications:

dalelane@dales-mbp eem-quotas % oc apply -f consume-25mbs-five-instances.yaml
job.batch/consumer-25mbs created

dalelane@dales-mbp eem-quotas % oc get pods -oname --selector app=consumer-25mbs | xargs -I {} oc logs {}
start.time,              end.time,                data.consumed.in.MB, MB.sec,  data.consumed.in.nMsg, nMsg.sec,    rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2024-09-07 20:19:35:811, 2024-09-07 20:20:15:012, 610.3619,            15.5701, 5000085,               127549.9350, 3841,              35360,         17.2614,      141405.1188
2024-09-07 20:19:35:889, 2024-09-07 20:20:15:102, 610.3619,            15.5653, 5000085,               127510.9020, 3977,              35236,         17.3221,      141902.7415
2024-09-07 20:19:36:132, 2024-09-07 20:20:15:497, 610.3619,            15.5052, 5000085,               127018.5444, 3776,              35589,         17.1503,      140495.2373
2024-09-07 20:19:36:194, 2024-09-07 20:20:15:276, 610.3619,            15.6175, 5000085,               127938.3092, 3776,              35306,         17.2878,      141621.3958
2024-09-07 20:19:36:265, 2024-09-07 20:20:14:508, 610.3619,            15.9601, 5000085,               130745.1037, 3832,              34411,         17.7374,      145304.8444

The interesting values are again in the MB.sec column. Each of our five consumer applications this time consumed messages from the Kafka topic at a little over 15 MB per second.

While individually each application is not permitted to exceed 25 MB per second, cumulatively these applications are now limited to approximately 75 MB per second.

Other controls

Quotas are just one tool available to you, and they complement a range of options that Event Endpoint Management provides to enable you to remain in control when you share your Kafka topics.

Our colleague Adam has written an overview of these different options, which puts our deep dive here into quotas in a broader perspective.
