Using client quotas with IBM Event Streams

In this post, I want to highlight a feature that I often see under-used in IBM Event Streams, and show how you can easily give it a try.

Kafka can enforce quotas to limit the impact that client applications can have on your cluster. To quote the Kafka documentation:

It is possible for producers and consumers to produce/consume very high volumes of data or generate requests at a very high rate and thus monopolize broker resources, cause network saturation and generally DOS other clients and the brokers themselves.

Having quotas protects against these issues and is all the more important in large multi-tenant clusters where a small set of badly behaved clients can degrade user experience for the well behaved ones.

In fact, when running Kafka as a service this even makes it possible to enforce API limits according to an agreed upon contract.

The Event Streams documentation adds:

… quotas protect from any single client producing or consuming significantly larger amounts of data than the other clients … This prevents issues with broker resources not being available to other clients, DoS attacks on the cluster, or badly behaved clients impacting other users of the cluster

There are two types of quota: quotas based on a network bandwidth usage threshold (available for both producers and consumers), and quotas based on CPU utilization.

In this post, I’ll use a network bandwidth usage threshold for producer applications as my example, but everything I show applies to other quota types as well.
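In Event Streams, these quotas are all fields in the quotas section of a KafkaUser. As a rough sketch of the options (the values here are arbitrary):

spec:
  quotas:
    producerByteRate: 1048576   # bytes/sec that producers using these credentials can send, per broker
    consumerByteRate: 2097152   # bytes/sec that consumers using these credentials can fetch, per broker
    requestPercentage: 55       # CPU-based quota, as a percentage of broker network and I/O thread time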

To demonstrate the way this works, I started with the kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh tools that come with Apache Kafka.

I started up six producers and six consumers – all pointing at the same Kafka topic. Each of the six producers sent 1,000,000 messages (128 bytes each) as fast as it could, with no throttling. I also set acks=all so that I could verify the messages were being produced successfully.
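Roughly speaking, each producer was a kafka-producer-perf-test.sh invocation along these lines (the topic name and client.properties file here are placeholders for my own setup; the actual scripts are in the repo I link to at the end of this post):

./bin/kafka-producer-perf-test.sh \
    --topic QUOTAS.TEST \
    --num-records 1000000 \
    --record-size 128 \
    --throughput -1 \
    --producer-props acks=all \
    --producer.config client.properties

(--throughput -1 is what disables client-side throttling, so each producer sends as fast as it can.)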

The summary report from the six producers said:

1000000 records sent, 13647.405628 records/sec (1.67 MB/sec), 15097.44 ms avg latency, 28727.00 ms max latency, 14045 ms 50th, 27803 ms 95th, 28416 ms 99th, 28617 ms 99.9th.
1000000 records sent, 13649.454704 records/sec (1.67 MB/sec), 15184.59 ms avg latency, 28819.00 ms max latency, 14072 ms 50th, 28081 ms 95th, 28602 ms 99th, 28749 ms 99.9th.
1000000 records sent, 13681.762211 records/sec (1.67 MB/sec), 15020.49 ms avg latency, 28231.00 ms max latency, 13950 ms 50th, 27848 ms 95th, 28072 ms 99th, 28171 ms 99.9th.
1000000 records sent, 13761.972916 records/sec (1.68 MB/sec), 14979.00 ms avg latency, 28559.00 ms max latency, 13921 ms 50th, 27947 ms 95th, 28281 ms 99th, 28453 ms 99.9th.
1000000 records sent, 13876.169067 records/sec (1.69 MB/sec), 14781.38 ms avg latency, 28420.00 ms max latency, 13916 ms 50th, 27910 ms 95th, 28173 ms 99th, 28333 ms 99.9th.
1000000 records sent, 13627.691469 records/sec (1.66 MB/sec), 15077.66 ms avg latency, 28504.00 ms max latency, 14083 ms 50th, 27925 ms 95th, 28224 ms 99th, 28390 ms 99.9th.

That gave me an idea of what my current Event Streams cluster setup can comfortably do.

Let’s start adding quotas to slow things down.

I added a producer network bandwidth threshold quota to the credentials I was using for the performance test.

For example:

spec:
  quotas:
    producerByteRate: 250000

(You can see this setting in the context of my full KafkaUser client credentials configuration in the demo repository linked at the end of this post.)
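For reference, a minimal sketch of what the whole KafkaUser can look like (the apiVersion, cluster label, and authentication type will depend on your Event Streams version and security configuration):

apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaUser
metadata:
  name: quotas-user
  labels:
    eventstreams.ibm.com/cluster: my-cluster
spec:
  authentication:
    type: scram-sha-512
  quotas:
    producerByteRate: 250000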

This told each of the Kafka brokers to restrict the producer applications using these credentials to producing 250,000 bytes per second.

Then I ran the same test again.

This time, aside from the most obvious difference that the test took much longer to run, the summary report from the six producers said:

1000000 records sent, 2122.578933 records/sec (0.26 MB/sec), 101142.87 ms avg latency, 121005.00 ms max latency, 114136 ms 50th, 120002 ms 95th, 120014 ms 99th, 120050 ms 99.9th.
1000000 records sent, 2121.507204 records/sec (0.26 MB/sec), 100072.10 ms avg latency, 120916.00 ms max latency, 112556 ms 50th, 120002 ms 95th, 120026 ms 99th, 120060 ms 99.9th.
1000000 records sent, 2120.823304 records/sec (0.26 MB/sec), 99296.82 ms avg latency, 121015.00 ms max latency, 111734 ms 50th, 120002 ms 95th, 120011 ms 99th, 120029 ms 99.9th.
1000000 records sent, 2121.061719 records/sec (0.26 MB/sec), 99230.30 ms avg latency, 121110.00 ms max latency, 111043 ms 50th, 120002 ms 95th, 120057 ms 99th, 120144 ms 99.9th.
1000000 records sent, 2121.493701 records/sec (0.26 MB/sec), 98743.10 ms avg latency, 121152.00 ms max latency, 111780 ms 50th, 120001 ms 95th, 120006 ms 99th, 120031 ms 99.9th.
1000000 records sent, 2173.161885 records/sec (0.27 MB/sec), 95485.01 ms avg latency, 121043.00 ms max latency, 108325 ms 50th, 120001 ms 95th, 120009 ms 99th, 120041 ms 99.9th.

Notice that all the same messages were still successfully sent. Setting this quota doesn’t prevent my applications from producing messages; the Kafka brokers just slow them down by delaying the responses they send back.
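Incidentally, you can see this throttling from the client side as well: the Kafka producer tracks how long its requests are being delayed, and kafka-producer-perf-test.sh will print those metrics if you add --print-metrics to the command (same placeholder topic and config file as before):

./bin/kafka-producer-perf-test.sh \
    --topic QUOTAS.TEST \
    --num-records 1000000 \
    --record-size 128 \
    --throughput -1 \
    --producer-props acks=all \
    --producer.config client.properties \
    --print-metrics

Among the metrics printed at the end, produce-throttle-time-avg and produce-throttle-time-max show the average and maximum time (in milliseconds) that produce requests were held back by the brokers.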

To illustrate this further, I re-ran the test with a variety of different producerByteRate values.

You can see the shell output of how I did this, which byte rates I tried, and the results showing the impact, in docs/log.txt:

%
% #
% # producerByteRate == 250000 bytes/sec limit
% #
% ./reset.sh
% oc patch kafkauser quotas-user --type='merge' -p '{"spec":{"quotas":{"producerByteRate":250000}}}'
% ./run.sh
1000000 records sent, 2122.578933 records/sec (0.26 MB/sec), 101142.87 ms avg latency, 121005.00 ms max latency, 114136 ms 50th, 120002 ms 95th, 120014 ms 99th, 120050 ms 99.9th.
1000000 records sent, 2121.507204 records/sec (0.26 MB/sec), 100072.10 ms avg latency, 120916.00 ms max latency, 112556 ms 50th, 120002 ms 95th, 120026 ms 99th, 120060 ms 99.9th.
1000000 records sent, 2120.823304 records/sec (0.26 MB/sec), 99296.82 ms avg latency, 121015.00 ms max latency, 111734 ms 50th, 120002 ms 95th, 120011 ms 99th, 120029 ms 99.9th.
1000000 records sent, 2121.061719 records/sec (0.26 MB/sec), 99230.30 ms avg latency, 121110.00 ms max latency, 111043 ms 50th, 120002 ms 95th, 120057 ms 99th, 120144 ms 99.9th.
1000000 records sent, 2121.493701 records/sec (0.26 MB/sec), 98743.10 ms avg latency, 121152.00 ms max latency, 111780 ms 50th, 120001 ms 95th, 120006 ms 99th, 120031 ms 99.9th.
1000000 records sent, 2173.161885 records/sec (0.27 MB/sec), 95485.01 ms avg latency, 121043.00 ms max latency, 108325 ms 50th, 120001 ms 95th, 120009 ms 99th, 120041 ms 99.9th.
%
% ./run.sh
1000000 records sent, 2130.874021 records/sec (0.26 MB/sec), 96729.87 ms avg latency, 120859.00 ms max latency, 111611 ms 50th, 120001 ms 95th, 120006 ms 99th, 120032 ms 99.9th.
1000000 records sent, 2124.048692 records/sec (0.26 MB/sec), 98258.01 ms avg latency, 120895.00 ms max latency, 112053 ms 50th, 120001 ms 95th, 120007 ms 99th, 120038 ms 99.9th.
1000000 records sent, 2122.317656 records/sec (0.26 MB/sec), 99487.62 ms avg latency, 120985.00 ms max latency, 110657 ms 50th, 120002 ms 95th, 120018 ms 99th, 120031 ms 99.9th.
1000000 records sent, 2126.071806 records/sec (0.26 MB/sec), 99166.82 ms avg latency, 121025.00 ms max latency, 110048 ms 50th, 120002 ms 95th, 120017 ms 99th, 120073 ms 99.9th.
1000000 records sent, 2122.529376 records/sec (0.26 MB/sec), 99514.36 ms avg latency, 121008.00 ms max latency, 113264 ms 50th, 120003 ms 95th, 120103 ms 99th, 120146 ms 99.9th.
1000000 records sent, 2123.381187 records/sec (0.26 MB/sec), 99084.95 ms avg latency, 121115.00 ms max latency, 110173 ms 50th, 120001 ms 95th, 120013 ms 99th, 120033 ms 99.9th.
%
%
%
% #
% # producerByteRate == 500000 bytes/sec limit
% #
% ./reset.sh
% oc patch kafkauser quotas-user --type='merge' -p '{"spec":{"quotas":{"producerByteRate":500000}}}'
% ./run.sh
1000000 records sent, 3625.014047 records/sec (0.44 MB/sec), 57351.54 ms avg latency, 81996.00 ms max latency, 66271 ms 50th, 71862 ms 95th, 76286 ms 99th, 81890 ms 99.9th.
1000000 records sent, 3621.981079 records/sec (0.44 MB/sec), 57360.29 ms avg latency, 80986.00 ms max latency, 65806 ms 50th, 71908 ms 95th, 75999 ms 99th, 78439 ms 99.9th.
1000000 records sent, 3580.469256 records/sec (0.44 MB/sec), 58633.57 ms avg latency, 79258.00 ms max latency, 66751 ms 50th, 72502 ms 95th, 75654 ms 99th, 79041 ms 99.9th.
1000000 records sent, 3579.072448 records/sec (0.44 MB/sec), 58131.28 ms avg latency, 81238.00 ms max latency, 67639 ms 50th, 74700 ms 95th, 77092 ms 99th, 80335 ms 99.9th.
1000000 records sent, 3641.368863 records/sec (0.44 MB/sec), 56661.88 ms avg latency, 81507.00 ms max latency, 64300 ms 50th, 71344 ms 95th, 77198 ms 99th, 80228 ms 99.9th.
1000000 records sent, 3518.425997 records/sec (0.43 MB/sec), 60580.03 ms avg latency, 81170.00 ms max latency, 68228 ms 50th, 75916 ms 95th, 79228 ms 99th, 80329 ms 99.9th.
%
% ./run.sh
1000000 records sent, 3549.661540 records/sec (0.43 MB/sec), 58796.44 ms avg latency, 82366.00 ms max latency, 66147 ms 50th, 72120 ms 95th, 76364 ms 99th, 81376 ms 99.9th.
1000000 records sent, 3546.225043 records/sec (0.43 MB/sec), 59841.45 ms avg latency, 81538.00 ms max latency, 68377 ms 50th, 72789 ms 95th, 76276 ms 99th, 78989 ms 99.9th.
1000000 records sent, 3623.083389 records/sec (0.44 MB/sec), 57333.89 ms avg latency, 82825.00 ms max latency, 66017 ms 50th, 72653 ms 95th, 77360 ms 99th, 81942 ms 99.9th.
1000000 records sent, 3503.805132 records/sec (0.43 MB/sec), 58509.54 ms avg latency, 83158.00 ms max latency, 69264 ms 50th, 75921 ms 95th, 80552 ms 99th, 82687 ms 99.9th.
1000000 records sent, 3616.989724 records/sec (0.44 MB/sec), 58343.11 ms avg latency, 81113.00 ms max latency, 65664 ms 50th, 71950 ms 95th, 75684 ms 99th, 80835 ms 99.9th.
1000000 records sent, 3600.463740 records/sec (0.44 MB/sec), 58221.30 ms avg latency, 82160.00 ms max latency, 68847 ms 50th, 73830 ms 95th, 77611 ms 99th, 80940 ms 99.9th.
%
%
%
% #
% # producerByteRate == 750000 bytes/sec limit
% #
% ./reset.sh
% oc patch kafkauser quotas-user --type='merge' -p '{"spec":{"quotas":{"producerByteRate":750000}}}'
% ./run.sh
1000000 records sent, 5236.781055 records/sec (0.64 MB/sec), 39424.59 ms avg latency, 56214.00 ms max latency, 45986 ms 50th, 49080 ms 95th, 53597 ms 99th, 56021 ms 99.9th.
1000000 records sent, 5236.068132 records/sec (0.64 MB/sec), 38870.12 ms avg latency, 58028.00 ms max latency, 45819 ms 50th, 49326 ms 95th, 55073 ms 99th, 57581 ms 99.9th.
1000000 records sent, 5241.282437 records/sec (0.64 MB/sec), 39229.21 ms avg latency, 55941.00 ms max latency, 45992 ms 50th, 49271 ms 95th, 53100 ms 99th, 55810 ms 99.9th.
1000000 records sent, 5238.070295 records/sec (0.64 MB/sec), 39347.61 ms avg latency, 56800.00 ms max latency, 46124 ms 50th, 49228 ms 95th, 53577 ms 99th, 56582 ms 99.9th.
1000000 records sent, 5237.850805 records/sec (0.64 MB/sec), 39811.21 ms avg latency, 56845.00 ms max latency, 46449 ms 50th, 49447 ms 95th, 53660 ms 99th, 56311 ms 99.9th.
1000000 records sent, 5232.698084 records/sec (0.64 MB/sec), 39574.37 ms avg latency, 57791.00 ms max latency, 46249 ms 50th, 49303 ms 95th, 54138 ms 99th, 56316 ms 99.9th.
%
% ./run.sh
1000000 records sent, 5148.826840 records/sec (0.63 MB/sec), 39459.94 ms avg latency, 59085.00 ms max latency, 46438 ms 50th, 51260 ms 95th, 56444 ms 99th, 58588 ms 99.9th.
1000000 records sent, 5165.502707 records/sec (0.63 MB/sec), 40144.82 ms avg latency, 57586.00 ms max latency, 46431 ms 50th, 50429 ms 95th, 55524 ms 99th, 57415 ms 99.9th.
1000000 records sent, 5220.024012 records/sec (0.64 MB/sec), 39222.53 ms avg latency, 57482.00 ms max latency, 45839 ms 50th, 49290 ms 95th, 50452 ms 99th, 56647 ms 99.9th.
1000000 records sent, 5156.659310 records/sec (0.63 MB/sec), 39521.10 ms avg latency, 58664.00 ms max latency, 46040 ms 50th, 50611 ms 95th, 55709 ms 99th, 58535 ms 99.9th.
1000000 records sent, 5154.346918 records/sec (0.63 MB/sec), 39630.16 ms avg latency, 59011.00 ms max latency, 46280 ms 50th, 50704 ms 95th, 54500 ms 99th, 57645 ms 99.9th.
1000000 records sent, 5156.659310 records/sec (0.63 MB/sec), 41200.30 ms avg latency, 59267.00 ms max latency, 46453 ms 50th, 51327 ms 95th, 56035 ms 99th, 58313 ms 99.9th.
%
%
%
% #
% # producerByteRate == 1000000 bytes/sec limit
% #
% ./reset.sh
% oc patch kafkauser quotas-user --type='merge' -p '{"spec":{"quotas":{"producerByteRate":1000000}}}'
% ./run.sh
1000000 records sent, 6697.295632 records/sec (0.82 MB/sec), 30449.36 ms avg latency, 49013.00 ms max latency, 35699 ms 50th, 40167 ms 95th, 45597 ms 99th, 48644 ms 99.9th.
1000000 records sent, 6709.878283 records/sec (0.82 MB/sec), 30482.55 ms avg latency, 47882.00 ms max latency, 35524 ms 50th, 39514 ms 95th, 42008 ms 99th, 46190 ms 99.9th.
1000000 records sent, 6696.398677 records/sec (0.82 MB/sec), 30684.84 ms avg latency, 48765.00 ms max latency, 35861 ms 50th, 39964 ms 95th, 44186 ms 99th, 48466 ms 99.9th.
1000000 records sent, 6709.833261 records/sec (0.82 MB/sec), 30685.77 ms avg latency, 48233.00 ms max latency, 35584 ms 50th, 39333 ms 95th, 40803 ms 99th, 46214 ms 99.9th.
1000000 records sent, 6699.583956 records/sec (0.82 MB/sec), 30733.74 ms avg latency, 48472.00 ms max latency, 35849 ms 50th, 39633 ms 95th, 44065 ms 99th, 48253 ms 99.9th.
1000000 records sent, 6687.352878 records/sec (0.82 MB/sec), 31242.13 ms avg latency, 48639.00 ms max latency, 35785 ms 50th, 39763 ms 95th, 43744 ms 99th, 48542 ms 99.9th.
%
% ./run.sh
1000000 records sent, 6706.188471 records/sec (0.82 MB/sec), 30555.52 ms avg latency, 47610.00 ms max latency, 35554 ms 50th, 39519 ms 95th, 40293 ms 99th, 44658 ms 99.9th.
1000000 records sent, 6745.271565 records/sec (0.82 MB/sec), 29729.98 ms avg latency, 47815.00 ms max latency, 35595 ms 50th, 39463 ms 95th, 43008 ms 99th, 46042 ms 99.9th.
1000000 records sent, 6681.633526 records/sec (0.82 MB/sec), 31100.08 ms avg latency, 48359.00 ms max latency, 35704 ms 50th, 39660 ms 95th, 42185 ms 99th, 46458 ms 99.9th.
1000000 records sent, 6699.539072 records/sec (0.82 MB/sec), 30908.30 ms avg latency, 47988.00 ms max latency, 35631 ms 50th, 39574 ms 95th, 41632 ms 99th, 46136 ms 99.9th.
1000000 records sent, 6692.678879 records/sec (0.82 MB/sec), 30801.47 ms avg latency, 48565.00 ms max latency, 35695 ms 50th, 39837 ms 95th, 42976 ms 99th, 48388 ms 99.9th.
1000000 records sent, 6697.205926 records/sec (0.82 MB/sec), 30740.56 ms avg latency, 48329.00 ms max latency, 35759 ms 50th, 39802 ms 95th, 43005 ms 99th, 45924 ms 99.9th.
%
%
%
% #
% # producerByteRate == 1500000 bytes/sec limit
% #
% ./reset.sh
% oc patch kafkauser quotas-user --type='merge' -p '{"spec":{"quotas":{"producerByteRate":1500000}}}'
% ./run.sh
1000000 records sent, 9450.902561 records/sec (1.15 MB/sec), 22155.11 ms avg latency, 34709.00 ms max latency, 23814 ms 50th, 30887 ms 95th, 33272 ms 99th, 34534 ms 99.9th.
1000000 records sent, 9476.337585 records/sec (1.16 MB/sec), 21851.55 ms avg latency, 34040.00 ms max latency, 23266 ms 50th, 30136 ms 95th, 31863 ms 99th, 33562 ms 99.9th.
1000000 records sent, 9435.653561 records/sec (1.15 MB/sec), 22030.29 ms avg latency, 34222.00 ms max latency, 23246 ms 50th, 29952 ms 95th, 30890 ms 99th, 33592 ms 99.9th.
1000000 records sent, 9471.311398 records/sec (1.16 MB/sec), 22008.52 ms avg latency, 34231.00 ms max latency, 23128 ms 50th, 30083 ms 95th, 31007 ms 99th, 34124 ms 99.9th.
1000000 records sent, 9458.053533 records/sec (1.15 MB/sec), 21812.25 ms avg latency, 34122.00 ms max latency, 23346 ms 50th, 30110 ms 95th, 31801 ms 99th, 33788 ms 99.9th.
1000000 records sent, 9459.753479 records/sec (1.15 MB/sec), 22023.81 ms avg latency, 33985.00 ms max latency, 23449 ms 50th, 30226 ms 95th, 32030 ms 99th, 33260 ms 99.9th.
%
% ./run.sh
1000000 records sent, 9451.438509 records/sec (1.15 MB/sec), 22098.03 ms avg latency, 35401.00 ms max latency, 23224 ms 50th, 31281 ms 95th, 33989 ms 99th, 35216 ms 99.9th.
1000000 records sent, 9448.937939 records/sec (1.15 MB/sec), 22411.13 ms avg latency, 33742.00 ms max latency, 22921 ms 50th, 30623 ms 95th, 31627 ms 99th, 33628 ms 99.9th.
1000000 records sent, 9496.044897 records/sec (1.16 MB/sec), 21786.40 ms avg latency, 34255.00 ms max latency, 22791 ms 50th, 30475 ms 95th, 31653 ms 99th, 33447 ms 99.9th.
1000000 records sent, 9472.477716 records/sec (1.16 MB/sec), 21853.43 ms avg latency, 34533.00 ms max latency, 22961 ms 50th, 30879 ms 95th, 31863 ms 99th, 34321 ms 99.9th.
1000000 records sent, 9427.203137 records/sec (1.15 MB/sec), 22387.16 ms avg latency, 34809.00 ms max latency, 22923 ms 50th, 30616 ms 95th, 31772 ms 99th, 34317 ms 99.9th.
1000000 records sent, 9487.936089 records/sec (1.16 MB/sec), 21877.94 ms avg latency, 33996.00 ms max latency, 23042 ms 50th, 30528 ms 95th, 31546 ms 99th, 33742 ms 99.9th.
%
%
%
% #
% # producerByteRate == 2000000 bytes/sec limit
% #
% ./reset.sh
% oc patch kafkauser quotas-user --type='merge' -p '{"spec":{"quotas":{"producerByteRate":2000000}}}'
% ./run.sh
1000000 records sent, 11046.550162 records/sec (1.35 MB/sec), 17811.04 ms avg latency, 28974.00 ms max latency, 17074 ms 50th, 28096 ms 95th, 28274 ms 99th, 28783 ms 99.9th.
1000000 records sent, 11142.309578 records/sec (1.36 MB/sec), 17206.00 ms avg latency, 28668.00 ms max latency, 16789 ms 50th, 28049 ms 95th, 28266 ms 99th, 28567 ms 99.9th.
1000000 records sent, 11046.062079 records/sec (1.35 MB/sec), 17604.50 ms avg latency, 28958.00 ms max latency, 17190 ms 50th, 28110 ms 95th, 28441 ms 99th, 28690 ms 99.9th.
1000000 records sent, 11045.086041 records/sec (1.35 MB/sec), 18031.84 ms avg latency, 28980.00 ms max latency, 17270 ms 50th, 28213 ms 95th, 28457 ms 99th, 28853 ms 99.9th.
1000000 records sent, 11064.517200 records/sec (1.35 MB/sec), 17624.96 ms avg latency, 28885.00 ms max latency, 17074 ms 50th, 28036 ms 95th, 28274 ms 99th, 28720 ms 99.9th.
1000000 records sent, 11076.773114 records/sec (1.35 MB/sec), 17571.82 ms avg latency, 28971.00 ms max latency, 17095 ms 50th, 28130 ms 95th, 28405 ms 99th, 28720 ms 99.9th.
%
% ./run.sh
1000000 records sent, 11145.041571 records/sec (1.36 MB/sec), 17565.70 ms avg latency, 29577.00 ms max latency, 16727 ms 50th, 28413 ms 95th, 29348 ms 99th, 29499 ms 99.9th.
1000000 records sent, 11222.085063 records/sec (1.37 MB/sec), 16965.30 ms avg latency, 29518.00 ms max latency, 16482 ms 50th, 28629 ms 95th, 29162 ms 99th, 29447 ms 99.9th.
1000000 records sent, 11164.327740 records/sec (1.36 MB/sec), 17499.19 ms avg latency, 29785.00 ms max latency, 16860 ms 50th, 28746 ms 95th, 29405 ms 99th, 29735 ms 99.9th.
1000000 records sent, 11185.181871 records/sec (1.37 MB/sec), 17140.81 ms avg latency, 29557.00 ms max latency, 16782 ms 50th, 28800 ms 95th, 29257 ms 99th, 29497 ms 99.9th.
1000000 records sent, 11157.725609 records/sec (1.36 MB/sec), 17320.91 ms avg latency, 29626.00 ms max latency, 16742 ms 50th, 28686 ms 95th, 29356 ms 99th, 29527 ms 99.9th.
1000000 records sent, 11205.109530 records/sec (1.37 MB/sec), 17126.51 ms avg latency, 29499.00 ms max latency, 16709 ms 50th, 28600 ms 95th, 29258 ms 99th, 29412 ms 99.9th.
%
%
%
% #
% # producerByteRate == 2500000 bytes/sec limit
% #
% ./reset.sh
% oc patch kafkauser quotas-user --type='merge' -p '{"spec":{"quotas":{"producerByteRate":2500000}}}'
% ./run.sh
1000000 records sent, 13439.595737 records/sec (1.64 MB/sec), 15209.93 ms avg latency, 28944.00 ms max latency, 13914 ms 50th, 26966 ms 95th, 28132 ms 99th, 28829 ms 99.9th.
1000000 records sent, 11995.297843 records/sec (1.46 MB/sec), 15264.28 ms avg latency, 28830.00 ms max latency, 13997 ms 50th, 26800 ms 95th, 28066 ms 99th, 28620 ms 99.9th.
1000000 records sent, 13479.087196 records/sec (1.65 MB/sec), 15229.56 ms avg latency, 28986.00 ms max latency, 13863 ms 50th, 26827 ms 95th, 28251 ms 99th, 28904 ms 99.9th.
1000000 records sent, 13494.184007 records/sec (1.65 MB/sec), 15138.03 ms avg latency, 28278.00 ms max latency, 13811 ms 50th, 26839 ms 95th, 27886 ms 99th, 28204 ms 99.9th.
1000000 records sent, 13694.690568 records/sec (1.67 MB/sec), 14699.72 ms avg latency, 29074.00 ms max latency, 13814 ms 50th, 27014 ms 95th, 28203 ms 99th, 28943 ms 99.9th.
1000000 records sent, 13559.322034 records/sec (1.66 MB/sec), 15082.32 ms avg latency, 29343.00 ms max latency, 14393 ms 50th, 26678 ms 95th, 28575 ms 99th, 29286 ms 99.9th.
%
% ./run.sh
1000000 records sent, 13187.045047 records/sec (1.61 MB/sec), 15581.39 ms avg latency, 32534.00 ms max latency, 13971 ms 50th, 29532 ms 95th, 32097 ms 99th, 32424 ms 99.9th.
1000000 records sent, 13282.682039 records/sec (1.62 MB/sec), 15350.16 ms avg latency, 30509.00 ms max latency, 13918 ms 50th, 29496 ms 95th, 30103 ms 99th, 30401 ms 99.9th.
1000000 records sent, 13218.246467 records/sec (1.61 MB/sec), 15505.20 ms avg latency, 30162.00 ms max latency, 13994 ms 50th, 29560 ms 95th, 29939 ms 99th, 30084 ms 99.9th.
1000000 records sent, 13181.830165 records/sec (1.61 MB/sec), 15574.85 ms avg latency, 30545.00 ms max latency, 14027 ms 50th, 29652 ms 95th, 30150 ms 99th, 30424 ms 99.9th.
1000000 records sent, 13244.857684 records/sec (1.62 MB/sec), 15367.83 ms avg latency, 30545.00 ms max latency, 13997 ms 50th, 29587 ms 95th, 30193 ms 99th, 30493 ms 99.9th.
1000000 records sent, 13250.824864 records/sec (1.62 MB/sec), 15480.74 ms avg latency, 30510.00 ms max latency, 13992 ms 50th, 29755 ms 95th, 30180 ms 99th, 30429 ms 99.9th.
%
%
%
% #
% # producerByteRate == unlimited
% #
% ./reset.sh
% oc patch kafkauser quotas-user --type='json' -p '[{"op":"remove", "path":"/spec/quotas"}]'
% ./run.sh
1000000 records sent, 13575.152042 records/sec (1.66 MB/sec), 15001.90 ms avg latency, 28496.00 ms max latency, 13710 ms 50th, 27965 ms 95th, 28274 ms 99th, 28452 ms 99.9th.
1000000 records sent, 13573.677924 records/sec (1.66 MB/sec), 14931.06 ms avg latency, 29120.00 ms max latency, 13766 ms 50th, 28054 ms 95th, 28301 ms 99th, 28805 ms 99.9th.
1000000 records sent, 13589.726167 records/sec (1.66 MB/sec), 14906.53 ms avg latency, 28544.00 ms max latency, 13653 ms 50th, 28083 ms 95th, 28249 ms 99th, 28384 ms 99.9th.
1000000 records sent, 13601.001034 records/sec (1.66 MB/sec), 14855.02 ms avg latency, 28855.00 ms max latency, 13894 ms 50th, 28184 ms 95th, 28485 ms 99th, 28683 ms 99.9th.
1000000 records sent, 13573.309444 records/sec (1.66 MB/sec), 15082.81 ms avg latency, 28992.00 ms max latency, 13940 ms 50th, 28026 ms 95th, 28695 ms 99th, 28911 ms 99.9th.
1000000 records sent, 13581.051717 records/sec (1.66 MB/sec), 14864.68 ms avg latency, 28691.00 ms max latency, 13645 ms 50th, 27992 ms 95th, 28328 ms 99th, 28481 ms 99.9th.
%
% ./run.sh
1000000 records sent, 13647.405628 records/sec (1.67 MB/sec), 15097.44 ms avg latency, 28727.00 ms max latency, 14045 ms 50th, 27803 ms 95th, 28416 ms 99th, 28617 ms 99.9th.
1000000 records sent, 13649.454704 records/sec (1.67 MB/sec), 15184.59 ms avg latency, 28819.00 ms max latency, 14072 ms 50th, 28081 ms 95th, 28602 ms 99th, 28749 ms 99.9th.
1000000 records sent, 13681.762211 records/sec (1.67 MB/sec), 15020.49 ms avg latency, 28231.00 ms max latency, 13950 ms 50th, 27848 ms 95th, 28072 ms 99th, 28171 ms 99.9th.
1000000 records sent, 13761.972916 records/sec (1.68 MB/sec), 14979.00 ms avg latency, 28559.00 ms max latency, 13921 ms 50th, 27947 ms 95th, 28281 ms 99th, 28453 ms 99.9th.
1000000 records sent, 13876.169067 records/sec (1.69 MB/sec), 14781.38 ms avg latency, 28420.00 ms max latency, 13916 ms 50th, 27910 ms 95th, 28173 ms 99th, 28333 ms 99.9th.
1000000 records sent, 13627.691469 records/sec (1.66 MB/sec), 15077.66 ms avg latency, 28504.00 ms max latency, 14083 ms 50th, 27925 ms 95th, 28224 ms 99th, 28390 ms 99.9th.
%
%

Throwing the summary reports into Excel and generating default graphs for a few of the columns makes it easier to see the impact.

[Graph: the number of records sent per second by each producer with different producer quota thresholds set.]

[Graph: the total throughput of each producer with different producer quota thresholds set.]

[Graph: the average latency for each producer with different producer quota thresholds set.]

As you can see, quotas are very easy to set and have an immediate impact. If you’re running an Event Streams cluster that is used by multiple teams or projects, I recommend applying quotas to protect those teams from each other.

One last pointer: I mentioned above that quotas don’t prevent applications from working; they just slow the applications down if they exceed their quota. So how can you tell when this is happening?

The easiest place to check is the Producers tab for your topic in the Event Streams admin interface.

[Screenshot: the Producers tab for a topic in the Event Streams admin interface.]

Scroll down to the list of applications. If everything is happy, with none of the applications being throttled by their quota, you’ll see “Quota reached 0 times”.

[Screenshot: the list of producer applications, each showing “Quota reached 0 times”.]

To show how this will change, I set a very small producerByteRate quota and then re-ran my test one final time.
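For example (the 1,000 bytes per second here is just an illustration of “very small”; anything well below what the producers attempt will do):

% oc patch kafkauser quotas-user --type='merge' -p '{"spec":{"quotas":{"producerByteRate":1000}}}'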

Checking back on the list of applications, I can see that my applications are trying to exceed this quota.

[Screenshot: the list of producer applications, now showing non-zero “Quota reached” counts.]

This makes it easy for you to keep an eye on whether you’ve set the quota to a level that is sufficient for what your project team’s applications are trying to do.

To save you having to constantly watch the UI, Event Streams also makes these metrics available through the Grafana/Prometheus monitoring stack, so you could configure an alert to send you a notification if the “quota reached” metric value starts to increase.
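As a sketch of what that could look like with a PrometheusRule (note: the metric name below is a placeholder; check your Prometheus instance for the name of the quota metric that your brokers actually expose):

% oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-quota-alerts
spec:
  groups:
    - name: kafka.quotas
      rules:
        - alert: ProducerQuotaReached
          # placeholder metric name - substitute the quota metric
          # exposed by your Event Streams brokers
          expr: increase(kafka_quota_reached_total[5m]) > 0
          for: 10m
          annotations:
            summary: A producer application is being throttled by its client quota
EOF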

I’ve put the details of how I ran these performance tests in github.com/dalelane/event-streams-quotas-demo, so you can run the same tests against your own Event Streams cluster.

For more information about any of this, please see the Kafka documentation on quotas or the Event Streams documentation on how to set your client quotas.
