We had an interesting Kafka question from an Event Streams user. The answer isn’t immediately obvious unless you know a bit about Kafka internals, and after a little searching I couldn’t find an explanation online, so I thought I’d share the answer here (obviously anonymised and heavily simplified).
What is retention?
Retention is a Kafka feature to help you manage the amount of disk space your topics use.
It lets you specify how long you want Kafka to keep the messages on a topic. You can specify this by time (e.g. “I want messages on this topic to be preserved for at least X days”) or by disk usage (e.g. “I want at least the last X GB of messages on this topic to be preserved”).
Once the retention time or disk threshold is exceeded, messages become eligible for automatic deletion by Kafka.
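As a rough illustration, these two limits correspond to the topic-level settings retention.ms and retention.bytes. The sketch below only builds the setting values; the retention_config helper and its parameters are hypothetical names made up for this example, not part of any Kafka client. Note that Kafka applies retention.bytes to each partition rather than to the topic as a whole.

```python
from datetime import timedelta


def retention_config(days=None, gigabytes=None):
    """Build Kafka topic-level retention settings (hypothetical helper)."""
    config = {}
    if days is not None:
        # retention.ms is expressed in milliseconds
        millis = int(timedelta(days=days).total_seconds() * 1000)
        config["retention.ms"] = str(millis)
    if gigabytes is not None:
        # retention.bytes is a per-partition limit, expressed in bytes
        config["retention.bytes"] = str(int(gigabytes * 1024 ** 3))
    return config


# A 7-day time-based retention, as in the case described in this post
print(retention_config(days=7))  # {'retention.ms': '604800000'}
```

The resulting dictionary could be passed to whatever tool you use to create or alter the topic. Either setting can be used on its own, or both together.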
What was wrong in this case?
The user had created a topic with a retention time of 7 days.
They had assumed that this meant any message older than 7 days would be deleted.
When they looked at the messages on their topic, they were surprised to see that some messages older than 7 days were still there.
They thought this might mean retention wasn’t working.