Log retention (Kafka)

Kafka log retention controls how long messages are kept on disk before they become eligible for deletion. It can be configured by time (log.retention.hours, default 168 = 7 days), by size (log.retention.bytes, default -1 = unlimited), or both, in which case whichever limit is reached first applies. Retention is enforced per partition at the segment level: Kafka deletes whole closed (inactive) segments, never individual messages. The active segment (the one currently being written to) is not deleted while it remains active, regardless of its age.
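
The segment-level behaviour can be illustrated with a small Python model. This is a simplified sketch, not Kafka's implementation: the Segment class, its field names, and deletable_segments are hypothetical, and the model assumes the time limit is checked first (oldest segment first), followed by the size limit.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    newest_record_age_hours: float  # age of the most recent record in the segment
    active: bool = False            # True for the segment currently being written to


def deletable_segments(segments, retention_hours=168.0, retention_bytes=-1):
    """Return base offsets of the closed segments this model would delete."""
    ordered = sorted(segments, key=lambda s: s.base_offset)
    closed = [s for s in ordered if not s.active]  # the active segment is skipped
    doomed = []

    # Time limit: walk from the oldest segment and stop at the first one
    # whose newest record is still inside the retention window.
    for seg in closed:
        if seg.newest_record_age_hours <= retention_hours:
            break
        doomed.append(seg.base_offset)

    # Size limit (-1 means unlimited): drop the oldest remaining closed
    # segments while the partition would still exceed retention_bytes.
    if retention_bytes >= 0:
        remaining = sum(s.size_bytes for s in ordered if s.base_offset not in doomed)
        excess = remaining - retention_bytes
        for seg in closed:
            if seg.base_offset in doomed:
                continue
            if excess < seg.size_bytes:
                break
            doomed.append(seg.base_offset)
            excess -= seg.size_bytes

    return doomed
```

In this model a partition that consists of a single active segment never yields anything to delete, however old its records are.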

Formula

Active segment caveat: A low-volume partition may have a single active segment that takes a very long time to reach log.segment.bytes (default 1 GiB). Until that segment is rolled (closed), its messages are not eligible for deletion, even after log.retention.hours has elapsed, so data on such partitions can persist significantly longer than the configured retention period. The fix is to lower the time-based roll interval, log.roll.ms or log.roll.hours at the broker level (default 168 hours = 7 days) or segment.ms at the topic level, so that segments are closed based on time rather than only on size and become eligible for retention-based deletion sooner.
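
A rough upper bound on how long a message can survive on a low-volume partition can be sketched as follows. The function name and parameters are hypothetical, and the estimate assumes the segment never fills by size, is rolled once the time-based roll interval elapses, and is removed at the first retention check after its newest message expires.

```python
def worst_case_record_age_hours(retention_hours=168.0,
                                segment_roll_hours=168.0,
                                check_interval_minutes=5.0):
    """Approximate maximum age of a message before its segment is deleted.

    The first message in a segment waits up to segment_roll_hours for the
    roll, then up to retention_hours for the newest message in that segment
    to expire, plus one retention check interval.
    """
    return segment_roll_hours + retention_hours + check_interval_minutes / 60.0


# Default-like values: about 14 days rather than 7.
default_days = worst_case_record_age_hours() / 24

# Rolling every hour brings the bound close to the retention period itself.
hourly_roll_days = worst_case_record_age_hours(segment_roll_hours=1.0) / 24
```

Under these assumptions, shortening the roll interval is what brings actual data lifetime close to the configured retention.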

Why it matters in practice

Log retention directly bounds how far a slow consumer can lag before messages are lost. With 7-day retention, a consumer whose lag grows at 200 messages/s has at most 7 days' worth of messages as a buffer; once its lag exceeds what the partition retains, the oldest unread messages are deleted before they are consumed. Likewise, if a consumer group is paused for 8 days (a holiday weekend plus a monitoring blind spot), its committed offset can fall outside the retention window. On resume, the committed position no longer exists in the log: with auto.offset.reset=none the consumer receives an OffsetOutOfRangeException, while with earliest or latest it silently resets to the start or end of the log. No error is surfaced at produce time; the data is simply gone.
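
The arithmetic behind this can be sketched in Python. The function names are hypothetical, and the produce rate of 1,000 messages/s in the example is an assumed figure that does not come from the text; the model also ignores segment granularity and treats retention as a sliding 7-day window.

```python
SECONDS_PER_DAY = 24 * 3600


def seconds_until_data_loss(produce_rate, lag_growth_rate,
                            retention_seconds=7 * SECONDS_PER_DAY,
                            current_lag=0):
    """Time until a lagging consumer's position falls out of the retained log.

    produce_rate and lag_growth_rate are in messages/s, current_lag in messages.
    Returns None when the lag is not growing.
    """
    if lag_growth_rate <= 0:
        return None
    retained_messages = produce_rate * retention_seconds
    headroom = max(retained_messages - current_lag, 0)
    return headroom / lag_growth_rate


def committed_offset_expired(paused_seconds, retention_seconds=7 * SECONDS_PER_DAY):
    """True if a consumer that was caught up when paused has lost its position."""
    return paused_seconds > retention_seconds


# Assumed 1,000 msg/s produced, lag growing at 200 msg/s, 7-day retention.
days_left = seconds_until_data_loss(1000, 200) / SECONDS_PER_DAY

# A group paused for 8 days against 7-day retention.
lost = committed_offset_expired(8 * SECONDS_PER_DAY)
```

With these assumed rates the consumer has about 35 days before its lag exceeds the retained data, whereas a full pause exhausts the buffer in exactly the retention period.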

Common mistakes

  • Setting very short retention (< 24 hours) for topics consumed by batch jobs that run daily — a delay in the batch job leaves no buffer for recovery.
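  • Not monitoring the ratio of consumer lag to retention: with both expressed in the same unit, lag_hours / retention_hours gives a clear signal of how close a consumer is to data loss (a value approaching 1 means the oldest unread messages are about to be deleted).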
  • Confusing log retention with consumer group offset expiration (offsets.retention.minutes, default 7 days) — offsets of inactive consumer groups expire independently of message retention.
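
The lag-to-retention ratio from the list above could be turned into a simple alert along these lines. This is a minimal sketch: the function names and the 0.5 and 0.8 thresholds are hypothetical choices, and lag is assumed to be measured as the age in hours of the oldest unconsumed message.

```python
def lag_retention_ratio(lag_hours, retention_hours=168.0):
    """Fraction of the retention window already consumed by lag."""
    if retention_hours <= 0:
        raise ValueError("retention_hours must be positive")
    return lag_hours / retention_hours


def alert_level(ratio, warn=0.5, critical=0.8):
    """Map a lag-to-retention ratio to a coarse alert level."""
    if ratio >= critical:
        return "critical"
    if ratio >= warn:
        return "warn"
    return "ok"


# A consumer that is 84 hours behind on a topic with 7-day retention.
level = alert_level(lag_retention_ratio(84.0))
```

Here a consumer 84 hours behind has used half of its 7-day window, which this sketch reports as "warn".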