dkduckkit.dev

Kafka time lag


Kafka time lag is the elapsed time between when a message was produced and when it was consumed (or committed) by a consumer group, expressed in seconds or milliseconds rather than message counts. It is a more intuitive SLO metric than offset lag because it directly answers "how stale is this data?" without requiring knowledge of the production rate.

Formula

time_lag = consumer_timestamp − message_timestamp, where message_timestamp is the record's CreateTime and consumer_timestamp is the wall-clock time at consumption (or commit). Computing it requires reading message timestamps from the Kafka record metadata.
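A minimal sketch of the formula, assuming timestamps in epoch milliseconds (the unit Kafka clients expose on consumed records); the function name and parameters are illustrative, not part of any Kafka client API:

```python
import time

def time_lag_ms(message_timestamp_ms, consumer_timestamp_ms=None):
    """Elapsed time between production and consumption, in milliseconds.

    message_timestamp_ms is the record's CreateTime (epoch millis), as
    exposed by most Kafka clients on each consumed record. If no consumer
    timestamp is given, the current wall-clock time is used.
    """
    if consumer_timestamp_ms is None:
        consumer_timestamp_ms = int(time.time() * 1000)
    return consumer_timestamp_ms - message_timestamp_ms

# A record produced 1.5 s before it was consumed:
lag = time_lag_ms(1_700_000_000_000, 1_700_000_001_500)
# lag == 1500 ms
```

In a real consumer loop you would call this once per record (or per batch) with the record's timestamp field and export the result as a gauge metric.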

Why it matters in practice

A consumer group with 50,000 messages of offset lag at 10,000 msg/s is 5 seconds behind — probably fine. The same 50,000 messages at 100 msg/s is more than 8 minutes behind — potentially a serious data freshness problem. Time lag makes this distinction explicit. For SLOs that require "data processed within 60 seconds of production", time lag is the direct metric to alert on. Tools like Burrow and Kafka Lag Exporter compute time lag from message timestamps, eliminating the need to correlate offset lag with throughput separately.
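The arithmetic above can be sketched as a back-of-the-envelope estimator (the function name is illustrative; this only approximates time lag, and only when production rate is roughly steady):

```python
def estimated_time_lag_s(offset_lag, production_rate_msg_per_s):
    """Rough time lag implied by an offset lag at a steady production rate.

    An approximation only: true time lag should come from record
    timestamps, since throughput is rarely constant.
    """
    return offset_lag / production_rate_msg_per_s

print(estimated_time_lag_s(50_000, 10_000))  # 5.0 s — probably fine
print(estimated_time_lag_s(50_000, 100))     # 500.0 s, over 8 min — stale
```

This is exactly the correlation step that timestamp-based tools make unnecessary: they measure the lag directly instead of inferring it from two separate metrics.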

Common mistakes

  • Using offset lag as a proxy for time lag without knowing the production rate — the two diverge significantly when throughput varies.
  • Relying on producer-set timestamps when the topic uses message.timestamp.type=LogAppendTime — the broker overwrites the record timestamp at append, so "time lag" then measures staleness relative to broker append time, not production time. CreateTime (the default) preserves the producer's timestamp.
  • Setting time lag alerts too tight for batch-processing consumers that legitimately process data hours or days after production.
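To make the alerting point concrete, here is a hedged sketch of a freshness-SLO check over a batch of consumed records; the threshold, function name, and inputs are illustrative assumptions, not from any monitoring tool:

```python
SLO_THRESHOLD_MS = 60_000  # "processed within 60 seconds of production"

def breaches_slo(record_timestamps_ms, now_ms):
    """True if any consumed record is older than the freshness SLO allows.

    record_timestamps_ms: CreateTime values (epoch millis) of the records
    in the batch; now_ms: the consumer's current wall-clock time.
    """
    return any(now_ms - ts > SLO_THRESHOLD_MS for ts in record_timestamps_ms)

now = 1_700_000_100_000
# One record 90 s old, one 5 s old: the 90 s record breaches the 60 s SLO.
print(breaches_slo([now - 90_000, now - 5_000], now))  # True
```

For a batch-processing consumer, the same check would use a much larger threshold (hours or days), which is the point of the last bullet above: the SLO, not the mechanism, is what differs.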