dkduckkit.dev

Consumer rebalance (Kafka)

Kafka

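A Kafka consumer group rebalance is the process of redistributing partition assignments among the active consumers in a group. It is triggered by any membership change: a consumer joining, leaving, or being considered dead because it missed heartbeats for longer than session.timeout.ms or went longer than max.poll.interval.ms between calls to poll(). A change to the subscribed topics, such as partitions being added, also triggers one. During an Eager Rebalance (the only protocol before Kafka 2.4, and still the protocol the Java client uses unless every assignor in partition.assignment.strategy supports cooperative rebalancing), all consumers revoke their partition assignments at the same time, creating a stop-the-world pause for the entire group.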
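The two liveness checks behind "considered dead" can be modeled roughly as below. This is a hypothetical, simplified sketch: `MemberTimers` and `removal_reason` are illustrative names, not part of any Kafka client. In the real Java client the session timeout is enforced by the broker-side group coordinator, while max.poll.interval.ms is enforced by the client, which leaves the group when the poll gap is exceeded. The timeout values match recent Java client defaults (45 s and 5 min) but are only illustrative here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemberTimers:
    last_heartbeat_ms: int  # last heartbeat the group coordinator received
    last_poll_ms: int       # last time the application called poll()


def removal_reason(t: MemberTimers, now_ms: int,
                   session_timeout_ms: int = 45_000,
                   max_poll_interval_ms: int = 300_000) -> Optional[str]:
    """Return why this member would drop out of the group (triggering a
    rebalance), or None if both liveness checks pass. Simplified model."""
    if now_ms - t.last_heartbeat_ms > session_timeout_ms:
        # Heartbeats stopped: crash, network partition, long GC pause.
        return "session.timeout.ms exceeded: missed heartbeats"
    if now_ms - t.last_poll_ms > max_poll_interval_ms:
        # Heartbeats still flowing, but the poll loop is stuck on a slow batch.
        return "max.poll.interval.ms exceeded: poll loop stalled"
    return None


# Heartbeats are healthy, but the last poll() was 6 minutes ago.
stalled = MemberTimers(last_heartbeat_ms=359_000, last_poll_ms=0)
print(removal_reason(stalled, now_ms=360_000))
# prints "max.poll.interval.ms exceeded: poll loop stalled"
```

The second case is the one that surprises people: a consumer can be alive and heartbeating yet still trigger a rebalance because a single batch takes too long to process.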

Why it matters in practice

Rebalances are a common cause of consumer lag spikes in production. Consider a group with 100 partitions and 10 consumers during a rolling deployment that restarts consumers one at a time: the group may rebalance 10 or more times (a restarted consumer can trigger one rebalance when it leaves and another when it rejoins), and under the eager protocol each rebalance pauses every consumer, potentially for 5–30 seconds. With the CooperativeStickyAssignor (Kafka 2.4+), only the partitions that need to move are revoked; the rest continue processing. For production Kafka deployments with frequent scaling events, this is often the single most impactful configuration change.
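The difference between the two protocols can be made concrete with a small simulation of the scenario above: 100 partitions, 10 consumers, and one consumer shutting down during a rolling deployment. `round_robin`, `sticky`, and `paused` are hypothetical helpers. `sticky` is a simplified stand-in for the idea behind the CooperativeStickyAssignor, not Kafka's actual assignment algorithm, and the sketch ignores the extra rebalance round that incremental cooperative rebalancing uses when partitions must be taken away from a surviving member.

```python
from typing import Dict, Iterable, Set

Assignment = Dict[str, Set[int]]


def round_robin(partitions: Iterable[int], members: Iterable[str]) -> Assignment:
    """Fresh assignment that ignores previous ownership."""
    ordered = sorted(members)
    result: Assignment = {m: set() for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        result[ordered[i % len(ordered)]].add(p)
    return result


def sticky(partitions: Iterable[int], members: Iterable[str],
           previous: Assignment) -> Assignment:
    """Simplified sticky assignment: keep prior owners, then move as few
    partitions as possible so member loads differ by at most one."""
    parts = set(partitions)
    ordered = sorted(members)
    result: Assignment = {m: set(previous.get(m, ())) & parts for m in ordered}
    orphans = parts - set().union(*result.values())
    for p in sorted(orphans):  # partitions whose owner left the group
        least = min(ordered, key=lambda m: len(result[m]))
        result[least].add(p)
    while True:  # shift single partitions until the group is balanced
        most = max(ordered, key=lambda m: len(result[m]))
        least = min(ordered, key=lambda m: len(result[m]))
        if len(result[most]) - len(result[least]) <= 1:
            return result
        p = max(result[most])
        result[most].remove(p)
        result[least].add(p)


def paused(partitions: Iterable[int], old: Assignment, new: Assignment,
           protocol: str) -> Set[int]:
    """Partitions that stop being processed while the rebalance runs."""
    parts = set(partitions)
    if protocol == "eager":
        return parts  # every member revokes everything before reassignment
    old_owner = {p: m for m, ps in old.items() for p in ps}
    new_owner = {p: m for m, ps in new.items() for p in ps}
    return {p for p in parts if old_owner.get(p) != new_owner.get(p)}


partitions = range(100)
before = round_robin(partitions, [f"c{i}" for i in range(10)])
survivors = [f"c{i}" for i in range(9)]  # c9 shuts down during a deploy
after = sticky(partitions, survivors, before)
print(len(paused(partitions, before, after, "eager")))        # 100
print(len(paused(partitions, before, after, "cooperative")))  # 10
```

Under the eager model all 100 partitions stop; under the cooperative model only the 10 partitions that belonged to the departing consumer change hands, and the other nine consumers keep processing throughout.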

Common mistakes

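  • Keeping an eager assignor (RangeAssignor, RoundRobinAssignor, or StickyAssignor) in partition.assignment.strategy in environments with frequent scaling: as long as any listed assignor supports only the eager protocol, the consumer uses eager rebalancing, so every deployment triggers a full stop-the-world rebalance.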
  • Not monitoring rebalance frequency and duration: a group that rebalances every few minutes may have a bug in consumer logic that triggers repeated evictions, such as batch processing that regularly takes longer than max.poll.interval.ms.
  • Having more consumers than partitions in a group: the extra consumers receive no partitions but still take part in every rebalance, which can make rebalances longer without contributing any throughput.
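The three mistakes above can be turned into a rough health check. This is a hypothetical sketch: `audit_group` and its thresholds are invented for illustration, and the rebalance timestamps are assumed to come from your own instrumentation (for example, logging from a `ConsumerRebalanceListener` callback). The eager-protocol check mirrors the Java client's rule of falling back to eager rebalancing whenever any configured assignor lacks cooperative support; it recognizes only the built-in CooperativeStickyAssignor as cooperative and treats every other assignor, including custom ones, as eager.

```python
from typing import List, Sequence

# Built-in Java client assignor that supports cooperative rebalancing.
COOPERATIVE_ASSIGNORS = {
    "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
}


def audit_group(assignment_strategy: Sequence[str],
                num_partitions: int,
                num_consumers: int,
                rebalance_times_s: Sequence[float],
                window_s: float = 3600.0,
                max_rebalances_per_window: int = 5) -> List[str]:
    """Return warnings for the three common mistakes.

    Thresholds are hypothetical placeholders; tune them to your workload.
    """
    warnings: List[str] = []

    # Mistake 1: any eager-only assignor keeps the group on the eager protocol.
    eager = [a for a in assignment_strategy if a not in COOPERATIVE_ASSIGNORS]
    if eager:
        names = ", ".join(a.rsplit(".", 1)[-1] for a in eager)
        warnings.append(f"eager protocol in use: {names}")

    # Mistake 2: too many rebalances within the most recent window.
    if rebalance_times_s:
        latest = max(rebalance_times_s)
        recent = [t for t in rebalance_times_s if t > latest - window_s]
        if len(recent) > max_rebalances_per_window:
            warnings.append(f"{len(recent)} rebalances in the last {window_s:.0f}s")

    # Mistake 3: consumers beyond the partition count receive no partitions.
    if num_consumers > num_partitions:
        warnings.append(f"{num_consumers - num_partitions} idle consumer(s)")

    return warnings


print(audit_group(
    ["org.apache.kafka.clients.consumer.RangeAssignor",
     "org.apache.kafka.clients.consumer.CooperativeStickyAssignor"],
    num_partitions=8,
    num_consumers=10,
    rebalance_times_s=[0, 600, 1200, 1800, 2400, 3000, 3500],
))
```

In this example all three warnings fire: RangeAssignor is still listed ahead of CooperativeStickyAssignor, seven rebalances fall within one hour, and two of the ten consumers have no partition to read.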