Kafka rebalance protocol (Eager vs Cooperative)

Kafka supports two rebalance protocols. The Eager (stop-the-world) protocol was the only option before Kafka 2.4 and remains in effect by default unless the assignor is changed. It requires every consumer to revoke all of its partition assignments before any new assignments are made, which pauses consumption across the whole group. The Cooperative protocol (Kafka 2.4+, enabled by CooperativeStickyAssignor) revokes only the partitions that need to move, so consumers keep processing their unaffected partitions during the rebalance.
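
As a minimal sketch of how the cooperative assignor is selected, the snippet below builds a consumer configuration with `partition.assignment.strategy` set to `CooperativeStickyAssignor`. The class name `CooperativeConsumerConfig`, the broker address, and the group id are placeholders chosen for illustration. The snippet uses only `java.util.Properties`, so it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

final class CooperativeConsumerConfig {
    // Standard Kafka consumer configuration key that selects the assignor(s).
    static final String STRATEGY_KEY = "partition.assignment.strategy";
    // Fully qualified class name of the cooperative assignor shipped with Kafka 2.4+.
    static final String COOPERATIVE_STICKY =
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor";

    // Builds consumer properties; bootstrapServers and groupId are caller-supplied.
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Listing only the cooperative assignor opts the consumer into
        // incremental (cooperative) rebalancing.
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, assumed for the example only.
        Properties props = consumerProps("localhost:9092", "orders-group");
        System.out.println(STRATEGY_KEY + "=" + props.getProperty(STRATEGY_KEY));
    }
}
```

In a real application these properties would be passed to the `KafkaConsumer` constructor; nothing else in the consumer code has to change.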

Why it matters in practice

For a group with 100 partitions and 10 consumers, an Eager rebalance stops processing on all 100 partitions at once, potentially for 10–30 seconds. A rolling deployment that restarts one consumer at a time therefore triggers at least 10 full stop-the-world pauses. With Cooperative Sticky, only the partitions being moved are paused; the others continue uninterrupted. For production Kafka deployments, migrating to the Cooperative Sticky assignor is one of the highest-impact reliability improvements available, and it needs only a configuration change, not application code changes. A group that is already running on an Eager assignor should be migrated with two rolling restarts: the first adds CooperativeStickyAssignor alongside the existing assignor, and the second removes the old one.
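
The arithmetic behind that comparison can be sketched as follows. This is a simplified model, not a measurement: it assumes partitions are spread evenly, that each restart triggers a single rebalance, and that a cooperative rebalance moves only the restarted consumer's share of partitions. The class and method names are hypothetical.

```java
final class RebalancePauseEstimate {
    // Eager: every partition in the group is revoked on each rebalance.
    static int eagerPaused(int partitions) {
        return partitions;
    }

    // Cooperative (simplified): only one consumer's share moves,
    // assuming an even spread; rounded up for uneven divisions.
    static int cooperativePaused(int partitions, int consumers) {
        return (partitions + consumers - 1) / consumers;
    }

    public static void main(String[] args) {
        int partitions = 100;
        int consumers = 10;
        int restarts = consumers; // rolling deployment: one restart per consumer

        int eagerTotal = restarts * eagerPaused(partitions);
        int cooperativeTotal = restarts * cooperativePaused(partitions, consumers);

        System.out.println("eager partition pauses: " + eagerTotal);             // 1000
        System.out.println("cooperative partition pauses: " + cooperativeTotal); // 100
    }
}
```

Under these assumptions the rolling deployment causes 1000 partition pauses with the Eager protocol and 100 with Cooperative Sticky. Real numbers depend on how many rebalances each restart triggers and on how the assignor rebalances load.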

Common mistakes

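  • Staying on the default Eager protocol in Kafka 2.4+ deployments, which forgoes the shorter, partial pauses that Cooperative Sticky provides.
  • Not testing Cooperative Sticky in staging before production. It is generally beneficial, but some applications assume Eager behavior. For example, a rebalance listener may expect every partition to be revoked on each rebalance, whereas under the Cooperative protocol it is notified only of the partitions that actually move.
  • Leaving idle consumers in the group. Consumers that hold no partitions, for instance when the group has more members than partitions, still participate in every rebalance and can make it take longer.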
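
Two of the mistakes above can be caught with a simple pre-deployment check. The sketch below is a hypothetical helper, not part of Kafka. It flags an assignor list that would keep the group on the Eager protocol and a group with more consumers than partitions. As a simplification, it treats `CooperativeStickyAssignor` as the only cooperative assignor and assumes a single subscribed topic.

```java
import java.util.ArrayList;
import java.util.List;

final class RebalanceConfigCheck {
    // assignmentStrategy: comma-separated value of partition.assignment.strategy.
    static List<String> findings(String assignmentStrategy, int partitions, int consumers) {
        List<String> findings = new ArrayList<>();

        // The consumer falls back to eager rebalancing if any listed
        // assignor does not support the cooperative protocol.
        boolean allCooperative = true;
        for (String name : assignmentStrategy.split(",")) {
            if (!name.trim().endsWith("CooperativeStickyAssignor")) {
                allCooperative = false;
            }
        }
        if (!allCooperative) {
            findings.add("eager-assignor");
        }

        // Consumers beyond the partition count receive no partitions.
        if (consumers > partitions) {
            findings.add("idle-consumers:" + (consumers - partitions));
        }
        return findings;
    }

    public static void main(String[] args) {
        // Example values, assumed for illustration: 10 partitions, 12 consumers.
        System.out.println(findings(
                "org.apache.kafka.clients.consumer.RangeAssignor", 10, 12));
        // prints [eager-assignor, idle-consumers:2]
    }
}
```

A check like this only inspects configuration. It does not replace running a rebalance in staging to see how the application's listeners behave.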