Kafka rebalance protocol (Eager vs Cooperative)
Kafka supports two rebalance protocols. The Eager (stop-the-world) protocol was the only option before Kafka 2.4, and it remains what the plain consumer's default assignor configuration uses in later releases. It requires every consumer to revoke all of its partition assignments before any new assignments are made, causing a full consumption pause. The Cooperative protocol (Kafka 2.4+, enabled through CooperativeStickyAssignor) revokes only the partitions that need to move, so consumers keep processing unaffected partitions during the rebalance.
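The difference in what gets revoked can be sketched in a few lines of Python. This is a simplified model, not the client's implementation: the group layout, the consumer names and the `target` assignment are hypothetical, and the real cooperative protocol spreads the work over two rebalance rounds (revoke first, then assign the freed partitions).

```python
from typing import Dict, Set

# consumer id -> partition numbers it currently owns
Assignment = Dict[str, Set[int]]


def revoked_eager(current: Assignment) -> Set[int]:
    """Eager: every member gives up every partition before reassignment."""
    return set().union(*current.values())


def revoked_cooperative(current: Assignment, target: Assignment) -> Set[int]:
    """Cooperative: a member gives up only partitions it owns now
    but does not own in the target assignment."""
    revoked: Set[int] = set()
    for consumer, owned in current.items():
        revoked |= owned - target.get(consumer, set())
    return revoked


# Hypothetical group: three consumers own six partitions, then "d" joins.
current = {"a": {0, 1}, "b": {2, 3}, "c": {4, 5}}
target = {"a": {0, 1}, "b": {2, 3}, "c": {4}, "d": {5}}

print(sorted(revoked_eager(current)))                # [0, 1, 2, 3, 4, 5]
print(sorted(revoked_cooperative(current, target)))  # [5]
```

In this model only partition 5 pauses under the cooperative protocol, while the eager protocol pauses all six.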
Why it matters in practice
For a group with 100 partitions and 10 consumers, an Eager rebalance means all 100 partitions stop processing simultaneously, potentially for 10–30 seconds. A rolling deployment that restarts one consumer at a time triggers at least one rebalance per restart, so it causes at least 10 full stop-the-world pauses. With Cooperative Sticky, only the partitions being moved are paused; the others continue uninterrupted. For production Kafka deployments, migrating to the Cooperative Sticky assignor is one of the highest-impact reliability improvements available, and it is a configuration change that usually needs no application code changes.
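The arithmetic behind that comparison can be written out as a rough estimate. The sketch below assumes one rebalance per consumer restart, an even spread of partitions, and that under the cooperative protocol only the restarted consumer's share moves in each rebalance; the function name is hypothetical and real numbers depend on the assignor and the timing of restarts.

```python
def paused_per_rolling_deploy(partitions: int, consumers: int) -> dict:
    """Estimate partition pauses over one rolling deployment."""
    # Assumption: one rebalance per consumer restart.
    rebalances = consumers
    # Eager: every partition is revoked in every rebalance.
    eager = partitions * rebalances
    # Cooperative (assumption): only one consumer's even share moves each time.
    moved_per_rebalance = partitions // consumers
    cooperative = moved_per_rebalance * rebalances
    return {"eager": eager, "cooperative": cooperative}


result = paused_per_rolling_deploy(partitions=100, consumers=10)
print(result)  # {'eager': 1000, 'cooperative': 100}
```

Under these assumptions the eager protocol pauses ten times as many partitions over the deployment as the cooperative one.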
Common mistakes
- Still using the default Eager protocol in Kafka 2.4+ deployments, and so missing out on significant reliability improvements.
- Not testing Cooperative Sticky in staging before production. It is generally beneficial, but some applications make assumptions about rebalance behavior, for example a rebalance listener that expects every partition to be revoked on each rebalance.
- Having idle consumers in the group during a rebalance. Idle consumers still participate in rebalances and can make them longer.
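The first and last mistakes above can be checked with a small sketch. It assumes the confluent-kafka Python client (built on librdkafka), where the assignor is selected by the short name `cooperative-sticky`; the Java client instead takes the class name CooperativeStickyAssignor. The broker address, the group id and the `idle_consumers` helper are hypothetical, and the idle count assumes a group subscribed to a single topic.

```python
# Hypothetical values: broker address and group id are placeholders.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-consumers",
    # librdkafka short name for the cooperative sticky assignor.
    "partition.assignment.strategy": "cooperative-sticky",
}

# With confluent-kafka installed, the dict would be passed as:
#   from confluent_kafka import Consumer
#   consumer = Consumer(consumer_config)


def idle_consumers(consumers: int, partitions: int) -> int:
    """Members beyond the partition count receive no partitions
    (assumption: the group subscribes to a single topic)."""
    return max(0, consumers - partitions)


print(idle_consumers(consumers=12, partitions=10))  # 2
```

An existing group that runs an eager assignor should not be switched in a single step. The Java client documents a two-step rolling upgrade for this, and librdkafka does not allow eager and cooperative strategies to be mixed within one group.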