dkduckkit.dev

Replication factor (Kafka)

The Kafka replication factor (RF) determines how many brokers store a copy of each partition. With RF=3, each partition has one leader and two followers. By default the leader handles all reads and writes, and followers fetch from the leader to stay in sync. A higher replication factor improves durability: with RF=N, a partition's data can survive the loss of up to N-1 brokers, provided the surviving replica was in sync. The cost is that storage is multiplied by RF and every write is copied over the network to RF-1 followers.
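The arithmetic above can be sketched in a few lines of Python. The function name and the input figures (50 MB/s of producer traffic, 200 GB retained per copy) are hypothetical and chosen only for illustration; the model ignores compression, indexes, and consumer traffic.

```python
def replication_footprint(rf: int, ingress_mb_s: float, retained_gb: float) -> dict:
    """Rough cost of a replication factor for one topic.

    rf            -- replication factor (copies per partition, leader included)
    ingress_mb_s  -- producer write rate into the topic, in MB/s (assumed input)
    retained_gb   -- size of one copy of the retained data, in GB (assumed input)
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return {
        "followers_per_partition": rf - 1,
        # All but one replica can be lost while a copy of the data still exists.
        "tolerated_broker_failures": rf - 1,
        # Every replica stores a full copy of the data.
        "total_storage_gb": retained_gb * rf,
        # Each follower fetches the full write stream from the leader.
        "replication_traffic_mb_s": ingress_mb_s * (rf - 1),
    }


footprint = replication_footprint(rf=3, ingress_mb_s=50.0, retained_gb=200.0)
print(footprint)
```

Under these assumptions, RF=3 triples the storage to 600 GB and adds 100 MB/s of broker-to-broker replication traffic on top of the producer traffic.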

Why it matters in practice

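With acks=all, the leader waits until every replica currently in the in-sync replica set (ISR) has the write before it sends the producer a success response. With RF=3 and replicas spread across availability zones, produce latency therefore includes at least one cross-AZ round trip (RTT) for the follower fetch. A related setting that deserves attention is replica.fetch.max.bytes. On brokers older than 0.10.1, a value smaller than message.max.bytes lets the leader accept a message that followers can never fetch: replication of that partition stalls without any error reaching the producer, the message exists only on the leader, and it is lost if the leader fails. Newer brokers treat the fetch size as a soft limit and still return an oversized batch so that replication can make progress, but keeping replica.fetch.max.bytes at or above message.max.bytes avoids depending on that behavior.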
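A consistency check for the two size settings might look like the following minimal sketch. It assumes the broker properties have already been parsed into a dictionary (the parsing step is not shown); the function name and the sample values are hypothetical.

```python
from typing import Dict, List


def fetch_size_problems(broker_config: Dict[str, str]) -> List[str]:
    """Return a warning if followers fetch less than the largest accepted message.

    broker_config maps broker property names to their string values,
    for example as read from a server.properties file.
    """
    max_message = int(broker_config["message.max.bytes"])
    replica_fetch = int(broker_config["replica.fetch.max.bytes"])
    problems = []
    if replica_fetch < max_message:
        problems.append(
            f"replica.fetch.max.bytes ({replica_fetch}) is smaller than "
            f"message.max.bytes ({max_message})"
        )
    return problems


# Hypothetical settings: the message size limit was raised to 10 MiB,
# but the replica fetch size was left at 1 MiB.
config = {"message.max.bytes": "10485760", "replica.fetch.max.bytes": "1048576"}
problems = fetch_size_problems(config)
for problem in problems:
    print(problem)
```

Running a check like this whenever broker configuration changes is one way to catch the mismatch before a large message reaches the cluster.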

Common mistakes

  • Not increasing replica.fetch.max.bytes when increasing message.max.bytes. On older brokers this silently stops replication of any partition that receives a message larger than the fetch size.
  • Using RF=1 in development and RF=3 in production without testing the latency difference. With acks=all, cross-AZ replication adds a round trip that can raise p99 produce latency significantly.
  • Not monitoring under-replicated partitions (kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions). This metric counts partitions whose ISR is smaller than their replica set, so a sustained non-zero value means replicas have fallen behind the leader or are offline.
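To make the under-replication condition concrete, the sketch below flags partitions whose ISR is smaller than their assigned replica set. The partition metadata is made up for illustration; in a real cluster the replica and ISR lists would come from an admin client or from the output of kafka-topics.sh --describe. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


def under_replicated(
    partitions: Dict[TopicPartition, Dict[str, List[int]]]
) -> List[TopicPartition]:
    """Return the partitions whose ISR has fewer members than the replica set."""
    return sorted(
        tp
        for tp, state in partitions.items()
        if len(state["isr"]) < len(state["replicas"])
    )


# Made-up cluster state: broker ids per partition, all with RF=3.
cluster = {
    ("orders", 0): {"replicas": [1, 2, 3], "isr": [1, 2, 3]},
    ("orders", 1): {"replicas": [2, 3, 1], "isr": [2]},
    ("payments", 0): {"replicas": [3, 1, 2], "isr": [3, 1]},
}

lagging = under_replicated(cluster)
print(lagging)  # [('orders', 1), ('payments', 0)]
```

In this example, partition 1 of the hypothetical orders topic has only its leader in the ISR, so a single broker failure would lose its data.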