Replication factor (Kafka)
The Kafka replication factor (RF) determines how many brokers store a copy of each partition. With RF=3, each partition has one leader and two followers. By default the leader handles all reads and writes (since Kafka 2.4, consumers can optionally be configured to fetch from followers); followers fetch from the leader to stay in sync. A higher replication factor provides greater durability: with RF=N, a partition's committed data can survive up to N-1 broker failures. The cost is RF times the storage, plus RF-1 extra copies of every produced byte sent between brokers as replication traffic.
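The trade-off above can be put into numbers with a small sketch. The function name and the input figures are hypothetical, chosen only for illustration; the arithmetic assumes every replica holds a full copy and every follower fetches each produced byte once.

```python
def replication_cost(rf: int, produce_mb_per_s: float, retained_gb: float) -> dict:
    """Rough per-topic cost of a given replication factor (illustrative only)."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return {
        # Every replica stores a full copy of the retained data.
        "storage_gb": retained_gb * rf,
        # Each of the rf - 1 followers fetches every produced byte once.
        "replication_mb_per_s": produce_mb_per_s * (rf - 1),
        # Number of replicas that can be lost while one copy still exists.
        "tolerated_broker_failures": rf - 1,
    }


# Hypothetical workload: 50 MB/s produced, 200 GB retained, RF=3.
cost = replication_cost(rf=3, produce_mb_per_s=50.0, retained_gb=200.0)
print(cost)
```

With these inputs, RF=3 triples storage and adds replication traffic equal to twice the produce rate.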
Why it matters in practice
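For acks=all producers, every write must be acknowledged by all in-sync replicas (ISR) before the producer receives a success response. With RF=3 and replicas spread across availability zones, produce latency therefore includes at least one cross-AZ RTT for the follower acknowledgement. replica.fetch.max.bytes is the related configuration that most often causes trouble. On older brokers (before 0.10.1), if it is smaller than message.max.bytes, the leader accepts large messages from producers but followers cannot fetch them: replication stalls without an obvious error, and the message may exist only on the leader and be lost if the leader fails. Newer brokers treat replica.fetch.max.bytes as a soft limit and still return an oversized first batch so that replication makes progress, but keeping it at or above message.max.bytes remains the safe setting.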
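As a rough illustration of the latency point, the sketch below models an acks=all produce request as the leader's local append time plus the round trip to the slowest in-sync follower. This is a deliberately simplified model, not how the broker measures anything, and the function name and millisecond figures are hypothetical.

```python
def estimate_acks_all_latency_ms(leader_append_ms: float,
                                 follower_rtts_ms: list) -> float:
    """Toy model: leader append time plus the slowest ISR follower round trip."""
    # If no followers are in the ISR, only the leader's append counts.
    slowest_follower = max(follower_rtts_ms, default=0.0)
    return leader_append_ms + slowest_follower


# Hypothetical figures: followers in the same AZ vs. in other AZs.
same_az = estimate_acks_all_latency_ms(1.0, [0.3, 0.3])
cross_az = estimate_acks_all_latency_ms(1.0, [1.5, 2.0])
print(f"same-AZ: {same_az:.1f} ms, cross-AZ: {cross_az:.1f} ms")
```

The slowest follower dominates, which is why a single distant or overloaded replica in the ISR is enough to raise tail latency.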
Common mistakes
- Not increasing replica.fetch.max.bytes when increasing message.max.bytes: a well-known cause of silent durability problems, particularly on older brokers.
- Using RF=1 in development and RF=3 in production without testing the latency difference: the cross-AZ replication RTT can change p99 produce latency significantly.
- Not monitoring under-replicated partitions (kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions): this metric counts partitions whose replicas have fallen behind the leader, and it should be 0 in a healthy cluster.
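The first mistake can be caught with a simple check on broker settings before deployment. The following is a minimal sketch, assuming the settings are available as key=value text in the style of server.properties; the helper names and the example values are hypothetical, and the check is skipped when either key is absent rather than guessing at broker defaults.

```python
from typing import Optional


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def fetch_size_warning(props: dict) -> Optional[str]:
    """Return a warning if replica.fetch.max.bytes < message.max.bytes."""
    try:
        message_max = int(props["message.max.bytes"])
        fetch_max = int(props["replica.fetch.max.bytes"])
    except KeyError:
        return None  # one value is left at its default; nothing to compare
    if fetch_max < message_max:
        return (f"replica.fetch.max.bytes ({fetch_max}) is smaller than "
                f"message.max.bytes ({message_max})")
    return None


# Hypothetical settings where only message.max.bytes was raised.
example = """
# broker settings (illustrative values)
message.max.bytes=5242880
replica.fetch.max.bytes=1048576
"""
warning = fetch_size_warning(parse_properties(example))
print(warning)
```

A check like this fits naturally into a configuration review or CI step, so the two settings are always changed together.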
Related Terms
replica.fetch.max.bytes
Number of bytes a follower attempts to fetch from the leader for each partition in a fetch request.
message.max.bytes
Maximum size of a single record batch that the broker accepts.
Kafka partition
Fundamental unit of parallelism and ordering within a topic.
Cross-AZ latency
Round-trip time between availability zones; adds cost and latency to replication.