
Throughput vs bandwidth (Kafka)


Throughput is the rate of data actually processed or transferred — measured in messages/second or MB/s achieved under real workload conditions. Bandwidth is the maximum capacity of a link or storage device — the theoretical ceiling. In Kafka sizing, throughput is what your cluster delivers; bandwidth is what your NICs, disks, and replication links can sustain. The gap between them is your headroom.

Formula

Sizing formula: required_bandwidth = producer_throughput × replication_factor + consumer_throughput. A topic producing 100 MB/s with RF=3 and 2 consumer groups each reading the full stream requires: 100 × 3 (replication) + 100 × 2 (consumers) = 500 MB/s of broker network bandwidth.
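The formula above is straightforward to express in code. A minimal sketch (the function name is hypothetical), assuming each consumer group reads the full produced stream:

```python
def required_bandwidth_mb_s(producer_mb_s: float,
                            replication_factor: int,
                            consumer_groups: int) -> float:
    """required_bandwidth = producer_throughput x replication_factor
                            + consumer_throughput.

    Each consumer group is assumed to read the full produced stream,
    so consumer throughput = producer throughput x number of groups.
    """
    consumer_mb_s = producer_mb_s * consumer_groups
    return producer_mb_s * replication_factor + consumer_mb_s

# 100 MB/s produced, RF=3, 2 consumer groups -> 500 MB/s
print(required_bandwidth_mb_s(100, 3, 2))  # 500
```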

Why it matters in practice

Kafka broker NIC saturation is a common production bottleneck that appears as elevated produce latency and growing consumer lag. A 10 GbE NIC (1,250 MB/s) handles 100 MB/s comfortably, but adding replication and multiple consumer groups can push total bandwidth above the NIC limit. Monitoring per-broker BytesInPerSec and BytesOutPerSec JMX metrics reveals when bandwidth is the constraint rather than CPU or disk.
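To see how replication and consumer fan-out eat into NIC capacity, here is a rough headroom check (function name and the 50% alert threshold are illustrative, not from any Kafka tooling), assuming a 10 GbE NIC at 1,250 MB/s:

```python
NIC_CAPACITY_MB_S = 1_250  # 10 GbE ~ 1,250 MB/s theoretical ceiling

def nic_utilization(producer_mb_s: float,
                    replication_factor: int,
                    consumer_groups: int,
                    nic_capacity_mb_s: float = NIC_CAPACITY_MB_S) -> float:
    """Fraction of NIC bandwidth consumed by producer, replication,
    and consumer traffic combined (cluster-aggregate view)."""
    total = (producer_mb_s * replication_factor
             + producer_mb_s * consumer_groups)
    return total / nic_capacity_mb_s

# 100 MB/s alone is ~8% of the NIC, but with RF=3 and 2 consumer
# groups the total reaches 500 MB/s, i.e. 40% utilization.
util = nic_utilization(100, 3, 2)
print(f"NIC utilization: {util:.0%}")  # NIC utilization: 40%
```

In production you would compare the observed BytesInPerSec and BytesOutPerSec JMX metrics per broker against the NIC limit rather than this idealized aggregate.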

Common mistakes

  • Sizing only for producer throughput without accounting for replication and consumer traffic — total broker bandwidth is a multiple of producer throughput.
  • Using raw disk IOPS as the sizing metric — sequential write throughput (MB/s) is the relevant metric for Kafka, not random IOPS.
  • Not leaving headroom for rebalances and partition leader elections — these create short bursts of elevated replication traffic.
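Putting the three points together, a cluster-sizing sketch might reserve explicit headroom for rebalance and re-election bursts. The function name and the 50% headroom default are assumptions for illustration, not a Kafka recommendation:

```python
import math

def brokers_needed(producer_mb_s: float,
                   replication_factor: int,
                   consumer_groups: int,
                   nic_capacity_mb_s: float = 1_250,
                   headroom: float = 0.5) -> int:
    """Minimum broker count that keeps per-broker bandwidth below the
    NIC limit, reserving a `headroom` fraction for replication bursts
    during rebalances and partition leader elections."""
    total = (producer_mb_s * replication_factor
             + producer_mb_s * consumer_groups)
    usable_per_broker = nic_capacity_mb_s * (1 - headroom)
    return math.ceil(total / usable_per_broker)

# The worked example (500 MB/s total) fits one 10 GbE broker even at
# 50% headroom; an 800 MB/s producer workload needs several.
print(brokers_needed(100, 3, 2))  # 1
print(brokers_needed(800, 3, 2))  # 7
```

The headroom fraction is a policy choice; the point is that sizing to 100% of NIC bandwidth leaves nothing for the burst traffic the last bullet describes.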