
Kafka Broker Sizing and Capacity Planning: NIC Headroom, Disk, and Partition Limits

Why average load misleads, the 70% NIC rule, replication and consumer fanout math, and how partition counts — not headline throughput — often drive broker count.

A cluster running at nominal load looks fine. Then a seasonal traffic spike hits and the busiest broker's NIC saturates at 100%. The kernel starts dropping packets: TCP retransmissions pile up, send buffers fill, and producers hit backpressure. Follower brokers can't complete replication fetches within replica.lag.time.max.ms, so the leader removes them from the ISR. When the network clears, consumers try to catch up on accumulated lag, but that data is no longer in the page cache, forcing random disk reads. The cluster is now stuck in a degraded state it cannot exit without manual throttling.

Kafka broker sizing is not about meeting average demand. It is about provisioning for the recovery phase after peak load.

The four variables that drive broker sizing

Four variables determine every resource in a Kafka cluster. They interact — getting one wrong cascades through the others.

| Variable | Primary impact | Key metric |
| --- | --- | --- |
| Throughput | Network bandwidth | BytesInPerSec / BytesOutPerSec |
| Replication factor | Storage + internal network | UnderReplicatedPartitions |
| Log retention | Disk capacity | DiskUsagePercentage |
| Consumer fanout | Egress network | FetchLatency |

RF=3 means every byte produced is copied to two followers, so internal replication traffic is 2× the produce rate. A 100 MB/s produce rate becomes 300 MB/s of total broker write I/O. Consumer fanout multiplies egress: four consumer groups reading a 100 MB/s topic require 400 MB/s of egress capacity, before accounting for consumer lag catch-up reads, which miss the page cache and fall through to disk.
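To make the multipliers concrete, here is a minimal sketch in Python of the figures above (variable names are illustrative):

# Write amplification and consumer fanout for the 100 MB/s example
produce_rate = 100         # MB/s written by producers
rf = 3                     # replication factor
consumer_groups = 4        # independent groups reading the topic

write_io = produce_rate * rf                        # 300 MB/s across broker disks
replication_traffic = produce_rate * (rf - 1)       # 200 MB/s of follower fetch traffic
consumer_egress = produce_rate * consumer_groups    # 400 MB/s out to consumers

print(write_io, replication_traffic, consumer_egress)   # 300 200 400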

The 70% NIC rule

Never design a Kafka broker for more than 70% NIC utilization at peak. The 30% headroom is not spare capacity — it is reserved for three specific operational events:

  • Broker failure recovery — when a broker fails, its partitions must be re-replicated to a replacement. This generates a burst of internal replication traffic on top of normal produce load. At 100% NIC utilization, replication throttles and the under-replicated window stretches out, widening the exposure to data loss.
  • Consumer catch-up — after an outage or a new consumer deployment, consumer groups replay historical data. This egress burst easily consumes 20–30% of available bandwidth.
  • Ingress spikes — real traffic is not a flat line. Marketing events, IoT bursts, and daily cycles create peaks. Headroom absorbs them without triggering producer backpressure.

| NIC speed | Usable at 70% | Safe produce rate (RF=3) |
| --- | --- | --- |
| 1 Gbps | 87 MB/s | ~29 MB/s |
| 10 Gbps | 875 MB/s | ~290 MB/s |
| 25 Gbps | 2,187 MB/s | ~730 MB/s |
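
The arithmetic behind the table, as a quick Python sanity check (assuming decimal line rates, i.e. 1 Gbps = 125 MB/s):

RF = 3
for gbps in (1, 10, 25):
    line_rate = gbps * 1000 / 8      # NIC line rate in MB/s
    usable = line_rate * 0.70        # the 70% rule
    safe_produce = usable / RF       # produce rate whose RF=3 write amplification still fits
    print(f"{gbps} Gbps: {usable:.1f} MB/s usable, safe produce ~{safe_produce:.0f} MB/s")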

1 Gbps NICs are insufficient for production Kafka clusters with meaningful throughput. 10 Gbps is the standard baseline; 25 Gbps for high-scale deployments.

JVM heap: why 6 GB is the sweet spot

Kafka's performance comes from its zero-copy architecture — messages are written to the OS page cache and served to consumers via sendfile(), bypassing the JVM heap entirely. Every gigabyte allocated to the heap is a gigabyte unavailable to the page cache.
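For illustration only, the same primitive is available from Python. A minimal sketch (the function and its arguments are hypothetical, not Kafka code) of streaming a cached log segment to a socket without copying it through the process heap:

import os
import socket

def stream_segment(sock: socket.socket, segment_path: str, offset: int, count: int) -> None:
    # os.sendfile() asks the kernel to move bytes from the file (served
    # straight from the page cache) to the socket; the data never enters
    # this process's memory, which is the zero-copy path described above.
    with open(segment_path, "rb") as segment:
        while count > 0:
            sent = os.sendfile(sock.fileno(), segment.fileno(), offset, count)
            if sent == 0:    # reached EOF before `count` bytes were sent
                break
            offset += sent
            count -= sent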

On a 64 GB machine, a 6 GB heap leaves ~58 GB for the page cache. A 32 GB heap leaves ~32 GB, halving the hot log tail that can be served from RAM. The performance impact is disproportionate: once the working set outgrows the page cache, reads fall through to disk, and the sequential-read advantage that makes Kafka fast disappears.

GC pause duration is the other constraint. With G1GC and a 6 GB heap, stop-the-world pauses stay under 200 ms, within Kafka's internal timeout margins. With a 32 GB heap, pauses can exceed 2 seconds, long enough for the broker's session to expire, at which point the controller declares it dead and triggers leader elections for its partitions.

# Kafka broker JVM flags
KAFKA_HEAP_OPTS="-Xmx6g -Xms6g"   # Xms=Xmx: allocate the full heap up front, no resize pauses
KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20"

# Kernel settings for high partition counts
vm.max_map_count=262144   # sysctl: segment index files are memory-mapped
ulimit -n 100000          # file descriptors: every segment keeps log + index files open

HDD vs SSD vs NVMe

Kafka's sequential write pattern makes it unusually friendly to spinning disks — a 7,200 RPM SATA drive sustains ~150 MB/s sequential write. In practice, multi-tenant clusters with multiple consumer groups reading different topics create mixed sequential and random I/O, degrading HDD throughput significantly.

| Storage type | Sequential write | Latency | Use case |
| --- | --- | --- | --- |
| HDD | ~150 MB/s | 4–8 ms | Archival / tiered storage |
| SATA SSD | ~500 MB/s | 0.1–0.5 ms | Mid-tier production |
| NVMe (PCIe 4) | 3,000–5,000 MB/s | <0.1 ms | High-throughput / low-latency |

JBOD over RAID — since KIP-112, Kafka handles disk failure at the partition level. If one disk in a JBOD array fails, only the partitions on that disk go offline; the broker stays up and keeps serving the rest. RAID-10 halves usable capacity at the hardware level, which is redundant since Kafka already replicates across brokers. And RAID rebuild I/O can degrade a broker so severely during recovery that it remains online yet too slow to serve traffic.
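In configuration terms, JBOD is just multiple entries in log.dirs, one per physical disk (mount points below are illustrative):

# server.properties: one log directory per disk; the broker spreads
# partitions across them, and a single-disk failure takes only that
# disk's partitions offline
log.dirs=/mnt/disk0/kafka-logs,/mnt/disk1/kafka-logs,/mnt/disk2/kafka-logs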

Sizing formula — worked example

Inputs: 400 MB/s peak produce rate, RF=3, 4 consumer groups, 5-day retention, 10 Gbps NICs.

// Network requirement (MB/s, cluster-wide)
totalNetworkMBps = produceRate × (RF + consumerGroups)
                 = 400 × (3 + 4) = 2,800 MB/s cluster total

// 10 Gbps = 1,250 MB/s raw; at 70% utilization, 875 MB/s is usable
brokersForNetwork = ceil(2,800 / 875) = 4 brokers (minimum)

// Storage requirement (5 days = 432,000 s; MB → TB via 1024²)
diskPerBroker = produceRate × RF × retentionSeconds / brokerCount / 1024²
              = 400 × 3 × 432,000 / 4 / 1024² ≈ 124 TB per broker (4 brokers)

// With 30% headroom:
diskWithHeadroom = 124 × 1.30 ≈ 161 TB per broker

// Partition limit check (conservative: 4,000 per broker)
// If the cluster has 12,000 partitions × RF=3 = 36,000 replicas:
brokersForPartitions = ceil(36,000 / 4,000) = 9 brokers
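
The same three constraints as a runnable sketch. Assumptions are flagged inline: the function name is mine, NIC line rate uses decimal units (1 Gbps = 125 MB/s), and disk is converted to binary TB:

import math

def brokers_needed(produce_mbps, rf, consumer_groups, retention_days,
                   nic_gbps, partitions,
                   nic_util=0.70, headroom=0.30, partitions_per_broker=4000):
    # Constraint 1: cluster network demand vs usable per-broker bandwidth
    total_network = produce_mbps * (rf + consumer_groups)   # MB/s
    usable_nic = nic_gbps * 125 * nic_util                  # MB/s per broker
    by_network = math.ceil(total_network / usable_nic)

    # Constraint 2: total replicas vs the per-broker partition ceiling
    by_partitions = math.ceil(partitions * rf / partitions_per_broker)

    # Final count is the max of the constraints; disk then follows per broker
    brokers = max(by_network, by_partitions)
    retained_mb = produce_mbps * rf * retention_days * 86_400
    disk_tb = retained_mb / brokers / 1024**2 * (1 + headroom)
    return brokers, by_network, by_partitions, round(disk_tb, 1)

print(brokers_needed(400, 3, 4, 5, 10, 12_000))   # (9, 4, 9, 71.4)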

The final broker count is the maximum across all three constraints. In this example, partition count drives the answer to 9 brokers:

| Constraint | With 4 brokers | With 9 brokers |
| --- | --- | --- |
| Network load per broker | 700 MB/s (56%) | 311 MB/s (25%) |
| Disk per broker (with headroom) | 161 TB | 71 TB |
| Partitions per broker | 9,000 (exceeded) | 4,000 (healthy) |

With 9 brokers, every constraint is within safe limits. The network headroom absorbs a broker failure without saturating the remaining nodes, and partition management stays within the conservative limit.

Use the Kafka Broker Sizing Calculator to run this calculation for your specific throughput, retention, and consumer configuration — and pair it with the Kafka Message Size Calculator to account for per-message overhead before cluster-level sizing.