Methodology
The UI calls calculateAll in order: message size (payload, key, headers, format overhead, batching) → throughput → storage (replication, retention, small index allowance) → recommendations. Compression ratios use a browser-side simulation, not broker-measured stats. The tool does not model exact on-disk compaction, tiered storage, or cluster-wide controller overhead.
What Affects Kafka Message Size
The total Kafka message size depends on several factors: your payload (body), the serialization format (JSON, Avro, Protobuf), optional key and headers, and Kafka's internal record overhead. Each record adds roughly 18 bytes of protocol overhead, plus a shared batch header. Understanding these components helps you optimize for throughput and storage.
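As a rough sketch, these components can simply be added up. The 18-byte per-record figure is the approximation quoted above; the 61-byte batch header default is an assumption based on Kafka's v2 record-batch format, and real sizes vary slightly because record lengths are varint-encoded:

```python
def record_size(payload: int, key: int = 0, headers: int = 0,
                overhead: int = 18) -> int:
    """Approximate wire size of one Kafka record.

    `overhead` is the ~18-byte per-record protocol overhead; the true
    value varies with varint-encoded field lengths.
    """
    return payload + key + headers + overhead

def batch_size(record_sizes: list[int], batch_header: int = 61) -> int:
    """Total batch size: one shared header plus every record."""
    return batch_header + sum(record_sizes)

# 100 records, each a 512-byte JSON payload with a 16-byte key:
sizes = [record_size(512, key=16) for _ in range(100)]
print(batch_size(sizes))  # 61 + 100 * 546 = 54661
```

Because the batch header is shared, larger batches amortize it across more records, which is one reason batching improves effective throughput.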
How Compression Works in Kafka
Kafka supports multiple compression codecs: none, gzip, snappy, lz4, and zstd. Compression happens on the producer side and is applied to whole batches before they are sent. For typical JSON payloads, compression.type=zstd achieves roughly a 70% size reduction, while snappy (~50%) compresses less but costs less CPU. Set linger.ms=5 or higher when using compression so batches fill up and compress more efficiently.
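A quick way to sanity-check the ratios above is to compress a representative payload yourself. This sketch uses Python's standard-library zlib (DEFLATE, the algorithm behind gzip) as a stand-in, since zstd and snappy bindings are not in the standard library; actual ratios depend on the codec and on how repetitive your data is:

```python
import json
import zlib

# Repetitive JSON, similar in shape to typical event payloads.
payload = json.dumps(
    [{"user_id": i, "event": "page_view", "path": "/home"} for i in range(100)]
).encode()

compressed = zlib.compress(payload, level=6)
ratio = 1 - len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%} reduction)")
```

Structured event streams with repeated field names usually compress far better than the ~50-70% figures quoted for mixed traffic, which is why batch-level compression (many similar records per batch) outperforms compressing records one at a time.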
Producer vs. broker perspective: The producer compresses a batch before sending — this determines producer CPU cost and network egress. The broker stores the compressed batch on disk and replicates it RF times. These are two different resource constraints: producer CPU scales with compression codec complexity (zstd costs more CPU than snappy), while broker disk I/O scales with message volume × replication factor.
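The two constraints can be separated with back-of-the-envelope arithmetic. The numbers here are illustrative, not measurements:

```python
msgs_per_sec = 5_000
compressed_bytes = 600        # per record, after producer-side compression
replication_factor = 3

# Producer constraint: CPU to compress, plus network egress to the leader.
producer_egress = msgs_per_sec * compressed_bytes

# Broker constraint: every byte is written to disk RF times across the cluster.
broker_disk_write = producer_egress * replication_factor

print(producer_egress / 1e6, "MB/s egress")      # 3.0 MB/s egress
print(broker_disk_write / 1e6, "MB/s disk I/O")  # 9.0 MB/s disk I/O
```

A heavier codec like zstd shrinks both numbers but raises producer CPU, so which codec wins depends on which resource is scarcer in your deployment.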
Kafka Storage Formula
Storage per second = messages per second × compressed message size × replication factor. Add ~2% for index files (.index, .timeindex). Total retention storage = storage per hour × retention hours. Use this Kafka storage calculator to estimate disk requirements before provisioning brokers.
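The formula translates directly into a small estimator. `retention_storage_bytes` is a hypothetical helper, and the example inputs (5,000 msg/s, 600-byte compressed records, RF=3, 7-day retention) are illustrative:

```python
def retention_storage_bytes(msgs_per_sec: float, compressed_size: float,
                            replication_factor: int, retention_hours: float,
                            index_overhead: float = 0.02) -> float:
    """Cluster-wide disk needed for one topic's retention window."""
    per_second = msgs_per_sec * compressed_size * replication_factor
    per_second *= 1 + index_overhead  # ~2% for .index / .timeindex files
    return per_second * retention_hours * 3600

# 5,000 msg/s of 600-byte compressed records, RF=3, 168-hour (7-day) retention:
gb = retention_storage_bytes(5_000, 600, 3, 168) / 1e9
print(f"{gb:,.0f} GB")  # 5,552 GB
```

Note that the replication factor multiplies total cluster disk, not per-broker disk; with RF=3 spread evenly across brokers, each broker holds roughly one replica's share.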
When to Increase Partition Count
A practical rule of thumb: one partition can handle roughly 10 MB/s of producer throughput. If your Kafka message size × message rate exceeds that, add partitions. For Kafka partition sizing, consider both throughput (producer bandwidth) and parallelism (at most one consumer per partition within a group). Over-partitioning increases metadata overhead; under-partitioning caps consumer parallelism.
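Both constraints reduce to taking a maximum. `min_partitions` is a hypothetical helper built on the ~10 MB/s rule of thumb above; the inputs are illustrative and the 10 MB/s ceiling should be replaced with a number measured on your own hardware:

```python
import math

def min_partitions(msgs_per_sec: float, avg_msg_bytes: float,
                   target_consumers: int,
                   per_partition_mbps: float = 10.0) -> int:
    """Partitions needed to satisfy throughput AND consumer parallelism."""
    throughput_mbps = msgs_per_sec * avg_msg_bytes / 1e6
    for_throughput = math.ceil(throughput_mbps / per_partition_mbps)
    return max(for_throughput, target_consumers, 1)

# 50,000 msg/s of 1 KB messages with an 8-consumer group:
# throughput needs 50 MB/s / 10 MB/s = 5, but parallelism needs 8.
print(min_partitions(50_000, 1_000, 8))  # 8
```

Here parallelism, not bandwidth, sets the floor; with a heavier stream (say 200,000 msg/s) the throughput term would dominate instead.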
Once you know your message size, plan your consumer group capacity with the Kafka Consumer Lag Predictor — model throughput, partition ceilings, and time-to-threshold before lag becomes an incident.
Copy-paste solution
# broker: log retention sanity (example — tune for your cluster)
log.retention.hours=168
compression.type=lz4
# Validate with real producer throughput after using the calculator
Broker and topic defaults are documented on kafka.apache.org.