Producer compression (Kafka)
Kafka producer compression reduces the size of message batches before they are sent to the broker. Compression is applied at the batch level — the entire RecordBatch is compressed as a unit, not individual records. This means larger batches (higher linger.ms or batch.size) compress more efficiently than small batches. The compressed batch is stored on the broker and decompressed by consumers.
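The batch-level effect is easy to demonstrate with the standard library. The sketch below uses gzip (one of Kafka's supported codecs) on hypothetical JSON event records; it compares compressing each record individually against compressing the whole batch as one unit, which is what the producer does with a full RecordBatch:

```python
import gzip
import json

# Hypothetical payloads standing in for Kafka records: repetitive JSON,
# as typical event pipelines produce.
records = [
    json.dumps({"user_id": i, "event": "page_view", "path": "/home"}).encode()
    for i in range(200)
]

# Compressing each record on its own (what an un-batched producer does).
per_record_total = sum(len(gzip.compress(r)) for r in records)

# Compressing the whole batch as one unit: cross-record repetition
# is now visible to the codec, so the ratio improves dramatically.
batch = gzip.compress(b"".join(records))

print(per_record_total, len(batch))  # batch output is far smaller than the sum
```

Because the records share most of their structure, the batched compression is many times smaller than the sum of the per-record outputs.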
Codec comparison
Typical compression results for JSON payloads:
none: 1× size, 0 CPU overhead
snappy: ~0.5× size, low CPU, fast decompression
lz4: ~0.45× size, very low latency overhead
gzip: ~0.35× size, higher CPU, good for large batches
zstd: ~0.30× size, best ratio, requires Kafka 2.1+, recommended for new deployments

Why it matters in practice
Compression can reduce Kafka storage and network costs by 50–70% for typical JSON workloads, but only if batches are large enough to compress efficiently. A batch of 1 message compresses poorly because compression algorithms need repetition to find patterns. With linger.ms=0 (the default), every message is sent as its own batch — compression overhead exceeds savings for small messages. For compression to be effective, set linger.ms to 5–20 ms and batch.size to at least 64 KB.
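The tuning advice above can be sketched as a producer configuration. This assumes the confluent-kafka Python client, which takes librdkafka-style config keys; the broker address is a placeholder:

```python
# A minimal producer configuration sketch, assuming the confluent-kafka
# Python client (librdkafka config keys). Values follow the guidance above.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "compression.type": "zstd",             # zstd requires Kafka 2.1+
    "linger.ms": 10,                        # wait up to 10 ms to fill a batch
    "batch.size": 131072,                   # 128 KB batches compress well
}

# In a real deployment this dict is passed to confluent_kafka.Producer(...).
print(producer_config)
```

The trade-off is explicit: linger.ms adds up to 10 ms of send latency in exchange for fuller, better-compressing batches.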
Common mistakes
- Enabling compression with linger.ms=0 — each single-message batch compresses to near its original size while adding CPU overhead.
- Not matching compression.type between producer and consumer expectations — consumers decompress transparently, but monitoring tools may report compressed sizes without indicating the codec.
- Using gzip for latency-sensitive producers — gzip is the slowest of these codecs; use lz4 or zstd when compression latency matters.
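The first mistake is worth seeing concretely. For a tiny single-message batch, gzip's fixed header and trailer (plus deflate framing) can make the output larger than the input, so you pay CPU for negative savings. A minimal sketch with a hypothetical small record:

```python
import gzip

# A single small record, e.g. one event sent with linger.ms=0.
msg = b'{"id": 1, "ok": true}'

out = gzip.compress(msg)

# gzip adds an 18-byte header/trailer plus deflate framing, so a tiny
# message actually grows after "compression".
print(len(msg), len(out))
```

With a payload this small, the compressed output is larger than the original, which is exactly why compression only pays off once batching is configured.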