
Kafka RecordBatch


A Kafka RecordBatch is a container for multiple ProducerRecord instances that are sent together to the broker in a single network request. Each RecordBatch holds one or more records, all destined for the same partition, compressed with the same codec, and carrying the same producer metadata (producer ID and epoch); individual records store their offsets and timestamps as deltas from the batch's base offset and base timestamp. The batch — not the individual record — is the unit of compression, offset assignment, and replication. Consumers receive and decompress entire batches, then process individual records.
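The shared-vs-per-record split can be sketched as a simplified model. Field names below follow the v2 RecordBatch message format in the Kafka protocol docs, but this is an illustrative data model, not the actual wire encoding (no varints, CRC, or attribute bitfields):

```python
from dataclasses import dataclass

@dataclass
class Record:
    offset_delta: int     # offset relative to the batch's base_offset
    timestamp_delta: int  # milliseconds relative to the batch's base_timestamp
    key: bytes
    value: bytes

@dataclass
class RecordBatch:
    base_offset: int      # assigned by the broker to the batch as a whole
    base_timestamp: int   # shared base; records store only deltas
    producer_id: int      # per-batch producer metadata (used for idempotence)
    producer_epoch: int
    compression: str      # one codec applies to the entire batch
    records: list[Record]

    def absolute(self, record: Record) -> tuple[int, int]:
        """How a consumer reconstructs a record's offset and timestamp."""
        return (self.base_offset + record.offset_delta,
                self.base_timestamp + record.timestamp_delta)

batch = RecordBatch(
    base_offset=1000, base_timestamp=1_700_000_000_000,
    producer_id=42, producer_epoch=0, compression="lz4",
    records=[Record(i, i * 5, b"k", b"v") for i in range(3)],
)
print(batch.absolute(batch.records[2]))  # → (1002, 1700000000010)
```

Storing deltas rather than absolute values is what lets the broker assign offsets to a compressed batch without decompressing and rewriting every record.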

Why it matters in practice

RecordBatch is the fundamental throughput lever in Kafka. A single record sent alone pays the full network RTT overhead. A batch of 100 records pays the same RTT but carries 100× the data. This is why linger.ms and batch.size are critical for throughput — they control how many records accumulate into each batch. For compression, the batch is even more important: compression algorithms find patterns across multiple records, achieving much better ratios than compressing individual records.
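The cross-record compression effect is easy to demonstrate. The sketch below uses Python's zlib (the DEFLATE algorithm behind Kafka's gzip codec) on 100 hypothetical, similar JSON events — the record contents are made up for illustration:

```python
import json
import zlib

# 100 similar JSON events, the kind of payload a producer might batch.
records = [
    json.dumps({"user_id": i, "event": "page_view",
                "url": "/products/widget"}).encode()
    for i in range(100)
]

# Compressing each record alone: the codec sees no cross-record redundancy,
# and each tiny output pays its own compression header overhead.
individually = sum(len(zlib.compress(r)) for r in records)

# Compressing the batch as one unit: repeated field names and values across
# records give the codec far more patterns to exploit.
batched = len(zlib.compress(b"".join(records)))

print(individually, batched)
assert batched * 2 < individually  # batching wins by a wide margin here
```

The exact ratio depends on the data, but the direction is general: the more records per batch, the more redundancy the codec can find, which is why small batches compress poorly regardless of codec choice.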

Common mistakes

  • Treating individual records as the unit of network transfer — the actual unit is RecordBatch, which explains why single messages have high overhead.
  • Not understanding that compression is per-batch — small batches compress poorly regardless of codec.
  • Assuming batch size equals message count — batch.size is in bytes, not record count; a few large messages can fill a batch meant for many small ones.
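The bytes-not-records point from the last bullet comes down to simple arithmetic. The sketch below uses the producer's default batch.size of 16384 bytes; the record sizes are illustrative assumptions, and real batches also spend some of that budget on the batch header and per-record framing:

```python
BATCH_SIZE = 16_384  # Kafka producer default batch.size, in bytes

def records_per_batch(record_bytes: int, batch_size: int = BATCH_SIZE) -> int:
    """Rough count of how many records of a given size fit in one batch."""
    return batch_size // record_bytes

print(records_per_batch(100))    # → 163 small records per batch
print(records_per_batch(8_000))  # → 2  large records fill the same batch
```

Note that batch.size is a target, not a hard per-record cap: a single record larger than batch.size is still sent, in a batch of its own, as long as it fits within max.request.size.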