
Kafka Message Size Calculator

Calculate Kafka message size, storage, bandwidth, and optimal configuration, accounting for compression, batching, and replication.

Last updated: March 2026

TL;DR

This calculator turns message size, batching, compression, replication, and retention into disk and bandwidth estimates so you can sanity-check brokers before you provision.

Formula: Approximate bytes/sec ≈ (messages/sec × effective message size × replication factor).
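
For example, at the rates used in the worked example on this page (roughly 1,000 messages/s at 552 B effective size with replication factor 3), that is 1,000 × 552 × 3 ≈ 1.66 MB/s of cluster bandwidth.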

When to use this

  • Sizing new topics or clusters from expected producer traffic.
  • Comparing compression codecs and batch sizes for cost tradeoffs.

How the math works

LaTeX model and TypeScript reference — same logic as the calculator on this page.

This describes the implementation behind the numbers as of 2026-03-26. It is engineering documentation, not legal or compliance advice.

Specification citation

The logic reflects our proprietary implementation of the following public specification: the Apache Kafka documentation.

The snippet below represents the core of that calculation engine, verified against the Apache Kafka documentation and common message-size / throughput sizing practice.

Model (LaTeX source)
Kafka message size (duckkit.dev model)

Let B_body be average message body bytes, f_format the serialization overhead factor,
B_key wire key bytes, H headers bytes, O_record record framing floor, O_batch batch overhead
amortized per message, ρ_c compression ratio (compressed/raw, capped at 1).

B_{\text{raw}} = \operatorname{round}\!\left(B_{\text{body}} \cdot f_{\text{format}}\right) + O_{\text{record}} + O_{\text{batch}} + B_{\text{key}} + H
B_{\text{comp}} = \min\!\left(B_{\text{raw}},\ \left\lceil B_{\text{raw}} \cdot \rho_{c} \right\rceil\right)

Throughput and storage modules consume B_comp and cluster inputs (replication factor,
messages/s, retention) for bandwidth and disk estimates.
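
Worked through with the example record used below (a 512 B payload after the format factor, a 16 B key, no headers, 24 B of record plus batch overhead, and compression off so ρ_c = 1):

B_raw = 512 + 24 + 16 + 0 = 552 B
B_comp = min(552, ceil(552 · 1)) = 552 B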

Reference implementation (TypeScript, excerpt from shipped modules)
// lib/kafka-calculator/calculate-all.ts — pipeline
export function calculateAll(inputs: KafkaInputs): KafkaResults {
  const messageSize = calculateMessageSize(inputs)
  const throughput = calculateThroughput(inputs, messageSize)
  const storage = calculateStorage(inputs, messageSize)
  const recommendations = calculateRecommendations(
    inputs,
    throughput,
    messageSize,
    storage,
  )
  return { messageSize, throughput, storage, recommendations }
}

// lib/kafka-calculator/message-size.ts — format + compression cap
const rawPayloadBytes = Math.max(
  0,
  Math.round(inputs.averageBodyBytes * FORMAT_OVERHEAD[inputs.dataFormat]),
)
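// totalRawBytes (not shown in this excerpt) corresponds to B_raw in the model above:
// rawPayloadBytes plus key, header, and record/batch overhead bytes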
const compressedBytes = Math.min(
  totalRawBytes,
  Math.ceil(totalRawBytes * COMPRESSION_RATIOS[inputs.compressionType]),
)
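
The throughput module is not excerpted above. As a minimal sketch of the same arithmetic, assuming hypothetical input fields (messagesPerSecond, replicationFactor, compressedBytes) rather than the shipped KafkaInputs shape, the bandwidth figures follow from the formulas listed under "Formulas & broker replication detail" below:

// Hypothetical sketch, not the shipped module: field names are illustrative
interface ThroughputSketchInputs {
  messagesPerSecond: number
  replicationFactor: number
  compressedBytes: number // B_comp from the model above
}

function estimateThroughputMBps(i: ThroughputSketchInputs) {
  const producerBytesPerSec = i.messagesPerSecond * i.compressedBytes
  return {
    producerMBps: producerBytesPerSec / 1e6,                                  // producer network (compressed)
    clusterMBps: (producerBytesPerSec * i.replicationFactor) / 1e6,           // broker ingress × RF
    replicationMBps: (producerBytesPerSec * (i.replicationFactor - 1)) / 1e6, // leader → followers
  }
}

// Example: 1,000 msg/s × 552 B with RF 3
// → producer 0.552 MB/s, cluster 1.656 MB/s, replication 1.104 MB/s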

At a glance

  • Compressed message size: 552 B
  • Cluster bandwidth: 1.66 MB/s
  • Storage: about 5.66 GiB per hour

Configuration

Adjust values — results update automatically (short debounce). Quick presets apply batching, linger, and compression only — payload and cluster fields stay as you set them.

  • Payload: size and shape of each value (before batch framing).
  • Batching & compression: producer batching and pacing control how records are grouped before send.
  • Cluster & retention: replication and how long data stays on disk (sizing only).

Live results

Estimates from your inputs — expand sections to focus.

How one record adds up on the wire before batching effects: total = payload + key + headers + Kafka record overhead.

Payload (512 B) + Key (16 B) + Headers (0 B) + Kafka overhead (24 B) = 552 B raw

Compression is off — wire size equals raw total. Enable a codec above to see compressed size and savings.

  • Total raw (on wire): 552 B
  • Kafka record overhead: 24 B
  • Total cluster bandwidth required: 1.656 MB/s. Ingress to brokers: msgs/s × compressed size × replication factor (3× RF).

✓ Within a typical range for many Kafka workloads

Producer & batching

  • Producer network (compressed): 0.552 MB/s. Bytes sent by producer/sec. CPU cost: zstd > gzip > snappy.
  • Broker disk write rate: 1.656 MB/s. Actual bytes written to broker disk/sec (compressed × 3 RF).
  • Inter-broker replication: 1.104 MB/s. Leader → followers (RF−1 paths); see note below.
  • Messages per batch: 30
  • Batches per second: 33.3

Consumer & cross-AZ (FinOps model)

  • Est. paid cross-AZ consumer egress: 0.182 MB/s. ~33% of consumer fetch (0.33×); not broker replication.
  • Consumer fetch (all replicas): 0.552 MB/s. msgs/s × compressed — baseline before cross-AZ factor.

Formulas & broker replication detail

  • Producer MB/s = (msg/s × compressed bytes) / 10⁶ — same on-wire basis as cluster ingress/replication/fetch below.
  • Cluster network MB/s = (msg/s × compressed × RF) / 10⁶ — matches bytes brokers must accept from clients for this topic/partition mix (simplified).
  • Replication MB/s = (msg/s × compressed × (RF−1)) / 10⁶ — traffic from leader to follower brokers only.

💡 MSK / Confluent Cloud: broker-to-broker replication is included in the service; cross-AZ fees usually apply to client connections, not this internal line item. Self-hosted: counts toward broker NIC utilization.

Storage

  • Storage per hour: 5.663 GiB. Binary GiB (2³⁰); includes ~2% index overhead.
  • Storage per day: 135.917 GiB. Includes the .index + .timeindex estimate.
  • Total (168 h retention): 951.420 GiB. 3× replicas + indexes.
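
Reproducing the hourly figure from the same example inputs: 1,000 msg/s × 552 B × 3 replicas × 3,600 s ≈ 5.96 GB; adding ~2% index overhead gives about 6.08 GB, which is 6.08 × 10⁹ / 2³⁰ ≈ 5.66 GiB per hour. Multiplying by 24 and by 168 gives the per-day and retention totals above.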

Recommendations

  • Enable compression (zstd recommended) to reduce network and storage by up to 70%.
  • Consider an Avro or Protobuf schema — typically 3–4× smaller than JSON.
  • Min. recommended partitions: 1
  • Suggested batch size: 16 KB

Frequently asked questions

What is Kafka Record V2 format overhead?
Each Kafka message in Record V2 format has a fixed per-record overhead of approximately 21 bytes (upper bound per KIP-98), plus a 61-byte RecordBatch header shared across all messages in a batch. For a single-message batch, total overhead is 82 bytes. As batch size increases, the 61-byte batch header is amortized across more messages, reducing per-message overhead.
  • RecordBatch header: 61 bytes, shared per batch
  • Record overhead: 21 bytes per message
  • Your payload: N bytes (averageBodyBytes), plus … more records in the same batch
  • Single-message batch total: 61 + 21 + N bytes
  • 1000-message batch: (61 / 1000) + 21 + N ≈ 21 + N bytes per message
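
A small helper makes the amortization concrete (a sketch using the 61-byte and 21-byte figures quoted above):

const BATCH_HEADER_BYTES = 61    // RecordBatch header, shared per batch
const RECORD_OVERHEAD_BYTES = 21 // approximate per-record framing (upper bound)

// Amortized framing overhead per record for a given batch size
function overheadPerRecord(recordsPerBatch: number): number {
  return BATCH_HEADER_BYTES / recordsPerBatch + RECORD_OVERHEAD_BYTES
}

// overheadPerRecord(1)    → 82 bytes
// overheadPerRecord(1000) → ≈ 21.06 bytes
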
Why does replica.fetch.max.bytes matter for message size?
replica.fetch.max.bytes (default: 1 MB) limits how much data a follower replica fetches per request. If your message.max.bytes exceeds this value, the broker will accept the message from the producer but replication will silently fail — the message exists only on the leader. This is one of the most dangerous misconfiguration patterns in Kafka because there is no error returned to the producer.
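
Because no error reaches the producer, this relationship is worth checking in deployment tooling. The function below is a hypothetical sketch (not a Kafka API) that flags the mismatch before settings are applied:

// Hypothetical pre-flight check, not a Kafka API: flags the silent-failure
// pattern described above before broker/topic settings are applied
function checkReplicaFetchLimit(messageMaxBytes: number, replicaFetchMaxBytes: number): string | null {
  if (messageMaxBytes > replicaFetchMaxBytes) {
    return `message.max.bytes (${messageMaxBytes}) exceeds replica.fetch.max.bytes (${replicaFetchMaxBytes}); oversized messages will not replicate to followers`
  }
  return null
}
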
Which Kafka compression algorithm should I use?
For JSON payloads: zstd offers ~70% size reduction and is the best choice for new deployments (Kafka 2.1+). gzip achieves ~65% reduction but is slower. snappy and lz4 are faster but achieve only 50–55% reduction. Note that Kafka compression is applied at the batch level, not per-message — larger batches (higher linger.ms) improve compression efficiency significantly.
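
To see how those percentages feed calculator-style math, here is a sketch with illustrative ratios only (compressed/raw for JSON-like payloads); real ratios depend on payload entropy and batch size:

// Illustrative compressed/raw ratios derived from the reductions quoted above;
// real ratios depend on payload entropy and batch size
const COMPRESSION_RATIO_ESTIMATES: Record<string, number> = {
  none: 1.0,
  snappy: 0.5,
  lz4: 0.5,
  gzip: 0.35,
  zstd: 0.3,
}

// Estimated wire size under a codec, never exceeding the uncompressed size
// (the same cap used in the message-size model above)
function estimateCompressedBytes(rawBytes: number, codec: string): number {
  const ratio = COMPRESSION_RATIO_ESTIMATES[codec] ?? 1.0
  return Math.min(rawBytes, Math.ceil(rawBytes * ratio))
}
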
What is the Claim Check pattern and when should I use it?
When messages exceed ~1 MB, the Claim Check pattern stores the payload in external object storage (S3, Azure Blob) and puts only a pointer (URL + metadata) in Kafka. Use it for: messages over 10 MB (chunking becomes too complex), binary assets like images or ML model weights, or payloads that don't need to be in the Kafka retention window. The trade-off is added latency for the storage fetch.
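
A minimal sketch of the pattern, assuming an S3 bucket, the kafkajs client, and placeholder broker/topic/bucket names; the consumer side would read the pointer and fetch the object before processing:

import { randomUUID } from 'node:crypto'
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { Kafka } from 'kafkajs'

const s3 = new S3Client({})
const kafka = new Kafka({ clientId: 'claim-check-producer', brokers: ['broker-1:9092'] })
const producer = kafka.producer() // await producer.connect() once at startup

// Store the large payload in object storage, then publish only a small pointer to Kafka
async function publishWithClaimCheck(topic: string, bucket: string, key: string, payload: Buffer) {
  const objectKey = `claims/${randomUUID()}`
  await s3.send(new PutObjectCommand({ Bucket: bucket, Key: objectKey, Body: payload }))

  const pointer = { bucket, objectKey, sizeBytes: payload.length }
  await producer.send({
    topic,
    messages: [{ key, value: JSON.stringify(pointer) }], // pointer only, not the payload
  })
}
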
How does batch.size affect Kafka throughput?
batch.size (default: 16 KB) controls the maximum size of a message batch before it is sent. Larger batches improve throughput and compression efficiency but increase latency. Combined with linger.ms (default: 0), the producer waits up to linger.ms milliseconds or until batch.size is reached before sending. For high-throughput workloads, a 64 KB–1 MB batch size with a linger.ms of 5–20 ms is a common tuning pattern.
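For scale, at the default 16 KB batch.size with the 552 B records from the example above, a full batch holds about ⌊16,384 / 552⌋ ≈ 29 records, consistent with the roughly 30 messages per batch shown in the results.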
Can I change the number of Kafka partitions after topic creation?
You can increase partitions but not decrease them. Increasing partitions may break message ordering for keyed messages (existing keys may be routed to different partitions). For topics where ordering matters, plan partition count upfront. A common rule of thumb: provision 2–3× your current consumer count to leave room for scaling.
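As a hypothetical illustration of that rule of thumb, a topic consumed by 6 instances today would get 12–18 partitions, leaving headroom to scale the consumer group without repartitioning.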

Related tools