Methodology
The UI calls calculateAll in order: message size (payload, key, headers, format overhead, batching) → throughput → storage (replication, retention, small index allowance) → recommendations. Compression ratios use a browser-side simulation, not broker-measured stats. The tool does not model exact on-disk compaction, tiered storage, or cluster-wide controller overhead.
What Affects Kafka Message Size
The total Kafka message size depends on several factors: your payload (body), the serialization format (JSON, Avro, Protobuf), optional key and headers, and Kafka's internal record overhead. Each record adds roughly 18 bytes of protocol overhead, plus a shared batch header. Understanding these components helps you optimize for throughput and storage.
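As a rough sketch, these components can simply be added up. The 18-byte per-record figure is the approximation quoted above; the 61-byte batch header default is an assumption based on Kafka's v2 record-batch format, and real sizes vary slightly because record lengths are varint-encoded:

```python
def record_size(payload: int, key: int = 0, headers: int = 0,
                overhead: int = 18) -> int:
    """Approximate wire size of one Kafka record.

    `overhead` is the ~18-byte per-record protocol overhead; the true
    value varies with varint-encoded field lengths.
    """
    return payload + key + headers + overhead

def batch_size(record_sizes: list[int], batch_header: int = 61) -> int:
    """Total batch size: one shared header plus every record."""
    return batch_header + sum(record_sizes)

# 100 records, each a 512-byte JSON payload with a 16-byte key:
sizes = [record_size(512, key=16) for _ in range(100)]
print(batch_size(sizes))  # 61 + 100 * 546 = 54661
```

Because the batch header is shared, larger batches amortize it across more records, which is one reason batching improves effective throughput.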
How Compression Works in Kafka
Kafka supports multiple compression codecs: none, gzip, snappy, lz4, and zstd. Compression happens on the producer side and is applied to whole batches before they are sent. For typical JSON payloads, compression.type=zstd achieves roughly a 70% size reduction, while snappy (~50%) compresses less but costs less CPU. Set linger.ms=5 or higher when using compression so batches fill up and compress more efficiently.
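A quick way to sanity-check the ratios above is to compress a representative payload yourself. This sketch uses Python's standard-library zlib (DEFLATE, the algorithm behind gzip) as a stand-in, since zstd and snappy bindings are not in the standard library; actual ratios depend on the codec and on how repetitive your data is:

```python
import json
import zlib

# Repetitive JSON, similar in shape to typical event payloads.
payload = json.dumps(
    [{"user_id": i, "event": "page_view", "path": "/home"} for i in range(100)]
).encode()

compressed = zlib.compress(payload, level=6)
ratio = 1 - len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%} reduction)")
```

Structured event streams with repeated field names usually compress far better than the ~50-70% figures quoted for mixed traffic, which is why batch-level compression (many similar records per batch) outperforms compressing records one at a time.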
Producer vs. broker perspective: The producer compresses a batch before sending — this determines producer CPU cost and network egress. The broker stores the compressed batch on disk and replicates it RF times. These are two different resource constraints: producer CPU scales with compression codec complexity (zstd costs more CPU than snappy), while broker disk I/O scales with message volume × replication factor.
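The two constraints can be separated with back-of-the-envelope arithmetic. The numbers here are illustrative, not measurements:

```python
msgs_per_sec = 5_000
compressed_bytes = 600        # per record, after producer-side compression
replication_factor = 3

# Producer constraint: CPU to compress, plus network egress to the leader.
producer_egress = msgs_per_sec * compressed_bytes

# Broker constraint: every byte is written to disk RF times across the cluster.
broker_disk_write = producer_egress * replication_factor

print(producer_egress / 1e6, "MB/s egress")      # 3.0 MB/s egress
print(broker_disk_write / 1e6, "MB/s disk I/O")  # 9.0 MB/s disk I/O
```

A heavier codec like zstd shrinks both numbers but raises producer CPU, so which codec wins depends on which resource is scarcer in your deployment.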
Kafka Storage Formula
Storage per second = messages per second × compressed message size × replication factor. Add ~2% for index files (.index, .timeindex). Total retention storage = storage per hour × retention hours. Use this Kafka storage calculator to estimate disk requirements before provisioning brokers.
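The formula translates directly into a small estimator. `retention_storage_bytes` is a hypothetical helper, and the example inputs (5,000 msg/s, 600-byte compressed records, RF=3, 7-day retention) are illustrative:

```python
def retention_storage_bytes(msgs_per_sec: float, compressed_size: float,
                            replication_factor: int, retention_hours: float,
                            index_overhead: float = 0.02) -> float:
    """Cluster-wide disk needed for one topic's retention window."""
    per_second = msgs_per_sec * compressed_size * replication_factor
    per_second *= 1 + index_overhead  # ~2% for .index / .timeindex files
    return per_second * retention_hours * 3600

# 5,000 msg/s of 600-byte compressed records, RF=3, 168-hour (7-day) retention:
gb = retention_storage_bytes(5_000, 600, 3, 168) / 1e9
print(f"{gb:,.0f} GB")  # 5,552 GB
```

Note that the replication factor multiplies total cluster disk, not per-broker disk; with RF=3 spread evenly across brokers, each broker holds roughly one replica's share.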
When to Increase Partition Count
A practical rule of thumb: one partition can handle roughly 10 MB/s of producer throughput. If your Kafka message size × message rate exceeds that, add partitions. For Kafka partition sizing, consider both throughput (producer bandwidth) and parallelism (at most one consumer per partition within a group). Over-partitioning increases metadata overhead; under-partitioning caps consumer parallelism.
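Both constraints reduce to taking a maximum. `min_partitions` is a hypothetical helper built on the ~10 MB/s rule of thumb above; the inputs are illustrative and the 10 MB/s ceiling should be replaced with a number measured on your own hardware:

```python
import math

def min_partitions(msgs_per_sec: float, avg_msg_bytes: float,
                   target_consumers: int,
                   per_partition_mbps: float = 10.0) -> int:
    """Partitions needed to satisfy throughput AND consumer parallelism."""
    throughput_mbps = msgs_per_sec * avg_msg_bytes / 1e6
    for_throughput = math.ceil(throughput_mbps / per_partition_mbps)
    return max(for_throughput, target_consumers, 1)

# 50,000 msg/s of 1 KB messages with an 8-consumer group:
# throughput needs 50 MB/s / 10 MB/s = 5, but parallelism needs 8.
print(min_partitions(50_000, 1_000, 8))  # 8
```

Here parallelism, not bandwidth, sets the floor; with a heavier stream (say 200,000 msg/s) the throughput term would dominate instead.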
Once you know your message size, plan your consumer group capacity with the Kafka Consumer Lag Predictor — model throughput, partition ceilings, and time-to-threshold before lag becomes an incident.
Copy-paste solution
# broker: log retention sanity (example — tune for your cluster)
log.retention.hours=168
compression.type=lz4
# Validate with real producer throughput after using the calculator
Broker and topic defaults are documented on kafka.apache.org.