duckkit.dev

Kafka Message Size Calculator

Calculate Kafka message size, storage, bandwidth and optimal configuration. Compression, batching and replication.

Last updated: March 2026

TL;DR

This calculator turns message size, batching, compression, replication, and retention into disk and bandwidth estimates so you can sanity-check brokers before you provision.

Formula: Approximate bytes/sec ≈ (messages/sec × effective message size × replication factor).
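To make the formula concrete, here is a minimal TypeScript sketch mirroring the calculator's units; the 1,000 msg/s rate is an assumed example chosen to match the worked numbers later on this page:

```typescript
// Back-of-envelope bandwidth from the TL;DR formula:
// bytesPerSec = messagesPerSec × effectiveMessageBytes × replicationFactor
function clusterBytesPerSec(
  messagesPerSec: number,
  effectiveMessageBytes: number,
  replicationFactor: number,
): number {
  return messagesPerSec * effectiveMessageBytes * replicationFactor
}

// Example: 1,000 msg/s of 549 B effective messages at RF 3
// → 1,647,000 B/s, i.e. about 1.65 MB/s of cluster ingress.
console.log(clusterBytesPerSec(1000, 549, 3) / 1e6)
```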

When to use this

  • Sizing new topics or clusters from expected producer traffic.
  • Comparing compression codecs and batch sizes for cost tradeoffs.

How the math works

LaTeX model and TypeScript reference — same logic as the calculator on this page.

This describes the implementation behind the numbers as of 2026-03-26. It is engineering documentation, not legal or compliance advice.

Specification citation

Logic reflects our proprietary implementation of the following public specification: the Apache Kafka documentation.

The snippets below represent the core logic of our calculation engine, verified against the Apache Kafka documentation and common message-size / throughput sizing practice.

Model (LaTeX source)
Kafka message size (duckkit.dev model)

Let B_body be average message body bytes, f_format the serialization overhead factor,
B_key wire key bytes, H headers bytes, O_record record framing floor, O_batch batch overhead
amortized per message, ρ_c compression ratio (compressed/raw, capped at 1).

B_raw = round(B_body · f_format) + O_record + O_batch + B_key + H
B_comp = min(B_raw, ceil(B_raw · ρ_c))

Throughput and storage modules consume B_comp and cluster inputs (replication factor, messages/s, retention) to produce bandwidth and disk estimates.

Reference implementation (TypeScript, excerpt from shipped modules)
// lib/kafka-calculator/calculate-all.ts — pipeline
export function calculateAll(inputs: KafkaInputs): KafkaResults {
  const messageSize = calculateMessageSize(inputs)
  const throughput = calculateThroughput(inputs, messageSize)
  const storage = calculateStorage(inputs, messageSize)
  const recommendations = calculateRecommendations(
    inputs,
    throughput,
    messageSize,
    storage,
  )
  return { messageSize, throughput, storage, recommendations }
}

// lib/kafka-calculator/message-size.ts — format + compression cap
const rawPayloadBytes = Math.max(
  0,
  Math.round(inputs.averageBodyBytes * FORMAT_OVERHEAD[inputs.dataFormat]),
)
const compressedBytes = Math.min(
  totalRawBytes,
  Math.ceil(totalRawBytes * COMPRESSION_RATIOS[inputs.compressionType]),
)
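The shipped excerpt depends on module-level tables. Below is a self-contained sketch of the same per-record model; the format factors, compression ratios, and the 21 B combined record/batch overhead are illustrative assumptions consistent with this page's worked example, not the calculator's exact tables:

```typescript
// Per-record wire-size model:
//   raw  = round(body × formatFactor) + recordAndBatchOverhead + key + headers
//   comp = min(raw, ceil(raw × compressionRatio))
// Constants below are assumed example values.
const FORMAT_OVERHEAD: Record<string, number> = { json: 1.0, avro: 0.3, protobuf: 0.3 }
const COMPRESSION_RATIOS: Record<string, number> = { none: 1.0, snappy: 0.5, zstd: 0.3 }
const RECORD_AND_BATCH_OVERHEAD = 21 // framing floor + amortized batch header (example)

function messageSizeBytes(
  bodyBytes: number,
  keyBytes: number,
  headerBytes: number,
  dataFormat: string,
  compression: string,
): { raw: number; compressed: number } {
  const raw =
    Math.max(0, Math.round(bodyBytes * FORMAT_OVERHEAD[dataFormat])) +
    RECORD_AND_BATCH_OVERHEAD +
    keyBytes +
    headerBytes
  const compressed = Math.min(raw, Math.ceil(raw * COMPRESSION_RATIOS[compression]))
  return { raw, compressed }
}

// Matches the worked example on this page: 512 B JSON body + 16 B key,
// no headers, compression off → 549 B raw and on the wire.
console.log(messageSizeBytes(512, 16, 0, "json", "none"))
```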

Compressed message size 549 B. Cluster bandwidth 1.65 megabytes per second. Storage about 5.63 gibibytes per hour.

At a glance

  • Compressed message: 549 B
  • Cluster bandwidth: 1.65 MB/s
  • Storage: 5.63 GiB/hour

Configuration

Adjust values — results update automatically (short debounce). Quick presets apply batching, linger, and compression only — payload and cluster fields stay as you set them.

Input sections:

  • Payload — size and shape of each value (before batch framing).
  • Batching & compression — producer pacing controls how records are grouped before send.
  • Cluster & retention — replication and how long data stays on disk (sizing only).

Results

Estimates from your inputs — expand sections to focus.

How one record adds up on the wire before batching effects. Total = payload + key + headers + Kafka record overhead.

Payload (512 B) + Key (16 B) + Headers (0 B) + Kafka overhead (21 B) = 549 B raw

Compression is off — wire size equals raw total. Enable a codec above to see compressed size and savings.

  • Total raw (on wire): 549 B
  • Kafka record overhead: 21 B
  • Total cluster bandwidth required: 1.647 MB/s — ingress to brokers: msgs/s × compressed size × replication factor (×3 RF)
✓ Within a typical range for many Kafka workloads

Producer & batching

  • Producer network (compressed): 0.549 MB/s — bytes sent by the producer per second. CPU cost: zstd > gzip > snappy.
  • Broker disk write rate: 1.647 MB/s — actual bytes written to broker disk per second (compressed × 3 RF).
  • Inter-broker replication: 1.098 MB/s — leader → followers (RF − 1 paths); see note below.
  • Messages per batch: 30
  • Batches per second: 33.3

Consumer & cross-AZ (FinOps model)

  • Est. paid cross-AZ consumer egress: 0.181 MB/s — ~33% of consumer fetch (0.33×); not broker replication.
  • Consumer fetch (all replicas): 0.549 MB/s — msgs/s × compressed; baseline before the cross-AZ factor.
Formulas & broker replication detail
  • Producer MB/s = (msg/s × compressed bytes) / 10⁶ — same on-wire basis as cluster ingress/replication/fetch below.
  • Cluster network MB/s = (msg/s × compressed × RF) / 10⁶ — matches bytes brokers must accept from clients for this topic/partition mix (simplified).
  • Replication MB/s = (msg/s × compressed × (RF−1)) / 10⁶ — traffic from leader to follower brokers only.

💡 MSK / Confluent Cloud: broker-to-broker replication is included in the service; cross-AZ fees usually apply to client connections, not this internal line item. Self-hosted: counts toward broker NIC utilization.
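The formulas above translate directly to code. A TypeScript sketch reproducing this page's example numbers (1,000 msg/s × 549 B at RF 3 is the assumed input; the 0.33 cross-AZ factor mirrors the FinOps model above):

```typescript
// Throughput formulas, in MB/s (10^6 bytes), matching the list above.
const producerMBps = (msgs: number, bytes: number) => (msgs * bytes) / 1e6
const clusterMBps = (msgs: number, bytes: number, rf: number) => (msgs * bytes * rf) / 1e6
const replicationMBps = (msgs: number, bytes: number, rf: number) =>
  (msgs * bytes * (rf - 1)) / 1e6
// Cross-AZ egress estimate: fraction of consumer fetch assumed to cross zones.
const crossAzEgressMBps = (msgs: number, bytes: number, factor = 0.33) =>
  (msgs * bytes * factor) / 1e6

console.log(producerMBps(1000, 549))       // producer network
console.log(clusterMBps(1000, 549, 3))     // cluster ingress (RF 3)
console.log(replicationMBps(1000, 549, 3)) // leader → follower traffic
console.log(crossAzEgressMBps(1000, 549))  // paid cross-AZ estimate
```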

  • Storage per hour: 5.632 GiB/h — binary GiB (2³⁰); includes ~2% index overhead (.index + .timeindex estimate)
  • Storage per day: 135.179 GiB/day
  • Total (168 h retention): 946.250 GiB — 3× replicas + indexes

Recommendations:

  • Enable compression (zstd recommended) to reduce network and storage by up to 70%.
  • Consider an Avro or Protobuf schema — typically 3-4× smaller than JSON.
  • Min. recommended partitions: 1
  • Suggested batch size: 16 KB

Methodology

The UI calls calculateAll in order: message size (payload, key, headers, format overhead, batching) → throughput and storage (replication, retention, small index allowance) → recommendations. Compression ratios come from a browser-side estimate, not broker-measured stats. The tool does not model exact on-disk compaction, tiered storage, or cluster-wide controller overhead.

What Affects Kafka Message Size

The total Kafka message size depends on several factors: your payload (body), the serialization format (JSON, Avro, Protobuf), optional key and headers, and Kafka's internal record overhead. Each record adds roughly 18 bytes of protocol overhead, plus a shared batch header. Understanding these components helps you optimize for throughput and storage.

How Compression Works in Kafka

Kafka supports multiple compression codecs: none, gzip, snappy, lz4, and zstd. Compression happens on the producer side before batching. For typical JSON payloads, compression.type=zstd achieves ~70% size reduction, while snappy (~50%) is faster but less efficient. Enable linger.ms=5 or higher when using compression to improve batch efficiency.
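A minimal producer configuration reflecting this advice; these are example values, not universal defaults — tune for your payloads and latency budget (batch.size here matches the 16 KB suggestion above):

```
# producer: enable compression with batching headroom (example values)
compression.type=zstd
linger.ms=5
batch.size=16384
```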

Producer vs. broker perspective: The producer compresses a batch before sending — this determines producer CPU cost and network egress. The broker stores the compressed batch on disk and replicates it RF times. These are two different resource constraints: producer CPU scales with compression codec complexity (zstd costs more CPU than snappy), while broker disk I/O scales with message volume × replication factor.

Kafka Storage Formula

Storage per second = messages per second × compressed message size × replication factor. Add ~2% for index files (.index, .timeindex). Total retention storage = storage per hour × retention hours. Use this Kafka storage calculator to estimate disk requirements before provisioning brokers.
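The storage formula, sketched in TypeScript with this page's example inputs (1,000 msg/s × 549 B at RF 3 is assumed; the 2% index allowance is the one stated above):

```typescript
// Storage model: msgs/s × compressed bytes × RF, plus ~2% index allowance,
// reported in binary GiB (2^30 bytes).
const GIB = 2 ** 30
const INDEX_OVERHEAD = 1.02 // ~2% for .index / .timeindex files

function storageGiBPerHour(msgs: number, compressedBytes: number, rf: number): number {
  return (msgs * compressedBytes * rf * 3600 * INDEX_OVERHEAD) / GIB
}

// ≈5.63 GiB/h, ≈135.2 GiB/day, ≈946 GiB at 168 h retention,
// matching the results section above.
const perHour = storageGiBPerHour(1000, 549, 3)
console.log(perHour, perHour * 24, perHour * 168)
```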

When to Increase Partition Count

A practical rule: one partition can handle roughly 10 MB/s of producer throughput. If your Kafka message size and message rate exceed that, add partitions. For Kafka partition sizing, consider both throughput (producer bandwidth) and parallelism (one consumer per partition per group). Over-partitioning increases metadata overhead; under-partitioning limits consumer parallelism.
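The rule of thumb above can be sketched as a small helper; the function shape and its 10 MB/s default are assumptions based on this page's heuristic, not a Kafka API:

```typescript
// Min partitions = max of (throughput need, consumer parallelism need).
// ~10 MB/s per partition is the practical rule stated above.
function minPartitions(
  producerMBps: number,
  consumerCount: number,
  mbPerPartition = 10,
): number {
  const forThroughput = Math.ceil(producerMBps / mbPerPartition)
  return Math.max(1, forThroughput, consumerCount)
}

// This page's example: 0.549 MB/s and one consumer → 1 partition.
console.log(minPartitions(0.549, 1))
```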

Once you know your message size, plan your consumer group capacity with the Kafka Consumer Lag Predictor — model throughput, partition ceilings, and time-to-threshold before lag becomes an incident.

Copy-paste solution

# broker: log retention sanity (example — tune for your cluster)
log.retention.hours=168
compression.type=lz4
# Validate with real producer throughput after using the calculator

Broker and topic defaults are documented on kafka.apache.org.

Related tools