duckkit.dev

System Latency Budget Calculator

Build your architecture from reference hop latencies and see instantly if you fit your SLA — interactive latency numbers for engineers.

Last updated: March 2026

TL;DR

Stack measured or reference hop latencies (DNS, TLS, gateways, RPC, queues) against a single SLA target to see whether the architecture fits.

Formula: Total latency ≈ Σ (hop latency × iterations); compare to SLA in milliseconds.

When to use this

  • Design reviews for user-facing request paths and microservice graphs.
  • Arguing for fewer hops or faster regions with numbers, not opinions.

How the math works

LaTeX model and TypeScript reference — same logic as the calculator on this page.

This describes the implementation behind the numbers as of 2026-03-26. It is engineering documentation, not legal or compliance advice.

Specification citation

The logic is a proprietary implementation informed by the following public reference: the Google SRE Book, monitoring distributed systems.

This snippet represents the core logic of our proprietary calculation engine, verified against public SRE literature on latency budgets and service level objectives.

Model (LaTeX source)
Latency budget (duckkit.dev model)

SLA target in microseconds: S = SLA_ms · 1000

Total p50: T_50 = Σ_i (p50_i · max(1, iter_i))

Tail multipliers m_99, m_999 depend on architecture complexity (simple | moderate | complex):
T_99 = round(T_50 · m_99)
T_999 = round(T_99 · m_999)

Headroom (%): H = 100 · (S − T_50) / S when S > 0
Status: over if T_50 > S; else tight if H < 20; else safe.
Reference implementation (TypeScript, excerpt from shipped modules)
// lib/latency-budget/calculate.ts (excerpt)
export function calculateBudget(
  inputs: LatencyBudgetInputs,
): LatencyBudgetResult {
  const { slaTargetMs, hops, complexity } = inputs
  const slaTargetUs = slaTargetMs * 1_000

  let totalP50Us = 0
  for (const hop of hops) {
    const it = Math.max(1, hop.iterations)
    totalP50Us += hop.p50Us * it
  }

  const { p99: p99Mult, p999: p999Mult } = TAIL_MULTIPLIERS[complexity]
  const totalP99Us = Math.round(totalP50Us * p99Mult)
  const totalP999Us = Math.round(totalP99Us * p999Mult)

  const headroomUs = slaTargetUs - totalP50Us
  const headroomPct =
    slaTargetUs > 0
      ? parseFloat(((headroomUs / slaTargetUs) * 100).toFixed(1))
      : 0

  const status: BudgetStatus =
    totalP50Us > slaTargetUs ? 'over' : headroomPct < 20 ? 'tight' : 'safe'
  // … hop contributions, warnings …
}
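The excerpt above is not self-contained. The sketch below is a minimal runnable version under assumed type shapes: the p99 multipliers (1.2 / 1.4 / 2.0) are the values stated elsewhere on this page, while the p999 multipliers are illustrative assumptions, with the moderate value chosen so the worked example on this page (3.50 ms → 14.0 ms) reproduces.

```typescript
type Complexity = 'simple' | 'moderate' | 'complex'
type BudgetStatus = 'over' | 'tight' | 'safe'

interface Hop {
  name: string
  p50Us: number
  iterations: number
}

interface LatencyBudgetInputs {
  slaTargetMs: number
  hops: Hop[]
  complexity: Complexity
}

interface LatencyBudgetResult {
  totalP50Us: number
  totalP99Us: number
  totalP999Us: number
  headroomPct: number
  status: BudgetStatus
}

// p99 values from this page; p999 values are assumptions
// (moderate inferred from the 3.50 ms -> 14.0 ms example).
const TAIL_MULTIPLIERS: Record<Complexity, { p99: number; p999: number }> = {
  simple: { p99: 1.2, p999: 2.0 },
  moderate: { p99: 1.4, p999: 4.0 },
  complex: { p99: 2.0, p999: 6.0 },
}

export function calculateBudget(
  inputs: LatencyBudgetInputs,
): LatencyBudgetResult {
  const { slaTargetMs, hops, complexity } = inputs
  const slaTargetUs = slaTargetMs * 1_000

  // p50 total: sum of hop p50s, each counted at least once.
  let totalP50Us = 0
  for (const hop of hops) {
    totalP50Us += hop.p50Us * Math.max(1, hop.iterations)
  }

  // Synthetic tails via fixed multipliers per complexity tier.
  const { p99, p999 } = TAIL_MULTIPLIERS[complexity]
  const totalP99Us = Math.round(totalP50Us * p99)
  const totalP999Us = Math.round(totalP99Us * p999)

  // Headroom as a percentage of the SLA, one decimal place.
  const headroomPct =
    slaTargetUs > 0
      ? parseFloat((((slaTargetUs - totalP50Us) / slaTargetUs) * 100).toFixed(1))
      : 0

  const status: BudgetStatus =
    totalP50Us > slaTargetUs ? 'over' : headroomPct < 20 ? 'tight' : 'safe'

  return { totalP50Us, totalP99Us, totalP999Us, headroomPct, status }
}
```

With the three example hops shown below and a 125 ms SLA (the value consistent with the 98% headroom shown), this returns p50 2500 μs, p99 3500 μs, p999 14000 μs, status safe.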


At a glance

  • Status: safe
  • p50 total: 2.50 ms
  • Headroom: 98%

Configuration

p50 target in milliseconds


Architecture hops (3 added)

  • Redis GET (local): 500 μs
  • PostgreSQL simple: 1.00 ms
  • HTTP same AZ: 1.00 ms

Results

Within budget. Bottleneck: PostgreSQL simple.

  • p50 latency: 2.50 ms
  • p99 latency: 3.50 ms
  • p999 latency: 14.0 ms
  • Headroom: 98%

p99 and p999 are estimated using a static multiplier per architecture complexity. Real tail latency compounds non-linearly across hops (statistical convolution). For precise tail latency modelling, instrument your system with HDR Histogram and measure empirically.

Latency breakdown

Share of p50 total, relative to the SLA target:

  • Redis GET (local): 500 μs · 20%
  • PostgreSQL simple: 1.00 ms · 40% · bottleneck
  • HTTP same AZ: 1.00 ms · 40%

Methodology

End-to-end p50 is the sum of each hop's p50 multiplied by its iteration count (minimum one). p99 and p999 are synthesized by applying fixed multipliers to that p50 sum based on architecture complexity (simple / moderate / complex) — they are not measured distributions. Headroom is SLA minus p50 sum; status flags compare that to thresholds. The tool does not simulate queuing, retries, or parallel fan-out beyond the iteration field.
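Plugging the three example hops from this page into those rules gives the figures shown above. Two values here are inferred rather than stated on the page: the SLA input is taken as 125 ms (the only value consistent with 98% headroom), and the example is assumed to use the moderate p99 multiplier (1.4×), which its 2.50 ms → 3.50 ms step matches.

```latex
\begin{align*}
T_{50}  &= 500 + 1000 + 1000 = 2500\,\mu\mathrm{s} = 2.50\,\mathrm{ms} \\
T_{99}  &= \mathrm{round}(2500 \cdot 1.4) = 3500\,\mu\mathrm{s} = 3.50\,\mathrm{ms} \\
T_{999} &= \mathrm{round}(3500 \cdot m_{999}) = 14{,}000\,\mu\mathrm{s} \quad (m_{999} = 4.0) \\
H       &= 100 \cdot \frac{125{,}000 - 2500}{125{,}000} = 98\%
\end{align*}
```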

Latency numbers every programmer should know

Jeff Dean and others popularised orders-of-magnitude reference latencies—from L1 cache to cross-region RTT—so engineers can reason about systems without benchmarking every layer. A latency budget turns those numbers into a design constraint: you add the cost of each hop in your critical path and compare the sum to your SLA. This tool is an interactive take on that idea: pick realistic hops, tune counts, and see whether your latency budget still fits.
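For reference, the commonly cited orders of magnitude can be kept as a lookup table. These are approximate planning numbers, not measurements; values vary by hardware generation and by year.

```typescript
// Approximate reference latencies in microseconds (orders of magnitude only).
const REFERENCE_LATENCY_US = {
  l1CacheRef: 0.0005,      // ~0.5 ns
  mainMemoryRef: 0.1,      // ~100 ns
  ssdRandomRead: 150,      // ~150 us
  sameDatacenterRtt: 500,  // ~0.5 ms
  crossRegionRtt: 150_000, // ~150 ms
} as const

// Spanning ~8-9 orders of magnitude is exactly why budgets beat intuition.
console.log(REFERENCE_LATENCY_US)
```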

Why p99 estimates are approximations

Tail latency across distributed hops does not sum linearly. The p99 of a system with 5 hops is not 5× the p99 of one hop — it depends on the latency distribution of each component (log-normal for network, heavy-tail for GC pauses). The multipliers used here (1.2× for simple, 1.4× for moderate, 2.0× for complex architectures) are empirically derived, conservative estimates suitable for SLA planning. For production SLO definition, always instrument with real traffic using HDR Histogram or OpenTelemetry percentile metrics.
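A small Monte Carlo simulation illustrates the point. This is a hypothetical sketch, not part of the shipped engine: it draws independent log-normal samples for the three example hops (a common assumption for network-bound latency; the shape parameter here is arbitrary) and compares the p99 of the sum with the sum of per-hop p99s.

```typescript
// Deterministic PRNG (mulberry32) so the run is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed | 0
  return () => {
    a = (a + 0x6d2b79f5) | 0
    let t = Math.imul(a ^ (a >>> 15), 1 | a)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}
const rand = mulberry32(42)

// Box-Muller: standard normal from two uniforms.
function normal(): number {
  const u = Math.max(rand(), Number.EPSILON)
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand())
}

// Log-normal hop latency with the given median (us) and shape sigma.
function hopSample(medianUs: number, sigma: number): number {
  return medianUs * Math.exp(sigma * normal())
}

function percentile(xs: number[], p: number): number {
  const sorted = [...xs].sort((a, b) => a - b)
  return sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))]
}

const N = 100_000
const medians = [500, 1000, 1000] // the example hops, in microseconds
const sigma = 0.8                  // arbitrary illustrative shape

const perHop = medians.map(m => Array.from({ length: N }, () => hopSample(m, sigma)))
const sums = Array.from({ length: N }, (_, i) =>
  perHop.reduce((acc, h) => acc + h[i], 0),
)

const p99Sum = percentile(sums, 0.99)
const sumOfP99s = perHop.reduce((acc, h) => acc + percentile(h, 0.99), 0)

// Independent tails rarely align, so the p99 of the sum sits well below
// the sum of per-hop p99s -- which is why fixed multipliers on the p50
// sum are a workable (if rough) approximation.
console.log({ p99Sum, sumOfP99s })
```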

Building a latency budget

A latency budget allocates your end-to-end SLA across services, databases, caches, and network segments. Teams often target p50 for capacity planning but must still understand p99 and tail behaviour: small headroom at p50 means retries, GC, or one slow dependency will breach the SLA under load. We surface p99 and p999 using multipliers that scale with architecture complexity so simple paths are not over-penalised and highly fan-out systems get a more conservative tail estimate.

Many teams aim for roughly 20% headroom below the SLA at p50 so bursts and jitter do not immediately violate customer-facing targets.

The tail latency problem

Median latency hides worst-case behaviour. Garbage collection pauses, packet loss, thread pool exhaustion, and synchronous fan-out all widen the tail. That is why p99 can exceed a naïve sum of medians and why p999 is often several times worse than p99. When you stack many hops, the probability that at least one is slow rises—another reason to treat distributed systems as higher complexity in this calculator.
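The "at least one slow hop" effect is simple arithmetic. If each of N independent hops exceeds its own p99 1% of the time, the chance a request hits at least one slow hop is 1 − 0.99^N:

```typescript
// Probability that at least one of `hops` independent hops is slow,
// given a per-hop slow probability (default: the 1% beyond each hop's p99).
function pAtLeastOneSlow(hops: number, perHopSlowProb = 0.01): number {
  return 1 - Math.pow(1 - perHopSlowProb, hops)
}

for (const n of [1, 5, 10, 20]) {
  console.log(n, (100 * pAtLeastOneSlow(n)).toFixed(1) + '%')
}
// With 10 hops, roughly 9.6% of requests see at least one hop above its p99.
```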

Why some latencies have a physical lower bound

Speed of light in optical fibre is approximately 200,000 km/s—about 30% slower than in vacuum. A London–New York round-trip (~11,000 km) has a theoretical minimum near 55ms regardless of server speed; the actual ~80ms RTT adds routing, switching, and packet processing. This is why cross-continental HTTP will not reach sub-50ms latency, and why CDN edge nodes exist—to move work physically closer to users. For latency-sensitive operations, geographic distribution is not optional; it is physics.
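The floor in that paragraph is one division. At ~200,000 km/s in fibre, light covers 200 km per millisecond, so round-trip distance alone fixes a minimum RTT:

```typescript
// Physical lower bound on RTT from fibre propagation alone
// (ignores routing, switching, and packet processing).
const FIBRE_KM_PER_MS = 200 // 200,000 km/s = 200 km per millisecond

function minRttMs(roundTripKm: number): number {
  return roundTripKm / FIBRE_KM_PER_MS
}

console.log(minRttMs(11_000)) // London-New York round trip -> 55 ms
```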

Common latency killers

Disk seeks on spinning rust, N+1 database patterns (modelled here with per-hop iterations), uncached cross-region calls, and cold starts on serverless all consume budget fast. Eliminating even one dominant hop often matters more than micro-optimising the rest.
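The N+1 case is worth making concrete with the model's own `iterations` field (the hop values here are hypothetical): one query per list item turns a 1 ms hop into an N × 1 ms budget line.

```typescript
// Per-hop cost under the model: p50 times iteration count (minimum one).
function hopCostUs(p50Us: number, iterations: number): number {
  return p50Us * Math.max(1, iterations)
}

const batched = hopCostUs(1_000, 1)   // one batched query: 1 ms
const nPlusOne = hopCostUs(1_000, 25) // one query per item: 25 ms
console.log({ batched, nPlusOne })
```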

Related calculators

Once you know your p99 latency, use the p99LatencyMs input in our API Rate Limit Calculator to set safe retry windows. For Kafka-based architectures, see the Kafka Consumer Lag Predictor to factor in messaging and processing time.

Copy-paste solution

flowchart LR
  Client --> Edge[CDN / edge]
  Edge --> GW[API gateway]
  GW --> Svc[Service]
  Svc --> DB[(Database)]
  %% Label each edge with p50/p99 ms from this calculator

Paste into any Mermaid-compatible doc; replace hop labels with measured latency from this tool.
