Methodology
End-to-end p50 is the sum of each hop's p50 multiplied by its iteration count (minimum one). p99 and p999 are synthesized by applying fixed multipliers to that p50 sum based on architecture complexity (simple / moderate / complex) — they are not measured distributions. Headroom is SLA minus p50 sum; status flags compare that to thresholds. The tool does not simulate queuing, retries, or parallel fan-out beyond the iteration field.
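The methodology above can be sketched in a few lines. The complexity multipliers (1.2× / 1.4× / 2.0×) come from this article; the hop list and SLA below are a hypothetical example, not output from the tool.

```python
# Sketch of the budget methodology: p50s sum (times iterations, minimum one),
# p99 is that sum scaled by an architecture-complexity multiplier, and
# headroom is SLA minus the p50 sum.
MULTIPLIERS = {"simple": 1.2, "moderate": 1.4, "complex": 2.0}

def budget(hops, sla_ms, complexity):
    # hops: list of (p50_ms, iterations) pairs
    p50_sum = sum(p50 * max(1, iters) for p50, iters in hops)
    p99 = p50_sum * MULTIPLIERS[complexity]
    headroom = sla_ms - p50_sum
    return p50_sum, p99, headroom

# Example: gateway 2 ms, service 5 ms, DB query 3 ms run 4 times (N+1 pattern)
p50_sum, p99, headroom = budget([(2, 1), (5, 1), (3, 4)],
                                sla_ms=50, complexity="moderate")
```

With these illustrative numbers the p50 sum is 19 ms, the synthesized p99 is 26.6 ms, and 31 ms of headroom remains under a 50 ms SLA.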
Latency numbers every programmer should know
Jeff Dean and others popularised orders-of-magnitude reference latencies—from L1 cache to cross-region RTT—so engineers can reason about systems without benchmarking every layer. A latency budget turns those numbers into a design constraint: you add the cost of each hop in your critical path and compare the sum to your SLA. This tool is an interactive take on that idea: pick realistic hops, tune counts, and see whether your latency budget still fits.
Why p99 estimates are approximations
Tail latency across distributed hops does not sum linearly. The p99 of a system with 5 hops is not 5× the p99 of one hop; it depends on the latency distribution of each component (log-normal for network, heavy-tailed for GC pauses). The multipliers used here (1.2× for simple, 1.4× for moderate, 2.0× for complex architectures) are empirically derived conservative estimates suitable for SLA planning. For production SLO definition, always instrument with real traffic using HdrHistogram or OpenTelemetry percentile metrics.
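A quick simulation makes the non-additivity concrete. This is an illustration with arbitrary log-normal parameters, not the tool's model: it compares the p99 of a five-hop path against 5× the p99 of a single hop.

```python
# Monte Carlo: five independent log-normal hops (median ~2.7 ms each).
# All five hops are rarely slow at once, so the p99 of the summed path
# is well below 5x the p99 of one hop.
import random

random.seed(0)
N = 100_000

def hop_ms():
    return random.lognormvariate(1.0, 0.5)

single = sorted(hop_ms() for _ in range(N))
path = sorted(sum(hop_ms() for _ in range(5)) for _ in range(N))

p99_single = single[int(0.99 * N)]
p99_path = path[int(0.99 * N)]
```

Running this shows `p99_path` landing far under `5 * p99_single`, which is why a fixed multiplier on the p50 sum is a rough but workable stand-in for real percentile math.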
Building a latency budget
A latency budget allocates your end-to-end SLA across services, databases, caches, and network segments. Teams often target p50 for capacity planning but must still understand p99 and tail behaviour: small headroom at p50 means retries, GC, or one slow dependency will breach the SLA under load. We surface p99 and p999 using multipliers that scale with architecture complexity so simple paths are not over-penalised and high-fan-out systems get a more conservative tail estimate.
Many teams aim for roughly 20% headroom below the SLA at p50 so bursts and jitter do not immediately violate customer-facing targets.
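That 20% rule of thumb is easy to encode. The function below is a hypothetical status check, not the tool's exact thresholds:

```python
# Flag a budget as "ok", "tight", or "breach" based on p50 headroom
# relative to the SLA, using a 20% headroom target by default.
def headroom_status(p50_sum_ms, sla_ms, target=0.20):
    headroom = sla_ms - p50_sum_ms
    if headroom < 0:
        return "breach"          # p50 already exceeds the SLA
    if headroom / sla_ms < target:
        return "tight"           # fits, but bursts/jitter will hurt
    return "ok"
```

For a 200 ms SLA, a 150 ms p50 sum is "ok" (25% headroom) while 170 ms is "tight" (15%).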
The tail latency problem
Median latency hides worst-case behaviour. Garbage collection pauses, packet loss, thread pool exhaustion, and synchronous fan-out all widen the tail. That is why p99 can exceed a naïve sum of medians and why p999 is often several times worse than p99. When you stack many hops, the probability that at least one is slow rises—another reason to treat distributed systems as higher complexity in this calculator.
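The "at least one hop is slow" effect is just independence arithmetic. Assuming each hop is independently slow 1% of the time:

```python
# Probability that a request crossing n_hops hits at least one slow hop,
# assuming each hop is slow with independent probability p_slow.
def p_any_slow(n_hops, p_slow=0.01):
    return 1 - (1 - p_slow) ** n_hops
```

One hop gives 1%, ten hops already give roughly 9.6%, and a 100-way fan-out pushes past 63%, which is why stacked hops and wide fan-outs deserve the more conservative tail multipliers.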
Why some latencies have a physical lower bound
Speed of light in optical fibre is approximately 200,000 km/s—about 30% slower than in vacuum. A London–New York round-trip (~11,000 km) has a theoretical minimum near 55ms regardless of server speed; the actual ~80ms RTT adds routing, switching, and packet processing. This is why cross-continental HTTP will not reach sub-50ms latency, and why CDN edge nodes exist—to move work physically closer to users. For latency-sensitive operations, geographic distribution is not optional; it is physics.
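The physics floor from the paragraph above is a one-line calculation. The ~5,570 km London–New York great-circle distance is an assumed figure for illustration:

```python
# Light in optical fibre travels at roughly 200,000 km/s, i.e. 200 km/ms.
FIBRE_KM_PER_MS = 200.0

def min_rtt_ms(one_way_km):
    # Theoretical round-trip floor: out and back, ignoring routing,
    # switching, and packet-processing overhead.
    return 2 * one_way_km / FIBRE_KM_PER_MS
```

`min_rtt_ms(5570)` gives about 56 ms, matching the ~55 ms floor quoted above; the observed ~80 ms RTT is everything the fibre path adds on top.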
Common latency killers
Disk seeks on spinning rust, N+1 database patterns (modelled here with per-hop iterations), uncached cross-region calls, and cold starts on serverless all consume budget fast. Eliminating even one dominant hop often matters more than micro-optimising the rest.
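The N+1 pattern maps directly onto the per-hop iteration model. The 3 ms query cost below is an illustrative figure:

```python
# Cost of an N+1 query pattern: one parent query plus one child query
# per item, each paying the same per-query p50.
def n_plus_one_cost_ms(items, query_p50_ms=3.0):
    return (1 + items) * query_p50_ms
```

At 25 items that is 78 ms from the database alone, while a single batched query would stay near 3 ms, which is exactly the "eliminate the dominant hop" win described above.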
Related calculators
Once you know your p99 latency, use the p99LatencyMs input in our API Rate Limit Calculator to set safe retry windows. For Kafka-based architectures, see the Kafka Consumer Lag Predictor to factor in messaging and processing time.
Copy-paste solution
```mermaid
flowchart LR
  Client --> Edge[CDN / edge]
  Edge --> GW[API gateway]
  GW --> Svc[Service]
  Svc --> DB[(Database)]
  %% Label each edge with p50/p99 ms from this calculator
```
Paste into any Mermaid-compatible doc; replace hop labels with the p50/p99 figures from this calculator.