Methodology
End-to-end p50 is the sum of each hop's p50 multiplied by its iteration count (minimum one). p99 and p999 are synthesized by applying fixed multipliers to that p50 sum based on architecture complexity (simple / moderate / complex); they are not measured distributions. Headroom is SLA minus p50 sum; status flags compare that to thresholds. The tool does not simulate queuing, retries, or parallel fan-out beyond the iteration field.
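The methodology above can be sketched in a few lines. The multipliers match those stated later in this page; the hop values, the 20% headroom threshold, and the status labels are illustrative assumptions, not the tool's actual implementation:

```python
# Sketch of the calculator's methodology. The p999 multiplier is a
# further fixed factor in the tool and is omitted here.
TAIL_MULTIPLIERS = {"simple": 1.2, "moderate": 1.4, "complex": 2.0}

def estimate(hops, complexity, sla_ms):
    """hops: list of (p50_ms, iterations) per hop in the critical path."""
    p50_sum = sum(p50 * max(iters, 1) for p50, iters in hops)
    p99 = round(p50_sum * TAIL_MULTIPLIERS[complexity], 1)
    headroom = sla_ms - p50_sum
    if headroom < 0:
        status = "breach"
    elif headroom < 0.2 * sla_ms:   # illustrative 20% headroom threshold
        status = "at-risk"
    else:
        status = "ok"
    return p50_sum, p99, headroom, status

# Gateway 5ms, service 10ms, DB 2ms x 3 queries (N+1), against a 100ms SLA:
print(estimate([(5, 1), (10, 1), (2, 3)], "moderate", 100))
# (21, 29.4, 79, 'ok')
```

Note how the iteration field multiplies the database hop before anything else happens; that is the whole model, with no queuing or retry simulation.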
Latency numbers every programmer should know
Jeff Dean and others popularised orders-of-magnitude reference latencies, from L1 cache to cross-region RTT, so engineers can reason about systems without benchmarking every layer. A latency budget turns those numbers into a design constraint: you add the cost of each hop in your critical path and compare the sum to your SLA. This tool is an interactive take on that idea: pick realistic hops, tune counts, and see whether your latency budget still fits.
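For quick reference, a sketch of a few of those classic numbers. The values are approximate order-of-magnitude figures, not measurements; exact numbers vary by hardware generation:

```python
# Approximate reference latencies in nanoseconds, after the well-known
# "numbers every programmer should know" list. Values are illustrative.
REFERENCE_NS = {
    "L1 cache reference": 1,
    "Main memory reference": 100,
    "SSD random read": 100_000,                     # ~100 us
    "Round trip within same datacenter": 500_000,   # ~0.5 ms
    "Disk seek (spinning disk)": 10_000_000,        # ~10 ms
    "Packet CA -> Netherlands -> CA": 150_000_000,  # ~150 ms
}

for name, ns in REFERENCE_NS.items():
    print(f"{name}: {ns / 1e6:.3f} ms")
```

Each row is roughly a factor of 10 to 100 above the previous one, which is why a single uncached disk or cross-region hop can dominate an otherwise fast path.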
Why p99 estimates are approximations
Tail latency across distributed hops does not sum linearly. The p99 of a system with 5 hops is not 5× the p99 of one hop; it depends on the latency distribution of each component (log-normal for network, heavy-tailed for GC pauses). The multipliers used here (1.2× for simple, 1.4× for moderate, 2.0× for complex architectures) are empirically derived conservative estimates suitable for SLA planning. For production SLO definition, always instrument with real traffic using HDR Histogram or OpenTelemetry percentile metrics.
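A small Monte Carlo sketch makes the non-linearity concrete. The log-normal parameters below are arbitrary assumptions chosen only to show the effect, not fitted to any real system:

```python
import random

random.seed(7)

def percentile(xs, q):
    xs = sorted(xs)
    return xs[int(q * (len(xs) - 1))]

# 20,000 simulated requests through a 5-hop chain; each hop draws an
# independent log-normal latency (assumed parameters, for illustration).
samples = [[random.lognormvariate(2.0, 0.5) for _ in range(5)]
           for _ in range(20_000)]
hop_p99 = percentile([s[0] for s in samples], 0.99)
chain_p99 = percentile([sum(s) for s in samples], 0.99)

# The chain's p99 lands well below 5x a single hop's p99, because all
# five hops are rarely slow on the same request.
print(f"5 x single-hop p99: {5 * hop_p99:.1f}   chain p99: {chain_p99:.1f}")
```

Swap in heavier-tailed distributions and the gap narrows, which is why more complex architectures get a larger multiplier in this tool.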
Building a latency budget
A latency budget allocates your end-to-end SLA across services, databases, caches, and network segments. Teams often target p50 for capacity planning but must still understand p99 and tail behaviour: small headroom at p50 means retries, GC, or one slow dependency will breach the SLA under load. We surface p99 and p999 using multipliers that scale with architecture complexity, so simple paths are not over-penalised and high fan-out systems get a more conservative tail estimate.
Many teams aim for roughly 20% headroom below the SLA at p50 so bursts and jitter do not immediately violate customer-facing targets.
The tail latency problem
Median latency hides worst-case behaviour. Garbage collection pauses, packet loss, thread pool exhaustion, and synchronous fan-out all widen the tail. That is why p99 can exceed a naïve sum of medians and why p999 is often several times worse than p99. When you stack many hops, the probability that at least one is slow rises; that is another reason to treat distributed systems as higher complexity in this calculator.
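The "at least one slow hop" effect is simple arithmetic: if each of n independent hops is slow 1% of the time, the probability a request touches a slow hop is 1 − 0.99ⁿ. Independence is an assumption; correlated slowness (shared GC, shared network) makes things worse:

```python
# Probability that at least one of n independent hops is slow,
# given each hop is slow 1% of the time.
for n in (1, 5, 10, 50):
    print(n, round(1 - 0.99 ** n, 3))  # 0.01, 0.049, 0.096, 0.395
```

At 50 dependent calls, nearly 40% of requests hit at least one 99th-percentile-slow hop, which is the core observation behind "the tail at scale".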
Why some latencies have a physical lower bound
Speed of light in optical fibre is approximately 200,000 km/s, roughly a third slower than in vacuum. A London–New York round trip covers about 11,000 km even along the straight-line geodesic, giving a theoretical fibre minimum near ~55ms; real submarine cable routes are longer, pushing the floor toward ~70ms, and measured RTT is often ~80ms once routing and equipment are included. This is why cross-continental HTTP will not reach sub-50ms latency, and why CDN edge nodes exist: they move work physically closer to users. For latency-sensitive operations, geographic distribution is not optional; it is physics.
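A back-of-the-envelope check of those numbers, assuming ~200,000 km/s in fibre and an illustrative 25% route inflation over the geodesic (the inflation factor is an assumption; real routes vary):

```python
# Physical lower bound on round-trip time over optical fibre,
# ignoring switching, queuing, and protocol overhead.
C_FIBRE_KM_S = 200_000  # light in fibre is ~2/3 of c in vacuum

def min_rtt_ms(one_way_km: float, path_inflation: float = 1.0) -> float:
    """Round-trip propagation delay in ms; path_inflation models real
    cable routes being longer than the great-circle distance."""
    return 2 * one_way_km * path_inflation / C_FIBRE_KM_S * 1000

geodesic = min_rtt_ms(5_570)        # London-New York great circle: ~56 ms
cable = min_rtt_ms(5_570, 1.25)     # assumed 25% route inflation: ~70 ms
print(f"geodesic floor: {geodesic:.1f} ms, cable floor: {cable:.1f} ms")
```

No amount of software optimisation moves these floors; only shortening the path (edge placement, regional replicas) does.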
Common latency killers
Disk seeks on spinning rust, N+1 database patterns (modelled here with per-hop iterations), uncached cross-AZ or cross-region calls, and cold starts on serverless all consume budget fast. Eliminating even one dominant hop often matters more than micro-optimising the rest.
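For example, the N+1 pattern expressed in the calculator's per-hop iteration model. All numbers here are illustrative assumptions:

```python
# N+1 vs batched query, using the per-hop iteration model.
db_query_ms = 2
n_plus_one = db_query_ms * (1 + 50)  # 1 parent query + 50 child queries
batched = db_query_ms + 5            # one JOIN, assumed 5ms of extra work
print(n_plus_one, batched)  # 102 7
```

Collapsing the iteration count from 51 to 1 recovers nearly the entire cost of the hop, far more than shaving fractions of a millisecond elsewhere.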
Related calculators
Once you know your p99 latency, use the p99LatencyMs input in our API Rate Limit Calculator to set safe retry windows. For Kafka-based architectures, see the Kafka Consumer Lag Predictor to factor in messaging and processing time.
Copy-paste solution
flowchart LR
  Client --> Edge[CDN / edge]
  Edge --> GW[API gateway]
  GW --> Svc[Service]
  Svc --> DB[(Database)]
  %% Label each edge with p50/p99 ms from this calculator
Paste into any Mermaid-compatible doc; replace hop labels with measured latency from this tool.