p99 latency
P99 latency is the 99th percentile of a latency distribution: 99% of requests complete in less time than this value, while the slowest 1% take longer. It bounds the experience of all but the slowest 1% of requests and is a key SLO metric for production systems. P99 captures occasional slow operations, such as GC pauses, network retransmissions, or cache misses, that barely move the median but significantly degrade user experience.
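A minimal sketch of how a percentile is computed from raw samples, using the nearest-rank method on hypothetical latency numbers (the values and the `percentile` helper are illustrative, not a specific library's API):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p% of all samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 100 hypothetical requests: 98 fast ones, 2 slow outliers (e.g. GC pauses).
latencies_ms = [10] * 98 + [500] * 2
p99 = percentile(latencies_ms, 99)  # the outliers exceed 1%, so p99 = 500
```

Note that if only 1 request in 100 were slow, p99 would still be 10 ms: by definition, p99 says nothing about what happens inside the slowest 1%.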
Why it matters in practice
P99 is more meaningful than average latency for user experience because averages hide the tail, and the 1% of users who hit the worst performance are often the most vocal and the most likely to abandon the service entirely. For a system serving millions of requests per day, that 1% amounts to tens of thousands of degraded requests. P99 is particularly important for revenue-critical APIs, where slow responses directly depress conversion rates.
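To see how an average can hide a tail that p99 exposes, consider this sketch with hypothetical numbers (1,000 requests, 1.5% of which hit a slow path such as a cache miss or retry):

```python
import statistics

# Hypothetical traffic sample: 985 fast requests, 15 slow ones (1.5%).
latencies_ms = [20] * 985 + [2000] * 15

# The mean looks healthy; the 99th percentile does not.
mean_ms = statistics.fmean(latencies_ms)                   # ~49.7 ms
p99_ms = sorted(latencies_ms)[int(0.99 * len(latencies_ms)) - 1]  # 2000 ms
```

A dashboard showing only the ~50 ms mean would suggest everything is fine, while 15 users per thousand are waiting two seconds.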
Common mistakes
- Using P99 as the only latency metric without also monitoring P999 — for very high-traffic systems, the 0.1% captured by P999 can represent significant user impact.
- Setting P99 SLOs too low without understanding the physical limits — network round-trip times and disk I/O set minimum achievable P99 values.
- Not correlating P99 spikes with system events — P99 increases often correspond to specific events like cache evictions or database query plan changes.
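The first mistake above can be made concrete with a sketch: in a hypothetical high-traffic window where only 0.5% of requests hit a pathological slow path, p99 misses the problem entirely while p99.9 surfaces it (all numbers are illustrative):

```python
import statistics

# Hypothetical window: 100,000 requests, 0.5% on a pathological slow path.
latencies_ms = [15] * 99_500 + [3000] * 500

# quantiles(n=1000) yields 999 cut points; index 989 is the 99.0th
# percentile, index 998 the 99.9th.
cuts = statistics.quantiles(latencies_ms, n=1000, method="inclusive")
p99_ms = cuts[989]   # 15 ms — looks fine
p999_ms = cuts[998]  # 3000 ms — the slow path shows up only here
```

At 100,000 requests, the 0.1% beyond p99.9 is still 100 requests per window, which is why high-traffic systems track both percentiles.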
Related Terms
Tail latency
High-percentile latency values (p99, p99.9) representing slowest requests.
Percentile latency (p50 / p99 / p999)
Statistical measure of request duration distribution.
p50 latency (median)
50th percentile of request durations: half of requests complete faster, half slower.
Latency budget
Total time allocated for a complete user-facing request across all architectural hops.