p99 latency
P99 latency is the 99th percentile of a latency distribution: 99% of requests complete in less time than this value, while the slowest 1% take longer. It bounds the experience of all but the slowest 1% of requests and is a key SLO metric for production systems. P99 captures occasional slow operations, such as GC pauses, network retransmissions, or cache misses, that barely move the median but significantly degrade user experience.
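A minimal sketch of how a percentile is computed from raw samples, using the nearest-rank method on hypothetical latency numbers (the values and the `percentile` helper are illustrative, not a specific library's API):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p% of all samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 100 hypothetical requests: 98 fast ones, 2 slow outliers (e.g. GC pauses).
latencies_ms = [10] * 98 + [500] * 2
p99 = percentile(latencies_ms, 99)  # the outliers exceed 1%, so p99 = 500
```

Note that if only 1 request in 100 were slow, p99 would still be 10 ms: by definition, p99 says nothing about what happens inside the slowest 1%.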
Why it matters in practice
P99 is more meaningful than average latency for user experience because averages hide the tail, and the 1% of users who hit the worst performance are often the most vocal and the most likely to abandon the service entirely. For a system serving millions of requests per day, that 1% amounts to tens of thousands of degraded requests. P99 is particularly important for revenue-critical APIs, where slow responses directly depress conversion rates.
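To see how an average can hide a tail that p99 exposes, consider this sketch with hypothetical numbers (1,000 requests, 1.5% of which hit a slow path such as a cache miss or retry):

```python
import statistics

# Hypothetical traffic sample: 985 fast requests, 15 slow ones (1.5%).
latencies_ms = [20] * 985 + [2000] * 15

# The mean looks healthy; the 99th percentile does not.
mean_ms = statistics.fmean(latencies_ms)                   # ~49.7 ms
p99_ms = sorted(latencies_ms)[int(0.99 * len(latencies_ms)) - 1]  # 2000 ms
```

A dashboard showing only the ~50 ms mean would suggest everything is fine, while 15 users per thousand are waiting two seconds.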
Common mistakes
- Using P99 as the only latency metric without also monitoring P999 — for very high-traffic systems, the 0.1% captured by P999 can represent significant user impact.
- Setting P99 SLOs too low without understanding the physical limits — network round-trip times and disk I/O set minimum achievable P99 values.
- Not correlating P99 spikes with system events — P99 increases often correspond to specific events like cache evictions or database query plan changes.
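The first mistake above can be made concrete with a sketch: in a hypothetical high-traffic window where only 0.5% of requests hit a pathological slow path, p99 misses the problem entirely while p99.9 surfaces it (all numbers are illustrative):

```python
import statistics

# Hypothetical window: 100,000 requests, 0.5% on a pathological slow path.
latencies_ms = [15] * 99_500 + [3000] * 500

# quantiles(n=1000) yields 999 cut points; index 989 is the 99.0th
# percentile, index 998 the 99.9th.
cuts = statistics.quantiles(latencies_ms, n=1000, method="inclusive")
p99_ms = cuts[989]   # 15 ms — looks fine
p999_ms = cuts[998]  # 3000 ms — the slow path shows up only here
```

At 100,000 requests, the 0.1% beyond p99.9 is still 100 requests per window, which is why high-traffic systems track both percentiles.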
Related Terms
Tail latency
High-percentile latency values (p99, p99.9) representing slowest requests.
Percentile latency (p50 / p99 / p999)
Statistical measure of request duration distribution.
p50 latency (median)
50th percentile of request durations: half of requests complete faster, half slower.
Latency budget
Total time allocated for a complete user-facing request across all architectural hops.