Percentile latency (p50 / p99 / p999)
Percentile latency is a statistical measure of the distribution of request durations. The Nth percentile is the value below which N% of requests complete. p50 (the median), p99 (99th percentile), and p99.9 (99.9th percentile) are the three standard metrics for latency SLOs in production systems.
Formula
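A minimal sketch of the nearest-rank convention, one common way to define a percentile (interpolating definitions, as in `numpy.percentile`, give slightly different answers on small samples). The names `percentile`, `latencies_ms`, and the sample values are illustrative, not from any specific library:

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank p-th percentile: the smallest sample value such that
    at least p% of samples are less than or equal to it."""
    ranked = sorted(latencies_ms)
    # ceil(p/100 * n) is the 1-based rank of the answer; clamp at 1.
    rank = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[rank - 1]

samples = [8, 9, 10, 11, 12, 150, 200]  # hypothetical durations in ms
print(percentile(samples, 50))   # the median of the sample
print(percentile(samples, 99))   # the tail of the sample
```

Note that with only 7 samples, p99 and p100 are the same request; meaningful tail percentiles require sample counts well above 1/(1 - p/100).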
Reading the numbers: p50=10ms, p99=200ms, p99.9=800ms means half your requests complete within 10ms, 99% within 200ms, and 999 in 1,000 within 800ms. The 1 in 1,000 that takes longer than 800ms is your p99.9 tail.
Why it matters in practice
Each percentile tier reveals a different failure mode. p50 shows normal operation. p99 shows what happens during GC pauses, cache misses, and connection pool queuing, which affect 1 in 100 requests. p99.9 shows the impact of background jobs, network retransmissions, and disk flushes: only 1 in 1,000 requests, but in a system handling 1 million requests per day that is still 1,000 severely degraded requests every day. Financial and healthcare systems often require p99.99 SLOs.
Common mistakes
- Only measuring p50 and calling it "latency": the median says nothing about the tail, and half of all requests are slower than it.
- Setting SLOs on mean latency: a handful of outliers can shift the mean while the shape of the tail stays hidden.
- Not instrumenting p99.9 separately from p99: the gap between the two often reveals the severity of GC or I/O tail effects.
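The mean-latency mistake is easy to demonstrate with a toy sample (values are hypothetical): a single slow request triples the mean while the median, which most users actually experience, does not move.

```python
import statistics

# Hypothetical sample: 99 requests at 10 ms plus one 2-second outlier.
latencies_ms = [10] * 99 + [2000]

mean = statistics.mean(latencies_ms)      # pulled up by the one outlier
median = statistics.median(latencies_ms)  # unchanged: still 10 ms
worst = max(latencies_ms)                 # the tail the mean smooths over

print(mean, median, worst)
```

An SLO of "mean latency under 50 ms" passes here even though one user waited two full seconds, which is exactly why SLOs are set on percentiles instead.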
Related Terms
p50 latency (median)
50th percentile of request durations: half of requests complete faster, half slower.
p99 latency
99th percentile of request durations: captures experience of users under stress.
Tail latency
High-percentile latency values (p99, p99.9) representing slowest requests.
SLA and SLO (service level agreement vs objective)
An SLA is an external contract with guarantees; an SLO is the internal target, set stricter than the SLA, whose allowed shortfall creates the error budget.
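The error budget implied by an SLO is simple arithmetic; a sketch with hypothetical traffic numbers (the SLO value, request rate, and window are all assumptions for illustration):

```python
# Error budget for a hypothetical 99.9% success SLO over a 30-day window,
# measured in request counts rather than wall-clock downtime.
slo = 0.999
requests_per_day = 1_000_000
window_days = 30

total_requests = requests_per_day * window_days
# Requests allowed to fail (or miss the latency target) before the SLO is blown.
error_budget = round(total_requests * (1 - slo))

print(error_budget)  # 30000
```

Teams spend this budget deliberately: risky deploys and planned maintenance are fine while budget remains, and freeze when it is exhausted.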