dkduckkit.dev

Percentile latency (p50 / p99 / p99.9)

Latency & SRE

Percentile latency is a statistical measure of request duration distribution. The Nth percentile means N% of requests complete in less time than that value. p50 (the median), p99 (99th percentile), and p99.9 (99.9th percentile) are the three standard metrics for latency SLOs in production systems.

Formula

Reading the numbers: p50=10ms, p99=200ms, p99.9=800ms means half your requests complete within 10ms, 99% within 200ms, and 99.9% (999 in 1,000) within 800ms. The 1 in 1,000 that takes longer than 800ms is your p99.9 tail.
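A percentile can be computed from raw samples with the nearest-rank method: sort the values and take the one at rank ⌈(p/100)·n⌉. A minimal sketch (the exponential workload below is hypothetical, purely to generate a latency-like distribution):

```python
import random

def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    # ceil(p/100 * n) as the 1-based rank, clamped to the valid range
    rank = max(1, min(len(ordered), int(-(-len(ordered) * p // 100))))
    return ordered[rank - 1]

random.seed(7)
# Hypothetical workload: mostly fast requests with a long tail (illustrative only)
latencies_ms = [random.expovariate(1 / 10) for _ in range(10_000)]

for p in (50, 99, 99.9):
    print(f"p{p} = {percentile(latencies_ms, p):.1f} ms")
```

Production systems rarely sort raw samples; they approximate percentiles with histograms or sketches (e.g. HDR histograms, t-digests), but the definition is the same.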

Why it matters in practice

Each percentile tier reveals a different failure mode. p50 shows normal operation. p99 shows what happens during GC pauses, cache misses, and connection pool queuing — these affect 1 in 100 users. p99.9 shows the impact of background jobs, network retransmissions, and disk flushes — 1 in 1000 users, but in a system with 1 million daily requests that is 1,000 people per day having a severely degraded experience. Financial and healthcare systems often require p99.99 SLOs.
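The "1,000 people per day" figure above is just the tail fraction times the traffic volume. A quick back-of-the-envelope check, using the 1 million daily requests from the text:

```python
daily_requests = 1_000_000

# Fraction of traffic that lands beyond each percentile's threshold
for label, pct in [("p50", 50.0), ("p99", 99.0), ("p99.9", 99.9), ("p99.99", 99.99)]:
    tail = daily_requests * (1 - pct / 100)
    print(f"{label}: ~{tail:,.0f} requests/day in the tail")
```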

Common mistakes

  • Only measuring p50 and calling it "latency" — p50 describes the typical request but says nothing about the tail your slowest users hit.
  • Setting SLOs on mean latency — the mean is mathematically sensitive to outliers and hides tail behaviour.
  • Not instrumenting p99.9 separately from p99 — the difference often reveals the severity of GC or I/O tail effects.
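The mean's sensitivity to outliers is easy to demonstrate with a tiny illustration (hypothetical numbers):

```python
# 99 fast requests plus one pathological outlier (hypothetical numbers)
latencies_ms = [10] * 99 + [10_000]

mean = sum(latencies_ms) / len(latencies_ms)
p50 = sorted(latencies_ms)[len(latencies_ms) // 2]

print(f"mean = {mean:.1f} ms")  # dragged up by a single outlier
print(f"p50  = {p50} ms")       # unaffected by the tail
```

One slow request pulls the mean an order of magnitude above what 99% of users actually experienced, which is why SLOs are set on percentiles rather than averages.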