Leaky bucket
The leaky bucket algorithm processes requests at a constant output rate, queueing incoming requests in a buffer (the "bucket") when they arrive faster than that rate. The queue drains at a fixed rate, smoothing bursty traffic into a uniform output stream; incoming requests are dropped only when the queue is full. This queueing behaviour is what distinguishes leaky bucket as a traffic shaper from a plain rate limiter, which drops or rejects excess requests immediately without buffering. The analogy is a bucket with a hole in the bottom: water drains at a constant rate no matter how fast you pour it in.
**Terminology note:** There are two distinct interpretations of "leaky bucket" in the literature. The first, described above, is the *leaky bucket as a queue* (traffic shaper) — the more useful model for SREs and API designers. The second, the *leaky bucket as a meter*, is mathematically equivalent to a token bucket and is sometimes used interchangeably in cloud provider documentation (AWS, GCP). When a vendor claims to use "leaky bucket", verify whether they mean traffic shaping (constant output rate) or simply token bucket with a different name.
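To make the meter interpretation concrete, here is a minimal sketch of a leaky bucket *as a meter*: a counter that fills by one unit per request and drains at a constant rate, admitting a request only if the bucket would not overflow. The class and parameter names (`LeakyBucketMeter`, `capacity`, `leak_rate`) are illustrative, not from any vendor's API; this is the variant that is mathematically equivalent to a token bucket.

```python
import time

class LeakyBucketMeter:
    """Leaky bucket as a *meter* (illustrative sketch).

    A counter that leaks at a constant rate; each admitted request adds
    one unit of "water". Equivalent to a token bucket with refill rate
    `leak_rate` and burst size `capacity`.
    """

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity        # max water the bucket holds
        self.leak_rate = leak_rate      # units drained per second
        self.level = 0.0                # current water level
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain water for the elapsed time, bottoming out at empty.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1             # this request adds one unit
            return True
        return False                    # bucket would overflow: reject
```

Note that this variant never queues anything: an over-rate request is rejected on the spot, exactly like a token bucket that is out of tokens.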
Formula
output_rate = constant (configured)
queue_size = bucket_capacity (configured)
if queue_full: drop_request
else: queue_request and process_at_output_rate

Why it matters in practice
Leaky bucket is used when a downstream system cannot tolerate any burst, even a brief one — for example, a legacy backend with fixed threading that becomes unstable under load spikes. It is also used in traffic shaping to enforce a maximum average rate for billing purposes. The trade-off is that bursty clients experience queuing latency even when the system is underloaded, which can make p99 latency worse than with a token bucket.
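The queue-and-drain behaviour described by the formula above can be sketched as follows. This is a hypothetical illustration (the names `LeakyBucketQueue`, `arrive`, and `drain_tick` are our own): arrivals are buffered up to `capacity`, overflow is dropped, and a scheduler calls `drain_tick` once per `1 / output_rate` seconds so exactly one request leaves per tick regardless of how bursty the input was.

```python
from collections import deque

class LeakyBucketQueue:
    """Leaky bucket as a *queue* (traffic shaper) — illustrative sketch.

    Incoming requests are buffered up to `capacity`; the drain side
    releases at most one request per tick, so output is smoothed to the
    tick rate no matter how bursty the arrivals are.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def arrive(self, request) -> bool:
        """Buffer a request, or drop it if the bucket is full."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1           # overflow: drop immediately
            return False
        self.queue.append(request)
        return True

    def drain_tick(self):
        """Called by a scheduler once per 1/output_rate seconds.

        Releases exactly one queued request downstream, or None if
        the bucket is empty.
        """
        if self.queue:
            return self.queue.popleft()
        return None
```

For example, a burst of ten arrivals into a bucket with `capacity=4` buffers four, drops six, and the survivors then trickle out one per tick — the smoothing (and the queuing latency) the paragraph above describes.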
Common mistakes
- Using leaky bucket for interactive APIs where clients legitimately need burst support — the resulting queuing latency makes the API feel sluggish.
- Setting the queue size too small — overflow drops requests immediately, which is equivalent to a hard rate limit with no burst tolerance.
- Confusing output rate with input rate — leaky bucket smooths output, not input.