dkduckkit.dev

Leaky bucket

Rate Limiting

The leaky bucket algorithm processes requests at a constant output rate. Incoming requests that arrive faster than that rate are placed into a bounded buffer (the "bucket"), which drains at a fixed rate, smoothing bursty traffic into a uniform output stream. Requests are dropped only when the queue is full. This queuing behaviour is what distinguishes leaky bucket as a traffic shaper from a plain rate limiter, which drops or rejects excess requests immediately without buffering. The analogy is a bucket with a hole in the bottom: water drains at a constant rate regardless of how fast you pour it in.

**Terminology note:** There are two distinct interpretations of "leaky bucket" in the literature. The first, described above, is the *leaky bucket as a queue* (traffic shaper) — the more useful model for SREs and API designers. The second, the *leaky bucket as a meter*, is mathematically equivalent to a token bucket and is sometimes used interchangeably in cloud provider documentation (AWS, GCP). When a vendor claims to use "leaky bucket", verify whether they mean traffic shaping (constant output rate) or simply token bucket with a different name.
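The meter interpretation can be sketched as a counter that leaks at a fixed rate, with a request admitted only if pouring it in would not overflow the bucket. A minimal illustration (the class and parameter names are invented for this example; real vendor implementations differ):

```python
import time

class LeakyBucketMeter:
    """Leaky bucket *as a meter*: a counter that drains at a fixed rate.
    A request is admitted if adding one unit keeps the level under
    capacity -- mathematically equivalent to a token bucket."""

    def __init__(self, drain_rate_per_sec: float, capacity: float):
        self.rate = drain_rate_per_sec   # leak rate (sustained request rate)
        self.capacity = capacity         # burst allowance
        self.level = 0.0                 # current "water" in the bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain for the elapsed time, clamping at an empty bucket.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1              # admit: pour one unit in
            return True
        return False                     # would overflow: reject, no queuing
```

Note the rejection path: a meter rejects immediately rather than buffering, which is exactly why it behaves like a token bucket rather than a shaper.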

Formula

    output_rate = constant         (configured)
    queue_size  = bucket_capacity  (configured)

    if queue_full:
        drop_request
    else:
        queue_request and process_at_output_rate
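The formula above can be sketched as a bounded queue drained in fixed-size batches. This is a discrete-time illustration, not a production limiter; the class name, `offer`/`tick` methods, and per-tick granularity are assumptions made for the example:

```python
from collections import deque

class LeakyBucketQueue:
    """Leaky bucket *as a queue* (traffic shaper): arrivals are buffered
    up to bucket_capacity and drained at a constant per-tick rate."""

    def __init__(self, bucket_capacity: int, output_per_tick: int):
        self.capacity = bucket_capacity        # queue_size
        self.output_per_tick = output_per_tick # output_rate, per tick
        self.queue = deque()

    def offer(self, request) -> bool:
        # if queue_full: drop_request
        if len(self.queue) >= self.capacity:
            return False
        # else: queue_request
        self.queue.append(request)
        return True

    def tick(self) -> list:
        # process_at_output_rate: drain a fixed number each tick,
        # regardless of how fast requests arrived.
        drained = []
        for _ in range(min(self.output_per_tick, len(self.queue))):
            drained.append(self.queue.popleft())
        return drained
```

A burst of five arrivals into a capacity-3 bucket accepts three and drops two; the survivors then leave at the configured rate of two per tick, which is the smoothing the shaper exists to provide.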

Why it matters in practice

Leaky bucket is used when a downstream system cannot tolerate any burst, even a brief one — for example, a legacy backend with fixed threading that becomes unstable under load spikes. It is also used in traffic shaping to enforce a maximum average rate for billing purposes. The trade-off is that bursty clients experience queuing latency even when the system is underloaded, which can make p99 latency worse than with a token bucket.

Common mistakes

  • Using leaky bucket for interactive APIs where clients legitimately need burst support — the resulting queuing latency makes the API feel sluggish.
  • Setting queue size too small — overflow drops requests immediately, which is equivalent to a hard rate limit with no burst tolerance.
  • Confusing output rate with input rate — leaky bucket smooths output, not input.