dkduckkit.dev

Token bucket

Rate Limiting

Token bucket is a rate limiting algorithm in which a virtual bucket accumulates tokens at a fixed refill rate (e.g., 100 tokens/second) up to a maximum capacity (the burst limit). Each request consumes one or more tokens. If the bucket holds enough tokens, the request proceeds immediately; if not, it is either queued or rejected with HTTP 429 (Too Many Requests). The bucket capacity determines how large a burst above the steady-state rate can be.
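The description above can be sketched in a few lines of Python. This is a minimal, single-threaded illustration rather than a production limiter; the class name `TokenBucket`, the method `allow`, and the injectable `clock` parameter are assumptions made for this example.

```python
import time


class TokenBucket:
    """Minimal single-threaded token bucket (illustrative sketch)."""

    def __init__(self, refill_rate, capacity, clock=time.monotonic):
        self.refill_rate = float(refill_rate)  # tokens added per second
        self.capacity = float(capacity)        # maximum tokens held (burst limit)
        self.clock = clock                     # injectable time source (hypothetical knob)
        self.tokens = self.capacity            # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)

    def allow(self, cost=1):
        """Consume `cost` tokens and return True if available, else return False."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the caller would queue the request or answer with 429
```

Refilling lazily on each call avoids a background timer: the bucket only needs the timestamp of the previous call to work out how many tokens have accrued.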

Formula

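refill_rate = steady-state requests per second. bucket_capacity = maximum burst size in requests. Assuming one token per request, at most bucket_capacity + refill_rate × T requests are admitted in any window of T seconds. Bursts above refill_rate are allowed until the bucket empties; after that, requests are limited to refill_rate.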
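As a worked example of how the two parameters interact, the sketch below computes an upper bound on admitted requests in a window (capacity plus refill over the window) and how long a full bucket can sustain an arrival rate above the refill rate. The function names and the numbers are hypothetical, and the calculation assumes one token per request and a smooth arrival rate.

```python
def max_admitted(refill_rate, capacity, window_seconds):
    """Upper bound on requests admitted in any window (one token per request)."""
    return capacity + refill_rate * window_seconds


def burst_duration(refill_rate, capacity, arrival_rate):
    """Seconds a full bucket sustains `arrival_rate` before it empties.

    Returns None when arrival_rate <= refill_rate, because the bucket never drains.
    """
    if arrival_rate <= refill_rate:
        return None
    # The bucket drains at (arrival_rate - refill_rate) tokens per second.
    return capacity / (arrival_rate - refill_rate)


print(max_admitted(100, 500, 10))     # 1500
print(burst_duration(100, 500, 600))  # 1.0
```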

Why it matters in practice

Token bucket is one of the most common algorithms in production API gateways because it provides two independent controls: the refill rate (steady-state throughput) and the bucket capacity (burst tolerance). This maps well to real API usage patterns where clients make occasional bursts of requests followed by idle periods. AWS API Gateway documents its throttling, including the rate and burst settings of usage plans, as a token bucket. Not every gateway works this way: Kong's open-source rate-limiting plugin counts requests in fixed time windows, and Nginx's `limit_req` module uses leaky bucket, although with `nodelay` its behaviour approximates token bucket. Understanding the algorithm helps you tune `burst` and `rate` parameters correctly.

Common mistakes

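  • Setting burst capacity too small relative to the rate limit: a capacity of 1 disables burst support entirely and forces clients to space requests evenly over time, which is unrealistic. A capacity equal to one second of refill allows only one second's worth of burst headroom.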
  • Not considering that tokens accumulate during idle periods, up to the bucket capacity: if the capacity is large enough, a client that is quiet for 60 seconds and then sends 60 seconds' worth of traffic in one burst is within the rules, but may overwhelm a backend not designed for bursts.
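  • Confusing `limit_req` with `burst=10 nodelay` in Nginx with a strict rate: `nodelay` lets up to 10 excess requests through immediately instead of delaying them to match the configured rate. Requests beyond the burst are still rejected.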
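The idle-accumulation mistake can be made concrete with a small simulation. This is a discrete, one-second-tick approximation with hypothetical numbers; `admitted_per_second` is a helper invented for this example, and it assumes the bucket starts full and each request costs one token.

```python
def admitted_per_second(refill_rate, capacity, offered):
    """Return how many requests are admitted in each one-second tick.

    `offered` lists the number of requests arriving in each tick.
    """
    tokens = capacity  # assume the bucket starts full
    admitted = []
    for demand in offered:
        # Refill for this tick, capped at capacity, then serve what we can.
        tokens = min(capacity, tokens + refill_rate)
        served = min(demand, tokens)
        tokens -= served
        admitted.append(served)
    return admitted


offered = [0, 0, 0, 100, 100, 100]  # three idle seconds, then heavy traffic
print(admitted_per_second(10, 50, offered))  # [0, 0, 0, 50, 10, 10]
print(admitted_per_second(10, 10, offered))  # [0, 0, 0, 10, 10, 10]
```

With a capacity of 50 the backend sees a one-second spike at five times the steady-state rate after the idle period, so the capacity should be sized to what the backend can absorb, not only to what clients would like.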