Burst limit
A burst limit is the maximum number of requests a client can send in quick succession before the steady-state rate limit takes over. In a token bucket implementation, the burst limit equals the bucket capacity. A rate limit of 100 req/s with a burst limit of 300 means a client with a full bucket can send 300 requests almost at once (draining the bucket), after which it is held to 100 req/s, the rate at which the bucket refills. Burst limits exist because most legitimate API clients have natural burst patterns: loading a dashboard fires 10–20 API calls simultaneously, then the client goes idle.
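The relationship between bucket capacity and burst limit can be sketched in a few lines. This is a minimal, single-threaded illustration rather than a production limiter; the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for the example, and the bucket is assumed to start full.

```python
import time


class TokenBucket:
    """Token bucket: `capacity` is the burst limit, `rate` the steady-state limit."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens held, i.e. the burst limit
        self.clock = clock               # injectable so the example is deterministic
        self.tokens = float(capacity)    # assumption: start full so an idle client can burst
        self.last = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens and return True if available, otherwise return False."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock so the result does not depend on wall time.
fake_now = [0.0]
bucket = TokenBucket(rate=100, capacity=300, clock=lambda: fake_now[0])

burst_accepted = sum(bucket.allow() for _ in range(400))   # 400 requests at t=0
fake_now[0] += 1.0                                         # one second of refill
refill_accepted = sum(bucket.allow() for _ in range(400))  # 400 more at t=1
```

With these numbers the first batch admits 300 requests (the full bucket) and the second admits 100 (one second of refill), matching the 100 req/s and burst 300 example above. A real limiter would also need locking or atomic updates for concurrent callers.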
Why it matters in practice
Setting the burst limit too low forces clients to spread requests artificially over time, which is impractical for interactive use cases. Setting it too high lets clients overwhelm the backend with a short but intense burst. The right value depends on the backend's capacity headroom: if your backend can handle 3× normal peak for 1–2 seconds before saturating, your burst limit should be roughly 3× the per-window steady-state limit.
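The headroom rule of thumb is simple arithmetic, shown here as a small helper. The function name `suggested_burst_limit` and its parameters are hypothetical, and the headroom multiplier is something you would have to measure for your own backend; treat the result as a starting point, not a definitive setting.

```python
import math


def suggested_burst_limit(steady_limit, headroom_multiplier):
    """Rule-of-thumb burst limit: the steady-state limit scaled by backend headroom.

    steady_limit: requests allowed per window at steady state.
    headroom_multiplier: how many times normal peak the backend can absorb
        briefly (an assumed, measured value, e.g. 3.0 for "3x for 1-2 seconds").
    """
    if steady_limit <= 0:
        raise ValueError("steady_limit must be positive")
    if headroom_multiplier < 1:
        raise ValueError("headroom_multiplier below 1 would cut the steady-state rate")
    return math.floor(steady_limit * headroom_multiplier)


burst = suggested_burst_limit(steady_limit=100, headroom_multiplier=3.0)
```

For a steady-state limit of 100 and a measured 3× headroom, this yields a burst limit of 300.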
Common mistakes
- Setting the burst limit equal to the rate limit: this leaves no headroom beyond one window of steady-state traffic, so clients that generate natural bursts (page loads, batch processing) can receive 429s unexpectedly.
- Not accounting for burst in backend capacity planning: a rate limit of 100 req/s with burst=500 means the backend must absorb up to 500 requests arriving nearly at once from a single client.
- Forgetting that multiple clients each have their own burst bucket: 100 clients each with burst=50 can simultaneously send 5,000 requests in aggregate.
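The last two mistakes come down to a capacity check that is easy to automate. The sketch below is a rough worst-case estimate, assuming every client starts with a full bucket and bursts at the same moment; the function names and the `backend_burst_capacity` figure are hypothetical values you would supply from your own load testing.

```python
def simultaneous_burst(clients, burst_per_client):
    """Worst case: every client drains a full bucket at the same moment."""
    return clients * burst_per_client


def fits_backend(clients, burst_per_client, backend_burst_capacity):
    """True if the worst-case aggregate burst stays within what the backend can absorb.

    backend_burst_capacity is an assumed figure: the number of requests the
    backend can take in a short spike before saturating.
    """
    return simultaneous_burst(clients, burst_per_client) <= backend_burst_capacity


aggregate = simultaneous_burst(clients=100, burst_per_client=50)
ok = fits_backend(clients=100, burst_per_client=50, backend_burst_capacity=3000)
```

Here `aggregate` is 5,000, so a backend that can absorb only 3,000 requests in a spike fails the check. In practice not all clients burst together, so this bound is pessimistic, but it shows why per-client burst limits cannot be chosen in isolation.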