duckkit.dev

API Rate Limit Calculator

Optimal rate limits from traffic and consumers. Nginx, Kong, AWS snippets and IETF headers.

Last updated: March 2026

TL;DR

Derive steady RPS and burst headroom from concurrent clients and per-client call patterns, then map them to token-bucket or leaky-bucket style limits.

Formula: Required RPS ≈ clients × requests per client per second; burst sized to absorb coordinated spikes.
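As a worked illustration (numbers invented for the example, not taken from the calculator): 1,000 concurrent clients each averaging 0.5 requests per second imply roughly 500 RPS of steady capacity; if a cache expiry or push notification can make a quarter of them fire within the same second, budget a burst allowance of about 250 requests on top of that steady rate.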

When to use this

  • Setting gateway limits that protect upstreams without false positives.
  • Translating product SLOs into enforceable rate and burst parameters.

How the math works

LaTeX model and TypeScript reference — same logic as the calculator on this page.

This describes the implementation behind the numbers as of 2026-03-26. It is engineering documentation, not legal or compliance advice.

Specification citation

Logic reflects our proprietary implementation of the following public specifications: IETF RFC 6585 and HTTP 429 Too Many Requests (MDN).

This snippet represents the core logic of our proprietary calculation engine, verified against RFC 6585 and widely used token-bucket / fixed-window rate-limit patterns.

Model (LaTeX source)
Sustainable requests per window (duckkit.dev model)

Let $R_{\text{peak}}$ be peak RPS, $W$ the window length in seconds, $N$ the number of consumers, and $G$ the global-mode flag.

Effective consumer count:
$$N_{\text{eff}} = \begin{cases} 1 & \text{if } G \\ N & \text{otherwise} \end{cases}$$

Raw allowance (20% headroom in implementation):
$$N_{\text{raw}} = \left\lfloor \frac{R_{\text{peak}} \cdot W \cdot 0.8}{N_{\text{eff}}} \right\rfloor, \qquad N_{\text{window}} = \max(1, N_{\text{raw}})$$

Effective RPS shown:
$$R_{\text{eff}} = \frac{N_{\text{window}}}{W}$$

Burst depends on strategy (fixed-window, sliding-window, or token-bucket multiplier).
Reference implementation (TypeScript, excerpt from shipped modules)
// lib/rate-limit-calculator/limits.ts
export function calculateLimits(inputs: RateLimitInputs): RecommendedLimits {
  const { peakRPS, windowSeconds, consumerCount, globalMode, strategy, burstMultiplier } = inputs

  const effectiveConsumers = globalMode ? 1 : consumerCount

  const rawLimit = Math.floor(
    (peakRPS * windowSeconds * 0.8) / effectiveConsumers,
  )

  const requestsPerWindow = Math.max(1, rawLimit)

  const burstLimit =
    strategy === 'fixed-window'
      ? requestsPerWindow
      : strategy === 'sliding-window'
        ? Math.floor(requestsPerWindow * 1.5)
        : Math.floor(requestsPerWindow * burstMultiplier)

  const rpsEffective = parseFloat(
    (requestsPerWindow / windowSeconds).toFixed(2),
  )
  return { requestsPerWindow, burstLimit, /* … */ }
}
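For illustration, one set of inputs that reproduces the example numbers shown below. The values are assumed, not read from the page's configuration; the field names come from the excerpt above.

// Hypothetical invocation — values chosen to match the example results on this page.
const limits = calculateLimits({
  peakRPS: 100,             // aggregate peak across all consumers
  windowSeconds: 60,
  consumerCount: 10,
  globalMode: false,        // per-key limits
  strategy: 'token-bucket',
  burstMultiplier: 3,
})
// requestsPerWindow = floor((100 * 60 * 0.8) / 10) = 480
// burstLimit = 480 * 3 = 1440, effective RPS = 480 / 60 = 8.00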


At a glance

  • Throttling risk: 20%
  • Effective RPS: 8.00
  • Requests / window: 480

Configuration

Inputs: quick presets, a traffic spike stress multiplier (Normal 1×, Black Friday 5×, DDoS-like 10×; default 1.0× peak), traffic profile, rate limit strategy, and context (environment, auth, retry behavior).

Results

  • Requests per window: 480
  • Effective RPS: 8.00 req/s
  • Burst limit: 1440
  • Per consumer / window: 480 requests

How your strategy handles traffic

(Chart: incoming vs. allowed vs. throttled traffic across 0–120 s window boundaries.)

  • Max throughput: 0.078 MB/s
  • Safe consumer count: 10
  • Throttling risk (peak): 20%
  • Utilization at peak: 100%
  • Retry storm risk: MEDIUM
Nginx — limit_req
# Nginx rate limiting — generated by API Rate Limit Calculator
# Algorithm: Nginx limit_req uses a leaky bucket algorithm internally.
# With 'nodelay': behaviour is equivalent to a token bucket.
# Reference: https://nginx.org/en/docs/http/ngx_http_limit_req_module.html
limit_req_zone $http_x_api_key zone=api_limit:10m rate=8r/s;

server {
    location /api/ {
        limit_req zone=api_limit burst=1440 nodelay;
        limit_req_status 429;
        add_header Retry-After 5 always;
    }
}
Kong — rate-limiting plugin
# Kong rate-limiting plugin — generated by API Rate Limit Calculator
plugins:
  - name: rate-limiting
    config:
      second: 8
      minute: 480
      limit_by: consumer
      policy: local
      fault_tolerant: true
      hide_client_headers: false
      error_code: 429
      error_message: "API rate limit exceeded"
AWS API Gateway — throttle (JSON)
{
  "_comment": "AWS API Gateway Usage Plan — throttle settings only. Quota (daily/monthly limits) is a billing concern — configure separately.",
  "throttle": {
    "rateLimit": 8,
    "burstLimit": 1440
  }
}
AWS API Gateway — Terraform
# Terraform — AWS API Gateway Usage Plan
# Generated by API Rate Limit Calculator

resource "aws_api_gateway_usage_plan" "rate_limit" {
  name = "api-rate-limit"

  throttle_settings {
    rate_limit  = 8    # req/s steady state
    burst_limit = 1440 # token bucket size
  }
}

resource "aws_api_gateway_usage_plan_key" "rate_limit_key" {
  key_id        = aws_api_gateway_api_key.consumer.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.rate_limit.id
}
Rate limit response headers
# X-RateLimit Headers (legacy — widely supported)
X-RateLimit-Limit: 480        # requests allowed per window
X-RateLimit-Remaining: <current_count>         # requests remaining
X-RateLimit-Reset: <unix_timestamp>            # when window resets (unix epoch)
X-RateLimit-Window: 60    # window duration in seconds

# On 429 Too Many Requests:
Retry-After: 5              # seconds before client should retry
You're running at 100% of capacity at peak. Add 20% headroom or your next traffic spike will cause throttling.
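One plausible way to read those two figures together, given the 0.8 headroom factor in the model above: the per-window budget is sized to 80% of declared peak, so if traffic actually arrives at full peak, roughly 2 of every 10 requests exceed the allowance, i.e. about 20% throttled while the limit itself sits at 100% utilization.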

Methodology

The tool runs calculateAll: limits derive a per-window request budget from peak RPS, window length, consumer count, and global vs per-key mode (with a built-in headroom factor), then adjust burst by strategy (fixed, sliding, token bucket). Capacity compares those limits to your targets; configs are template snippets, not validated against a live gateway. The model is steady-state planning — not adaptive abuse detection or exact vendor rate-limit semantics.

What is API rate limiting and why it matters

An API rate limit caps how many requests a client can make in a time window. Think of it as a merge lane: unlimited inbound traffic causes collisions—your service protects itself and fair-shares capacity. Platform teams use an API rate limit calculator like this one to translate real peak RPS, payload size, and consumer count into limits that avoid abuse without blocking legitimate partners.

Rate limiting strategies compared

Fixed window resets a counter every N seconds—simple but clients can double their effective rate at window boundaries. Sliding window maintains a rolling count over the most recent N seconds, smoothing counts across time. Token bucket allows controlled bursts while refilling steadily—what most edge gateways implement. Leaky bucket outputs traffic at a constant rate, shaping noisy clients. Use this tool to see how each behaves before you paste configs into nginx or Kong.

The hidden danger: retry storms

When thousands of clients get HTTP 429 at once, naive retries can synchronize and amplify load—an API rate limit incident becomes self-sustaining. Exponential backoff alone is not enough without jitter; otherwise pods or mobile clients retry on the same tick. This calculator surfaces retry-storm risk so you can tune Retry-After relative to your p99 latency before you commit to production limits.
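A minimal client-side sketch of exponential backoff with full jitter, honoring Retry-After when the server provides it. The base delay, cap, and function name are assumptions for illustration, not part of the calculator.

// Exponential backoff with full jitter — one common pattern, not the only valid one.
function retryDelayMs(attempt: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined) {
    // Server guidance wins: wait at least as long as Retry-After asks.
    return retryAfterSeconds * 1000
  }
  const baseMs = 500
  const capMs = 30_000
  const exponential = Math.min(capMs, baseMs * 2 ** attempt)
  // Full jitter: pick uniformly in [0, exponential) so clients don't retry in lockstep.
  return Math.random() * exponential
}

// Usage: after the 2nd consecutive 429 without a Retry-After header
// const waitMs = retryDelayMs(2)  // somewhere in [0, 2000) ms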

Not sure what p99 latency to assume? Build your critical path in our System Latency Budget Calculator to estimate realistic p50/p99 from reference hops before you plug numbers into retry and rate-limit math.

X-RateLimit and IETF RateLimit headers

Legacy X-RateLimit-* headers are widely supported; IETF Draft-07 introduces RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset (seconds to reset, not always a Unix timestamp—check your implementation), and RateLimit-Policy for machine-readable discovery. Toggle the option here to generate the style your API contract requires. Whether you deploy with nginx, Kong, or AWS API Gateway, consistent headers reduce support tickets from confused integrators.
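A small sketch of the reset-semantics difference described above. The helper name is hypothetical, and it assumes header names have already been lowercased (as Node's request headers are).

// Legacy X-RateLimit-Reset is a Unix epoch; IETF RateLimit-Reset is seconds until reset.
function msUntilReset(headers: Record<string, string>, nowMs: number = Date.now()): number | null {
  const ietf = headers['ratelimit-reset']
  if (ietf !== undefined) return Number(ietf) * 1000            // relative seconds → ms
  const legacy = headers['x-ratelimit-reset']
  if (legacy !== undefined) return Number(legacy) * 1000 - nowMs // epoch seconds → ms from now
  return null
}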

AWS API Gateway exposes throttle (rate + burst) as a token bucket; daily quota is separate billing/monetization—configure it outside this throttle-focused snippet.

Token bucket vs fixed window

  • Token bucket — smooth average rate with controlled bursts; common in gateways and AWS throttles (minimal sketch after this list).
  • Fixed window — resets every interval; can allow 2× spikes at window edges unless you add jitter or sliding logic.
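A minimal token-bucket sketch of the behavior described above. The class shape and the sample capacity/refill values are illustrative placeholders, not output of the calculator.

// Token bucket: refill at a steady rate, spend tokens per request, allow bursts up to capacity.
class TokenBucket {
  private tokens: number
  private lastRefillMs: number

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity
    this.lastRefillMs = Date.now()
  }

  tryConsume(cost = 1): boolean {
    const now = Date.now()
    const elapsedSec = (now - this.lastRefillMs) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond)
    this.lastRefillMs = now
    if (this.tokens < cost) return false // over the limit → respond 429
    this.tokens -= cost
    return true
  }
}

// e.g. 8 requests/s steady with room for a burst of 1440, mirroring the example above
// const bucket = new TokenBucket(1440, 8)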

IETF guidance on communicating limits lives in RFC 6585 (status 429 Too Many Requests) and rate-limit header drafts your gateway may implement.

Copy-paste solution

limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
  location /api/ {
    limit_req zone=api burst=20 nodelay;
    proxy_pass http://upstream;
  }
}

Frequently asked questions

What is the difference between fixed window and token bucket rate limiting?
Fixed window resets counters at fixed time boundaries (e.g., every 60 seconds). At the reset boundary, clients can send up to 2× the limit in a short burst (boundary exploit). Token bucket continuously replenishes tokens at a steady rate, allowing controlled bursts without the boundary problem. For most APIs, token bucket is the safer default.
(Diagram: Window A at limit N ends just as Window B resets, so aligned clients can send roughly 2× the nominal limit in a brief spike at the boundary.)
What is a retry storm and how do I prevent it?
A retry storm occurs when many clients receive 429 Too Many Requests and retry simultaneously. Without jitter, all clients retry at exactly the same time, amplifying the overload. Prevention: implement exponential backoff with random jitter on the client side, and set Retry-After headers on the server. The server's retryAfterSeconds should be at least 5–10× the p99 latency.
(Diagram: without jitter, retries align on the same tick; with jitter, they spread over time. Jitter breaks lockstep and lowers peak RPS at the backend.)
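As a rough illustration of the 5–10× rule of thumb: with a p99 around 0.5–1 s, a Retry-After of 5 seconds (as in the header snippet above) lands in or near that range; a p99 of 3 s would argue for 15–30 s instead.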
What is the difference between X-RateLimit-* and RateLimit-* headers?
X-RateLimit-* are legacy headers (widely supported). RateLimit-* (without X- prefix) are the IETF Draft-07 standard. Key difference: legacy X-RateLimit-Reset is a unix epoch timestamp, while IETF RateLimit-Reset is seconds until reset (relative). Mixing them causes client bugs. For new APIs, prefer IETF Draft-07. For APIs with existing clients, keep legacy headers.
Should I rate limit by IP or by API key?
Per-IP limiting is unreliable for public APIs because multiple clients often share a single IP (corporate NAT, CDN). Per-API-key limiting is more accurate and resilient to abuse patterns. Per-organization is best for B2B APIs where an org-level quota is the business requirement. Never use per-IP as the sole mechanism for public-facing APIs.
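One way to express that precedence in code. This is a sketch; the header name, request shape, and function are made up for illustration.

// Prefer API key, then organization, and fall back to IP only as a last resort.
function rateLimitKey(req: { headers: Record<string, string>; orgId?: string; ip: string }): string {
  const apiKey = req.headers['x-api-key']
  if (apiKey) return `key:${apiKey}`
  if (req.orgId) return `org:${req.orgId}`
  return `ip:${req.ip}` // shared NATs and CDNs make this the least precise bucket
}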
What is the Banking PSD2 recommended rate limit configuration?
EBA (European Banking Authority) guidelines for PSD2 open banking recommend conservative burst settings (burstMultiplier ≤ 1.5) and long retry-after windows (≥ 30 seconds) to prevent retry flooding from Third Party Providers (TPPs). Token bucket strategy is preferred. Per-API-key limiting is standard, as each TPP registers individual OAuth credentials.
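Expressed as inputs to this calculator, that guidance might look roughly like the following sketch. The field names are assumed for illustration; only strategy, burstMultiplier, and retryAfterSeconds appear elsewhere on this page.

// PSD2-style conservative settings expressed as calculator inputs (illustrative only).
const psd2Inputs = {
  strategy: 'token-bucket',  // preferred strategy per the guidance above
  burstMultiplier: 1.5,      // keep bursts at or below 1.5x the window budget
  retryAfterSeconds: 30,     // long retry window to damp TPP retry flooding
  limitBy: 'api-key',        // one OAuth credential per registered TPP (assumed field)
}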

Related tools