Rate limits: token bucket, fixed window, PSD2

A fixed-window limiter set to 100 requests/minute will happily pass 200 requests in two seconds - and a client that retries on 429 triggers it without any malice. For a payments API under PSD2, that doubled burst is the difference between a stable gateway and a regulator asking why third-party providers got throttled. The algorithm you pick is a stability and compliance decision, not a middleware afterthought.

The boundary exploit

The Fixed Window algorithm is the most intuitive implementation: divide time into discrete windows and maintain one counter per client per window. It is simple to implement with Redis INCR + EXPIRE, but it contains a fundamental vulnerability at the window boundary.

With a limit of 100 requests per minute (L=100, W=60s), a client can burst to double the intended throughput by timing requests around the window reset:

100 requests at t=59s  (window almost closed, quota exhausted)
100 requests at t=60s  (window resets, full quota available)
= 200 requests in 2 seconds

This is 2× the per-window quota delivered in about two seconds, far denser than the load your downstream services were sized for - and it requires no malicious intent, just a naive client that retries immediately on 429.
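The boundary burst can be reproduced with a minimal in-memory sketch. The FixedWindowLimiter class and the "tpp-1" client name are illustrative assumptions; a dictionary stands in for the Redis INCR + EXPIRE counter, and timestamps are passed in explicitly so the result is deterministic.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed-window counter (hypothetical stand-in for Redis INCR + EXPIRE)."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.counters = defaultdict(int)  # (client, window index) -> request count

    def allow(self, client, now_s):
        # Every timestamp inside the same window shares one counter.
        window = int(now_s // self.window_s)
        key = (client, window)
        if self.counters[key] >= self.limit:
            return False
        self.counters[key] += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_s=60)

# 100 requests just before the reset, 100 just after it.
timestamps = [59.0] * 100 + [60.0] * 100
accepted = sum(limiter.allow("tpp-1", t) for t in timestamps)
print(accepted)  # 200: every request passes
```

Each window stays within its own quota, so the limiter never sees a violation even though both bursts land about one second apart.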

Token bucket behaviour with Nginx limit_req

The Token Bucket algorithm solves the boundary problem. Tokens accumulate at a constant refill rate r. Each request consumes one token. The bucket has a maximum capacity b that defines the burst allowance. It is the default choice for most gateways because it allows short-term bursts while enforcing a strict long-term average.
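As a rough sketch of the mechanics, the class below refills lazily on each call instead of running a timer. The TokenBucket name and the explicit now_s argument are assumptions made for illustration; r=10 and b=50 are example values.

```python
class TokenBucket:
    """Lazy-refill token bucket: rate r tokens/second, capacity b tokens."""

    def __init__(self, rate, capacity, now_s=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_s = now_s

    def allow(self, now_s):
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now_s - self.last_s)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_s = max(self.last_s, now_s)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=10, capacity=50)
burst = sum(bucket.allow(0.0) for _ in range(200))  # 200 attempts at t=0
later = sum(bucket.allow(1.0) for _ in range(200))  # 200 attempts at t=1s
print(burst, later)  # 50 10
```

Over any interval of length t the bucket admits at most b + r·t requests, so no timing trick around a boundary can double the burst.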

Nginx implementation (leaky bucket)

nginx

# Zone definition: per the Nginx docs, 1MB holds ~16,000 64-byte states
# or ~8,000 128-byte states, so 10MB holds roughly 80,000-160,000 clients
# rate=10r/s is the refill rate
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    location /api/v1/payments {
        # burst=50 defines the bucket capacity
        # nodelay: pass immediately within burst, reject beyond - no queuing
        # Note: Nginx documents limit_req as "leaky bucket" algorithm.
        # With nodelay, behaviour is equivalent to token bucket.
        limit_req zone=api_limit burst=50 nodelay;
        limit_req_status 429;

        proxy_pass http://payment_backend;
    }
}

Nginx implements rate limiting via the limit_req module, which uses a leaky bucket algorithm internally. When configured with nodelay, the behaviour becomes functionally equivalent to a token bucket: requests up to the burst capacity are served immediately, and requests beyond the burst limit are rejected immediately rather than being queued. The rejection status is 429 here because limit_req_status sets it; the Nginx default is 503. This is the recommended configuration for API rate limiting because it avoids the artificial latency introduced by request queuing.

Retry storms: the silent killer

When the rate limiter starts rejecting traffic with HTTP 429, the client's retry behaviour determines whether the system recovers or collapses further. Synchronized backoff without jitter creates a retry storm.

Suppose 1,000 clients are rejected at the same moment and all use a fixed 5-second backoff. At t=5s, your gateway receives 1,000 simultaneous requests - recreating the original overload and triggering another round of rejections.

The solution is jitter, a technique discussed in the AWS Architecture Blog post on exponential backoff and jitter. A simple additive form:

RetryTime = min(cap, base × 2^attempt) + uniform_random(0, jitter_max)

With 1,000 clients and a jitter window of [0, 5s], retries spread into approximately 200 requests per second - a recovery-friendly curve instead of a synchronized spike.
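The spreading effect can be checked with a short simulation. This is a sketch under assumptions: retry_delay is a hypothetical helper implementing the additive formula above, the base and cap are both set to 5 seconds to mirror the fixed-backoff scenario, and the random generator is seeded only to make the run repeatable.

```python
import random
from collections import Counter


def retry_delay(attempt, base_s=1.0, cap_s=30.0, jitter_max_s=5.0, rng=None):
    """Capped exponential backoff plus a uniform random offset, in seconds."""
    rng = rng if rng is not None else random.Random()
    backoff = min(cap_s, base_s * 2 ** attempt)
    return backoff + rng.uniform(0.0, jitter_max_s)


rng = random.Random(42)  # seeded for a repeatable run

# 1,000 clients, first retry, 5s backoff with a [0, 5s] jitter window.
delays = [
    retry_delay(attempt=0, base_s=5.0, cap_s=5.0, jitter_max_s=5.0, rng=rng)
    for _ in range(1000)
]

# Count how many retries land in each one-second slot.
per_second = Counter(int(d) for d in delays)
for second, count in sorted(per_second.items()):
    print(f"t={second}s: {count} retries")
```

Each one-second slot between t=5s and t=10s receives roughly a fifth of the retries, not all 1,000 at once. Production clients usually also honour Retry-After when the server sends it.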

Banking APIs and PSD2

In European banking, API rate limiting choices intersect with regulatory requirements. Under PSD2 and the EBA's Regulatory Technical Standards for Strong Customer Authentication, banks must provide a Dedicated Interface for Third Party Providers (TPPs) with availability and performance comparable to the bank's own customer-facing channel.

This creates a practical constraint on rate limits: if your mobile app allows a user to refresh their balance several times per second, an equivalent TPP limit must be defensible. The EBA doesn't mandate specific multiplier values, but industry practice in Open Banking implementations has converged on conservative defaults to prevent database contention:

Industry convention (not a regulatory mandate): burst capacity rarely exceeds 1.5–2× the average rate for financial data endpoints
Retry-After header (defined in RFC 9110; RFC 6585 defines the 429 status and allows Retry-After on it) tells TPPs how long to wait before retrying
Limits applied per client_id (the TPP's OAuth credential), not per IP address - TPPs route through centralized proxy farms

Limiting by IP address fails in this context: a single large TPP may route all its traffic through a small range of egress IPs, sharing quota across thousands of end-users.

The Fix - production configs

IETF RateLimit headers

Move away from X-RateLimit-* (legacy, non-standard) toward the IETF draft headers (draft-ietf-httpapi-ratelimit-headers, not yet published as an RFC). Key difference: X-RateLimit-Reset is commonly a Unix epoch timestamp, although implementations vary; RateLimit-Reset (IETF) is the number of seconds until reset - a relative value. Mixing the two causes client bugs. The draft's field syntax has changed between revisions, so check the revision you target.

http

HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 30
Retry-After: 30

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Retry after RateLimit-Reset seconds."
}
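On the client side, the relative-versus-absolute distinction can be handled in one place. The helper below is a sketch, not a reference parser: seconds_until_retry is a hypothetical name, it assumes the legacy X-RateLimit-Reset value is a Unix epoch timestamp, and it handles only the delay-seconds form of Retry-After (the header may also carry an HTTP-date).

```python
import time


def seconds_until_retry(headers, now_epoch_s=None):
    """Return the wait in seconds, or None if no usable header is present."""
    now = time.time() if now_epoch_s is None else now_epoch_s
    # HTTP header names are case-insensitive, so normalise before lookup.
    lower = {name.lower(): value.strip() for name, value in headers.items()}

    # Relative values: already expressed as seconds to wait.
    for name in ("retry-after", "ratelimit-reset"):
        value = lower.get(name)
        if value is not None and value.isdigit():
            return float(value)

    # Assumption: the legacy header carries an absolute Unix epoch timestamp.
    legacy = lower.get("x-ratelimit-reset")
    if legacy is not None and legacy.isdigit():
        return max(0.0, float(legacy) - now)

    return None


print(seconds_until_retry({"RateLimit-Reset": "30", "Retry-After": "30"}))
print(seconds_until_retry({"X-RateLimit-Reset": "1700000030"}, now_epoch_s=1700000000))
```

Both calls yield a 30-second wait. Treating the epoch value as a relative delay would instead make the client sleep for decades, which is the class of bug that mixing the two header families produces.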

Nginx config for B2B / banking

For authenticated B2B APIs, limit by credential identifier, not IP:

nginx

# Extract API key from header, set as limiting variable
map $http_x_api_key $api_client_id {
    default $binary_remote_addr;
    "~^.+$" $http_x_api_key;
}

limit_req_zone $api_client_id zone=banking_api:20m rate=50r/s;

server {
    location /api/v1/accounts {
        limit_req zone=banking_api burst=25 nodelay;
        limit_req_status 429;
        proxy_pass http://account_service;
    }
}

To tune your bucket sizes and refill rates for your specific traffic patterns, use the API Rate Limit Calculator, which generates production-ready Nginx, Kong, and AWS API Gateway configs with correct IETF RateLimit headers.