Rate Limiting Strategies Compared: Token Bucket, Fixed Window, and PSD2
Fixed window boundary exploit, token bucket burst math, retry storm prevention, and why banking APIs need different configuration than public APIs.
Rate limiting is often treated as "set and forget" infrastructure: a middleware layer to prevent database overload. For platform engineers building financial systems, the choice of API rate-limiting strategy is an architectural decision with direct consequences for system stability and, in the EU banking space, regulatory compliance.
The boundary exploit
The Fixed Window algorithm is the most intuitive implementation: divide time into discrete windows and keep a counter per client per window. It is simple to implement with Redis INCR + EXPIRE, but it contains a fundamental mathematical vulnerability at window boundaries.
With a limit of 100 requests per minute (L=100, W=60s), a client can burst to double the intended throughput by timing requests around the window reset:
- 100 requests at t=59s (late in the first window, using its full quota)
- 100 requests at t=60s (the window resets and a full quota is available again)
- Result: 200 requests in about 2 seconds
That is 2× the quota any 60-second interval was meant to allow, compressed into a couple of seconds, and it requires no malicious intent: a naive client that retries immediately on 429 will land its retries right after the reset.
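A minimal in-memory sketch makes the boundary arithmetic concrete. It models a single client with a plain counter instead of Redis; `FixedWindowLimiter` and its `allow` method are hypothetical names used only for illustration:

```python
class FixedWindowLimiter:
    """Minimal in-memory fixed-window counter for one client (illustrative only)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.current_window = None  # index of the window the counter belongs to
        self.count = 0

    def allow(self, now: float) -> bool:
        # Same bucketing as INCR on a key named after the window index.
        window = int(now // self.window_s)
        if window != self.current_window:
            # New window: the counter resets, like a Redis key expiring.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowLimiter(limit=100, window_s=60)
# 100 requests just before the reset, 100 just after it.
burst = [59.0 + i * 0.005 for i in range(100)] + [60.0 + i * 0.005 for i in range(100)]
accepted = sum(limiter.allow(t) for t in burst)
print(accepted)  # 200
```

All 200 requests are accepted within about 1.5 seconds, even though neither window ever exceeds its quota of 100.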
Token bucket behaviour with Nginx limit_req
The Token Bucket algorithm solves the boundary problem. Tokens accumulate at a constant refill rate r, each request consumes one token, and the bucket's maximum capacity b defines the burst allowance. Because there is no window edge to exploit, a client can never receive more than b + r·T requests in any interval of length T. This makes it a common choice of API rate-limiting strategy: it allows short-term bursts while enforcing a strict long-term average.
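The refill-and-consume rule fits in a few lines. This is an illustrative single-client model that assumes the bucket starts full; `TokenBucket` is a hypothetical name, not a library API:

```python
class TokenBucket:
    """Token bucket with refill rate r (tokens/s) and capacity b. Illustrative sketch."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an idle client can burst up to b
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False


bucket = TokenBucket(rate=10, capacity=50)
# 100 requests at the same instant: only the burst allowance b passes.
at_once = sum(bucket.allow(0.0) for _ in range(100))
# One second later, r * 1s = 10 tokens have been refilled.
after_1s = sum(bucket.allow(1.0) for _ in range(100))
print(at_once, after_1s)  # 50 10
```

A burst of 100 simultaneous requests gets exactly b = 50 through; after that, throughput is held to the refill rate r.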
Nginx implementation (leaky bucket)
# Zone definition: 10MB holds ~160,000 client states
# rate=10r/s is the refill rate
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
server {
    location /api/v1/payments {
        # burst=50 defines the bucket capacity
        # nodelay: pass immediately within burst, reject beyond; no queuing
        # Note: Nginx documents limit_req as a "leaky bucket" algorithm.
        # With nodelay, behaviour is equivalent to a token bucket.
        limit_req zone=api_limit burst=50 nodelay;
        limit_req_status 429;
        proxy_pass http://payment_backend;
    }
}
Nginx implements rate limiting via the limit_req module, which uses a leaky bucket algorithm internally. Without nodelay, requests within the burst allowance are queued and released at the configured rate; with nodelay, they are forwarded immediately, which makes the behaviour functionally equivalent to a token bucket. In both cases, requests beyond the burst allowance are rejected at once with the limit_req_status code (429 here; the Nginx default is 503). nodelay is the recommended configuration for API rate limiting because it avoids the artificial latency introduced by request queuing.
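To see why nodelay behaves like a token bucket, here is a simplified model of a leaky bucket used as a meter. It is a hypothetical sketch (`LeakyBucketMeter` is our own name), not Nginx's source code: it tracks how many requests the client is ahead of the configured rate, drains that excess at `rate` per second, and admits requests immediately while the excess stays within `burst`:

```python
class LeakyBucketMeter:
    """Leaky bucket used as a meter, with nodelay semantics.

    Simplified, hypothetical model for illustration; not Nginx's implementation.
    `excess` is how many requests the client is ahead of the configured rate.
    """

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.excess = 0.0
        self.last = None  # time of the last admitted request; None = no state yet

    def allow(self, now: float) -> bool:
        if self.last is None:
            # First request from this client: admitted with zero excess.
            self.last = now
            return True
        # Drain for the time since the last admitted request, then add this one.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1)
        if excess > self.burst:
            return False  # over the burst allowance: rejected at once (the 429 path)
        # nodelay: admit immediately instead of delaying until the excess drains.
        self.excess, self.last = excess, now
        return True


meter = LeakyBucketMeter(rate=10, burst=50)
meter_at_once = sum(meter.allow(0.0) for _ in range(100))
meter_after_1s = sum(meter.allow(1.0) for _ in range(100))
print(meter_at_once, meter_after_1s)  # 51 10
```

In this model an idle client gets its first request plus `burst` excess requests, then one request per 1/rate seconds: the same admission pattern as a token bucket with capacity burst + 1 and refill rate r.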
Retry storms: the silent killer
When the rate limiter starts rejecting traffic with HTTP 429, the client's retry behaviour determines whether the system recovers or collapses further. Synchronized backoff without jitter creates a retry storm.
1,000 clients are blocked simultaneously and all use a fixed 5-second backoff. At t=5s, your gateway receives 1,000 simultaneous retries, recreating the original spike; most are rejected again and come back as another synchronized wave at t=10s.
The remedy is jitter, discussed in detail in the AWS Architecture Blog post on exponential backoff and jitter. A simple form adds a bounded random offset to capped exponential backoff:
RetryTime = min(cap, base × 2^attempt) + uniform_random(0, jitter_max)
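With 1,000 clients and a jitter window of [0, 5s], the retries spread over five seconds at approximately 1,000 / 5 = 200 requests per second: a recovery-friendly curve instead of a synchronized spike. The AWS post's "full jitter" variant goes further and draws the entire delay from uniform_random(0, min(cap, base × 2^attempt)).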
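A small simulation compares the fixed 5-second backoff with the jittered formula, assuming every client was rejected at t=0 and retries exactly once. `retry_delay` and `peak_per_second` are hypothetical helpers, and the parameters (base=5s, cap=60s, jitter_max=5s) are chosen to match the example:

```python
import random
from collections import Counter


def retry_delay(attempt: int, base: float, cap: float, jitter_max: float,
                rng: random.Random) -> float:
    """Capped exponential backoff plus additive uniform jitter."""
    return min(cap, base * 2 ** attempt) + rng.uniform(0, jitter_max)


def peak_per_second(retry_times: list) -> int:
    """Largest number of retries landing in any one-second slot."""
    return max(Counter(int(t) for t in retry_times).values())


rng = random.Random(42)  # seeded so the simulation is reproducible
clients = 1_000

# Everyone waits a fixed 5 s: a single synchronized wave.
fixed = [5.0 for _ in range(clients)]
# Same 5 s base delay (attempt=0), plus a [0, 5 s] jitter window.
jittered = [retry_delay(0, base=5.0, cap=60.0, jitter_max=5.0, rng=rng)
            for _ in range(clients)]

print(peak_per_second(fixed))     # 1000
print(peak_per_second(jittered))  # on the order of 1000 / 5 = 200
```

The fixed backoff puts all 1,000 retries into the same one-second slot; the jittered version spreads them across roughly five slots.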
Banking APIs and PSD2
In European banking, the choice of API rate-limiting strategy intersects with regulatory requirements. Under PSD2 and the EBA's Regulatory Technical Standards on strong customer authentication and common and secure communication (Commission Delegated Regulation (EU) 2018/389), banks must give Third Party Providers (TPPs) an access interface, and a bank that operates a Dedicated Interface must ensure it offers the same level of availability and performance as the interfaces available to its own customers.
This creates a practical constraint on rate limits: if your mobile app allows a user to refresh their balance several times per second, the TPP limit on the equivalent endpoint must be defensible by comparison. The EBA doesn't mandate specific multiplier values, but Open Banking implementations commonly use conservative defaults to prevent database contention:
- Industry convention (not a regulatory mandate): burst capacity rarely exceeds 1.5–2× the average rate for financial data endpoints
- A Retry-After header on 429 responses (permitted by RFC 6585) tells TPPs when they can retry
- Limits are applied per client_id (the TPP's OAuth credential), not per IP address, because TPPs route traffic through centralized proxy farms
Limiting by IP address fails in this context: a single large TPP may route all its traffic through a small range of egress IPs, sharing quota across thousands of end-users.
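The difference between keying by IP and keying by client_id shows up as soon as two TPPs share one egress address. The sketch below assumes a simple per-key fixed-window counter; `limiter_key`, `PerKeyWindowLimiter`, the client IDs and the documentation-range IP are all hypothetical:

```python
from collections import defaultdict
from typing import Optional


def limiter_key(client_id: Optional[str], remote_addr: str) -> str:
    """Prefer the authenticated client_id; fall back to the IP when it is absent."""
    return client_id if client_id else remote_addr


class PerKeyWindowLimiter:
    """Fixed-window counter per limiting key (demo only: old windows are never evicted)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.counts = defaultdict(int)  # (key, window index) -> requests so far

    def allow(self, key: str, now: float) -> bool:
        slot = (key, int(now // self.window_s))
        if self.counts[slot] >= self.limit:
            return False
        self.counts[slot] += 1
        return True


EGRESS_IP = "203.0.113.10"  # hypothetical proxy address shared by both TPPs
by_ip = PerKeyWindowLimiter(limit=100, window_s=60)
by_client = PerKeyWindowLimiter(limit=100, window_s=60)

# TPP A uses its full quota first...
for _ in range(100):
    by_ip.allow(EGRESS_IP, now=1.0)
    by_client.allow(limiter_key("tpp-a", EGRESS_IP), now=1.0)

# ...then TPP B sends 10 requests through the same proxy.
b_by_ip = sum(by_ip.allow(EGRESS_IP, now=2.0) for _ in range(10))
b_by_client = sum(by_client.allow(limiter_key("tpp-b", EGRESS_IP), now=2.0)
                  for _ in range(10))
print(b_by_ip, b_by_client)  # 0 10
```

Under per-IP limiting, TPP B is locked out by TPP A's traffic; under per-client_id limiting it is unaffected. The fallback to the IP for anonymous requests mirrors falling back to `$binary_remote_addr` in Nginx when no credential is sent.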
IETF RateLimit headers
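Move away from X-RateLimit-* (legacy, non-standard) toward the IETF draft headers (draft-ietf-httpapi-ratelimit-headers, not yet published as an RFC). Key difference: X-RateLimit-Reset is commonly a Unix epoch timestamp (the convention GitHub's API uses), while RateLimit-Reset (IETF) is the number of seconds until the reset, a relative value. Mixing them causes client bugs. Later revisions of the draft also restructure the fields, combining them into RateLimit and RateLimit-Policy, so confirm which revision your clients implement; the 429 response that follows uses the separate-field form.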
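Deriving both header styles from a single reset instant keeps them consistent. This sketch assumes the separate-field form of the IETF draft and the epoch-timestamp convention for X-RateLimit-Reset; `rate_limit_headers` is a hypothetical helper:

```python
import math


def rate_limit_headers(limit: int, remaining: int, reset_epoch: float,
                       now_epoch: float) -> dict:
    """Build IETF-draft and legacy headers from one reset instant (illustrative)."""
    # IETF draft style: RateLimit-Reset is seconds *until* the reset (relative).
    delta = max(0, math.ceil(reset_epoch - now_epoch))
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(delta),
        # Legacy convention: an absolute Unix timestamp in seconds.
        "X-RateLimit-Reset": str(math.ceil(reset_epoch)),
    }
    if remaining == 0:
        # Retry-After in its delay-seconds form, consistent with RateLimit-Reset.
        headers["Retry-After"] = str(delta)
    return headers


h = rate_limit_headers(limit=100, remaining=0, reset_epoch=1_700_000_030,
                       now_epoch=1_700_000_000)
print(h["RateLimit-Reset"], h["X-RateLimit-Reset"], h["Retry-After"])
# 30 1700000030 30
```

Rounding the delta up avoids telling clients to retry a fraction of a second too early.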
HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 30
Retry-After: 30

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Retry after RateLimit-Reset seconds."
}
Nginx config for B2B / banking
For authenticated B2B APIs, limit by credential identifier, not IP:
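# Extract API key from header, set as limiting variable
# Caution: Nginx does not validate the key here. A client can send arbitrary
# values to get a fresh bucket, so pair this with key validation.
map $http_x_api_key $api_client_id {
    # No key sent: fall back to the client IP
    default $binary_remote_addr;
    "~^.+$" $http_x_api_key;
}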
limit_req_zone $api_client_id zone=banking_api:20m rate=50r/s;
server {
    location /api/v1/accounts {
        limit_req zone=banking_api burst=25 nodelay;
        limit_req_status 429;
        proxy_pass http://account_service;
    }
}
To tune your bucket sizes and refill rates for your specific traffic patterns, use the API Rate Limit Calculator, which generates production-ready Nginx, Kong, and AWS API Gateway configs with correct IETF RateLimit headers.