API throttling vs rate limiting
Rate limiting and throttling are often used interchangeably but carry a meaningful distinction in production systems. Rate limiting is a per-client policy, typically enforced by the server: it caps each client's request rate and rejects excess requests with a 429. Throttling is a system-level policy applied at the server or network layer: it slows, delays, or sheds requests when the system is under stress, regardless of any individual client's count. Throttling preserves availability by sacrificing response time; rate limiting preserves availability and fairness by rejecting excess requests outright.
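The contrast can be sketched with two illustrative server-side primitives (hypothetical names, not a specific library's API): a token bucket that enforces a per-client rate and signals rejection, and a throttle that delays work instead of refusing it.

```python
import time

class TokenBucket:
    """Per-client rate limiter: reject requests that exceed the allowed rate."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should get a 429."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def throttle(delay_s: float) -> None:
    """Server-side throttle: slow the request down rather than rejecting it."""
    time.sleep(delay_s)
```

A bucket with `rate=10, capacity=2` admits a burst of two requests, then starts returning `False` until tokens replenish; the caller maps `False` to a 429, whereas a throttled request simply takes longer to complete.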
Why it matters in practice
Understanding the distinction matters when debugging 429 vs 503 responses, and when choosing the right mechanism. A DynamoDB provisioned-throughput-exceeded error is throttling — the table's provisioned capacity is exhausted, and requests are throttled until capacity recovers. An API Gateway usage plan 429 is rate limiting — that client's quota is exhausted. A server returning 503 during overload is throttling — the service is protecting itself. Each calls for a different client response strategy.
Common mistakes
- Implementing only rate limiting without throttling — if all clients are within their per-client limits but aggregate traffic exceeds server capacity, you need a global throttle as well.
- Treating all 4xx/5xx responses as equivalent for retry purposes — 429 should trigger a slower retry (wait for Retry-After), while 503 may indicate transient issues that recover faster.
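The first mistake above can be guarded against with an aggregate cap alongside per-client limits. A minimal sketch (hypothetical class name) using a semaphore to bound total in-flight requests, shedding the rest:

```python
import threading

class GlobalThrottle:
    """Aggregate guard: cap total in-flight requests across all clients.

    Per-client rate limits alone cannot protect the server when the sum of
    many within-limit clients exceeds capacity; this bounds the aggregate.
    """

    def __init__(self, max_in_flight: int):
        self.sem = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: False means shed the request (e.g. respond 503)
        # or queue/delay it, depending on policy.
        return self.sem.acquire(blocking=False)

    def release(self) -> None:
        self.sem.release()
```

A request handler would check the per-client bucket first (429 on failure), then the global throttle (503 or delay on failure), releasing the slot when the response is sent.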