
Fan-out (distributed systems)

Latency & SRE

Fan-out is the pattern where a single incoming request triggers multiple parallel downstream calls — to databases, microservices, caches, or external APIs. Fan-out is common in API gateways and BFF (Backend for Frontend) layers. Issuing the calls in parallel is faster than issuing them sequentially, but the composite latency is determined by the slowest dependency, making tail latency management critical.

Formula

latency_fan_out = max(latency_dep_1, latency_dep_2, ..., latency_dep_N) + overhead. The probability that at least one of N parallel calls exceeds its individual p99 is 1 - (1 - 0.01)^N. For N = 10 this is 1 - 0.99^10 ≈ 9.6%, so roughly one request in ten includes at least one p99 outlier.
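The tail-compounding formula above can be evaluated directly for a few fan-out widths:

```python
def p_any_p99(n: int, p: float = 0.01) -> float:
    """Probability that at least one of n independent parallel calls
    exceeds its own p99 (each call does so with probability p)."""
    return 1 - (1 - p) ** n

for n in (1, 5, 10, 20):
    print(f"N={n:2d}: {p_any_p99(n):.1%}")
# N= 1: 1.0%
# N= 5: 4.9%
# N=10: 9.6%
# N=20: 18.2%
```

Note the assumption of independence between dependencies; correlated slowness (e.g. a shared database) changes the numbers but not the direction of the effect.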

Why it matters in practice

A request that fans out to 20 microservices includes at least one per-dependency p99 outlier on roughly one request in five (1 - 0.99^20 ≈ 18%) — far more often than the 1% a single service's p99 suggests. The composite p99 is therefore at least as bad as the worst dependency's p99, and typically worse, regardless of how fast the other 19 are. Teams that benchmark individual microservices in isolation and declare success without testing fan-out patterns will discover this problem in production.
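The effect is easy to see in a small Monte Carlo sketch, under assumed per-dependency latencies (10 ms typically, 500 ms on a 1% slow path — both numbers are illustrative, not from the text above):

```python
import random

random.seed(42)  # deterministic run for reproducibility

def dep_latency_ms() -> float:
    # hypothetical dependency: 10 ms typically, 500 ms on the ~1% slow path
    return 500.0 if random.random() < 0.01 else 10.0

N_DEPS, TRIALS = 20, 100_000
# composite latency is the max over all parallel calls; count the
# fraction of requests dragged past 100 ms by a single tail outlier
slow = sum(
    max(dep_latency_ms() for _ in range(N_DEPS)) > 100 for _ in range(TRIALS)
)
print(f"composite requests dominated by a tail outlier: {slow / TRIALS:.1%}")
```

The printed fraction lands near the analytical 18%, even though each dependency individually looks healthy 99% of the time.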

Common mistakes

  • Not setting per-call timeouts in fan-out requests — a single slow dependency can hold the entire request open indefinitely.
  • Waiting for all fan-out calls to complete when some results are optional — use partial results and return what is available within the budget.
  • Ignoring the statistical compounding of tail latency across parallel calls.
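The first two mistakes above can be addressed together. A minimal asyncio sketch, assuming hypothetical dependencies (`users`, `ads`, `recs`) and simulated latencies, that bounds every call with a per-call timeout and returns partial results within the budget:

```python
import asyncio

async def call_dep(name: str, delay: float) -> tuple[str, str]:
    # hypothetical downstream call, simulated with a sleep
    await asyncio.sleep(delay)
    return name, "ok"

async def fan_out(budget: float = 0.2) -> dict[str, str]:
    # hypothetical dependencies and their simulated latencies (seconds)
    deps = {"users": 0.05, "ads": 0.01, "recs": 0.5}
    # per-call timeout: no single dependency can hold the request past the budget
    tasks = [
        asyncio.create_task(asyncio.wait_for(call_dep(n, d), timeout=budget))
        for n, d in deps.items()
    ]
    # return_exceptions=True keeps one timeout from failing the whole request
    results = await asyncio.gather(*tasks, return_exceptions=True)
    partial = {}
    for r in results:
        if not isinstance(r, Exception):  # drop timed-out optional calls
            name, value = r
            partial[name] = value
    return partial

print(asyncio.run(fan_out()))  # "recs" exceeds the budget and is omitted
```

Whether a missing result is acceptable is a product decision per dependency: here all three are treated as optional, but a required dependency's timeout should instead fail the request fast.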