dkduckkit.dev
← Blog
·8 min read·Related tool →

ReDoS: Regex Denial of Service — Backtracking, CVEs, and Defenses

How NFA backtracking explodes, nested quantifiers and overlapping alternation, real CVEs, safe engines and fixes, and why length limits plus testing beat hoping your regex is fine.

On July 2, 2019, at 13:42 UTC, Cloudflare's edge network — which processes roughly 10% of all web traffic — experienced total CPU exhaustion across every server globally. The outage lasted 27 minutes. Discord, Medium, Shopify, and millions of other sites returned 502 errors. The cause was not a DDoS attack or hardware failure. It was a single regular expression added to a WAF rule to detect XSS patterns.

The terminal segment .*(?:.*=.*) forced the regex engine into an exponential search space on certain malformed HTTP requests. A CPU safeguard had been accidentally removed during a recent refactor, and the rule bypassed the standard gradual rollout. A regex took down global infrastructure for half an hour.

How backtracking explodes

JavaScript (V8), Python, Java, and PCRE — the engines behind most backend stacks — use NFA backtracking. When a regex fails to match, the engine backtracks to the last decision point and tries a different path. For most patterns this is fast. For patterns with ambiguity, the number of paths to explore grows exponentially.

The classic example: pattern (a+)+ against input aaaaaab.

  • Inner a+ consumes all six as. Outer + is satisfied.
  • Engine reaches b. No match. Backtracks.
  • Tries splitting six as across inner and outer groups in every possible combination.
  • For n characters: 2n combinations to try.
Input lengthBacktracking stepsApproximate time
201,048,576~100ms
301,073,741,824~2 minutes
401.1 × 10¹²~24 hours

This is not a bug in the engine. It is the correct behaviour of an NFA following a pattern that allows multiple interpretations of the same string. The vulnerability is in the regex itself.

The two patterns that cause ReDoS

Most documented vulnerabilities come from one of two structures:

Nested quantifiers — an inner group with a quantifier wrapped in an outer quantifier:

(X+)+    (X*)+    (X+)*

Any time you see this structure: stop. The engine can match the same characters by distributing them across the inner and outer loops in exponentially many ways.

Alternation with shared prefix — alternatives that can match the same characters, repeated:

(cat|catch)+    (a|a?)+    (https?|http)+

When alternatives overlap, every character becomes a branching decision.

Real packages, real CVEs

These patterns appear regularly in production npm packages:

  • validator.js (CVE-2021-3765) — URL and Magnet URI validation contained catastrophic backtracking patterns. Used by 6,000+ downstream packages. Fixed in commit 496fc8b by replacing overlapping alternations with rigid, non-overlapping structures.
  • moment.js (CVE-2022-31129) — date parsing regex showed O(n²) complexity on malformed inputs with thousands of unmatched parentheses. Quadratic, not exponential — but enough to cause noticeable slowdowns under load.
  • Fedify HTML parser (GHSA-rchf-xwx2-hm93) — attribute matching regex /<(a|link)((\ s+[a-z][a-z:_-]*=...)+)\ s*\/?>/ contained nested quantifiers. An attacker could trigger it by serving a response with a malformed HTML tag.

Which engines are safe

The risk depends entirely on the runtime:

Language / runtimeEngineVulnerable?
Node.js / V8Backtracking NFAYes
Python (re)Backtracking NFAYes
Java (java.util.regex)Backtracking NFAYes
PHP / PCREBacktracking NFAYes
GoRE2 / DFANo
Rust (regex crate)Pike NFANo
.NET 7+ (opt-in)DFA (NonBacktracking)No (with flag)

Node.js is particularly dangerous: regex execution is synchronous on the main event loop. A single catastrophic match blocks all other traffic to that process.

Three fixes

1 — Rewrite the regex to remove ambiguity. Often the nested quantifier is simply redundant:

// Before: catastrophic
/(a+)+b/

// After: linear — outer quantifier was redundant
/a+b/

For email validation: don't write your own. Use a simple bounded pattern that trades 100% RFC compliance for safety:

/^[a-zA-Z0-9_.+-]{1,64}@[a-zA-Z0-9-]{1,253}\.[a-zA-Z]{2,63}$/

2 — Use a safe engine for untrusted input. In Node.js, the re2 npm package is a C++ wrapper around Google's RE2 engine. It throws an error for unsupported patterns (lookahead, backreferences) rather than running them unsafely:

const RE2 = require('re2')

try {
  const safe = new RE2(/(a+)+b/)
  safe.test('aaaaaaaaaaaaaaaaaab')
} catch (e) {
  console.error('Pattern not supported by safe engine')
}

In Go and Rust, the default regex libraries are already linear-time. In .NET 7+, add RegexOptions.NonBacktracking.

3 — Input length limits as a second layer. ReDoS attacks require 30–100 character inputs to cause meaningful delay. A 254-character cap on an email field makes even O(2ⁿ) patterns harmless in practice. Combined with rate limiting to prevent repeated attempts, length limits are cheap and effective defense in depth.

Test your patterns before they reach production. The ReDoS Vulnerability Tester analyses patterns for catastrophic backtracking, shows the backtracking step count, and flags which engine types are at risk.

Related tool

ReDoS Vulnerability Tester

Analyze JavaScript-style regex for catastrophic backtracking: scores from redos-detector, static heuristics, evil inputs, and bounded timing.