Cloudflare Turns a Security-Review Skill Into an Autonomous Agent Pipeline

Cloudflare evolved a single code-review skill into a two-stage multi-agent pipeline that surfaced and triaged more than 20,000 findings across 100+ repositories.


Cloudflare published details on how it scaled an internal security-review skill into a fully autonomous pipeline. The team started with a single skill that ran security scans against repositories, but quickly hit two scaling limits: context-window constraints and difficulty understanding cross-repository relationships.

The team restructured the skill's same seven steps into an autonomous pipeline split across two processes. The first handles discovery: three agents collaborate to read the codebase, map its architecture and relationships, and then try to find and break things; findings are validated and deduplicated before a report is produced. The second handles validation and remediation: a separate model (not just a separate session) tries to confirm or disprove each finding, after which a fix is generated, tested, and opened as a PR for human review.
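The two-process split can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Cloudflare's implementation: every name here (`Finding`, `run_discovery`, `run_validation`, and the agent callables passed in) is hypothetical, and the agents are plain Python callables standing in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one suspected vulnerability."""
    repo: str
    location: str
    issue: str


def run_discovery(
    repo: str,
    read_codebase: Callable[[str], dict],
    map_architecture: Callable[[str, dict], dict],
    hunt: Callable[[str, dict, dict], Iterable[Finding]],
) -> list[Finding]:
    """Process 1: three agents read, map, and attack; then deduplicate."""
    context = read_codebase(repo)
    architecture = map_architecture(repo, context)
    raw = hunt(repo, context, architecture)
    # dict.fromkeys drops exact duplicates while keeping first-seen order.
    return list(dict.fromkeys(raw))


def run_validation(
    findings: Iterable[Finding],
    confirm: Callable[[Finding], bool],
    generate_fix: Callable[[Finding], str],
    fix_passes_tests: Callable[[Finding, str], bool],
) -> list[dict]:
    """Process 2: confirm or reject each finding, then propose a tested fix."""
    pull_requests = []
    for finding in findings:
        # `confirm` is assumed to wrap a *different* model than discovery used.
        if not confirm(finding):
            continue
        patch = generate_fix(finding)
        if fix_passes_tests(finding, patch):
            # Nothing merges here: the PR waits for a human reviewer.
            pull_requests.append(
                {"finding": finding, "patch": patch, "status": "awaiting_human_review"}
            )
    return pull_requests


# Stub agents for demonstration only; real agents would call models.
dup = Finding("svc-a", "app.py:10", "SQL injection")
findings = run_discovery(
    "svc-a",
    read_codebase=lambda repo: {"files": ["app.py"]},
    map_architecture=lambda repo, ctx: {"calls": []},
    hunt=lambda repo, ctx, arch: [dup, dup, Finding("svc-a", "util.py:3", "weak hash")],
)
prs = run_validation(
    findings,
    confirm=lambda f: "SQL" in f.issue,
    generate_fix=lambda f: "use parameterized queries",
    fix_passes_tests=lambda f, patch: True,
)
```

Passing the validator in as its own callable keeps the adversarial check independent of whatever produced the finding, which is the property the write-up emphasizes. The terminal state is a pending PR rather than a merge, so the human gate is built into the data flow.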

Across more than 100 repositories, the pipeline produced over 20,000 raw findings. After validation, about 13,800 survived; roughly 5,000 were deduplicated and 1,100 were downgraded as low-risk, leaving about 7,200 findings routed to engineering and security teams. The final severity breakdown was 41 critical and 777 high, with the remainder medium or low.
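To put the funnel in proportion, the quoted figures can be turned into ratios. The sketch below uses only the rounded numbers above and assumes the severity breakdown applies to the roughly 7,200 routed findings, which the summary does not state explicitly. Variable names are illustrative.

```python
# Rounded figures as quoted in the summary.
raw_findings = 20_000          # "over 20,000", so treated as a floor
survived_validation = 13_800
routed = 7_200
critical, high = 41, 777

survival_rate = survived_validation / raw_findings
routed_rate = routed / raw_findings

# Assumption: the critical/high counts are a breakdown of the routed findings.
critical_high_share = (critical + high) / routed
medium_low = routed - (critical + high)

print(f"survived validation: {survival_rate:.0%} of raw findings")
print(f"routed to teams:     {routed_rate:.0%} of raw findings")
print(f"critical or high:    {critical_high_share:.1%} of routed findings")
print(f"medium or low:       {medium_low} findings")
```

Because 20,000 is a lower bound on the raw count, the two percentages of raw findings are upper bounds. On these numbers roughly a third of raw findings reach a team, and about one in nine of those is rated critical or high.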

The structure (separate discovery and validation stages, a different model for adversarial review, and a human-in-the-loop gate before any fix merges) is a useful reference architecture for any team trying to move AI-assisted security review beyond a single-shot skill.

Why it matters: If you're scaling AI-assisted code review, Cloudflare's split between a discovery pipeline and a separate validation/fix pipeline, with a different model checking the findings, is worth copying. It is what keeps signal-to-noise manageable as scope grows past a handful of repos.
