Studying Coordination and Collusion in Multi-Agent LLM Code Reviews
Abstract
Agentic large language models (LLMs) are rapidly moving from single-assistant tools to collaborative systems that write and review code, creating new failure modes, as agents may coordinate to subvert oversight. We study whether such systems exhibit coordination behaviour that enables backdoored code to pass peer review, and how these behaviours vary across seven frontier models with minimal coordination scaffolding. Six of seven models exploited the backdoor incentive, submitting functionally impaired code in 34.9–75.9% of attempts across 10 rounds of our simulation spanning 90 seeds. Whilst GPT-5 largely refused (≤10%), models across the GPT, Gemini, and Claude model families preferentially requested reviews from other saboteurs (29.2–38.5% vs. 20% random), indicating possible selective coordination capabilities. Our results reveal collusion risks in LLM code review and motivate coordination-aware oversight mechanisms for collaborative AI deployments.