Studying Coordination and Collusion in Multi-Agent LLM Code Reviews
Abstract
Agentic large language models (LLMs) are rapidly moving from single-assistant tools to collaborative systems that write and review code, creating new failure modes, as agents may coordinate to subvert oversight. We study whether such systems exhibit coordination behaviour that enables backdoored code to pass peer review, and how these behaviours vary across seven frontier models with minimal coordination scaffolding. Six of seven models exploited the backdoor incentive, submitting functionally impaired code in 34.9–75.9% of attempts across 10 rounds of our simulation spanning 90 seeds. Whilst GPT-5 largely refused (≤10%), models across the GPT, Gemini, and Claude model families preferentially requested reviews from other saboteurs (29.2–38.5% vs. 20% random), indicating possible selective coordination capabilities. Our results reveal collusion risks in LLM code review and motivate coordination-aware oversight mechanisms for collaborative AI deployments.