CEDA: Cross-modal Evaluation through Debate Agents for Robust Hallucination Detection
Abstract
We present CEDA, a novel multimodal framework for detecting hallucinations in large language model outputs through a multi-agent debate approach. While existing methods for black-box LLMs often rely on response sampling and self-consistency checking, our framework takes a threefold approach: a multi-agent debate setting in which agents critically examine and debate the authenticity of generated content, a lightweight classifier head on top of an LLM-as-a-judge for better-calibrated detection, and a confidence-estimation module that quantifies uncertainty in hallucination decisions. This debate-based architecture enables a more nuanced, context-sensitive evaluation of potential hallucinations across multiple modalities. Through extensive experiments on five benchmarks (TruthfulQA, Natural Questions, FEVER, HallusionBench, and HaluEval Summarization), we demonstrate that our approach achieves significant improvements over baseline methods. Our framework also produces interpretable debate traces that explain the reasoning behind each hallucination determination. These results suggest that structured multi-agent debate offers a promising direction for improving the reliability and trustworthiness of language model outputs.