Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating fluent text, yet their practical application is critically hindered by the phenomenon of hallucination: the generation of factually incorrect yet high-confidence assertions. We posit that a fundamental cause lies within the Transformer architecture itself, specifically in the Softmax function of the attention mechanism. We argue that Softmax induces Artificial Certainty by collapsing latent, potentially ambiguous attention scores into a single normalized probability distribution. This loss of uncertainty information at each layer propagates and amplifies through the network, leading to overconfident predictions on fabricated content. To address this, we introduce the Credal Transformer, a novel architecture that replaces the standard attention mechanism with a Credal Attention Mechanism (CAM). Grounded in evidential theory, CAM produces not a single attention vector but a credal set, a convex set of probability distributions. The size of this set serves as a direct, differentiable measure of the model's epistemic uncertainty. For computational tractability, we parameterize this credal set using principles from Evidential Deep Learning, where attention scores are re-conceptualized as evidence masses for a Dirichlet distribution. Sufficient evidence yields a sharp distribution that recovers standard attention, whereas insufficient evidence results in a diffuse distribution, explicitly representing ambiguity or lack of knowledge. We empirically demonstrate the efficacy of our approach on a suite of tasks. The Credal Transformer correctly identifies out-of-distribution inputs by producing high-entropy outputs, quantifies ambiguity in inputs, and, in a question-answering benchmark, significantly reduces confident errors on unanswerable questions by abstaining from prediction. Our contribution is twofold: we present a concrete architecture for mitigating hallucinations and, more broadly, advocate a design paradigm that integrates uncertainty quantification as an intrinsic component of the model. The Credal Transformer provides a principled architectural foundation for developing more reliable and trustworthy AI systems capable of representing their own uncertainty.
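To make the Dirichlet-based parameterization described above concrete, the following is a minimal sketch of one credal attention head. It assumes attention scores are mapped to non-negative evidence (here via softplus), used as Dirichlet concentration parameters whose mean gives the attention weights and whose total evidence yields a vacuity-style uncertainty score, in the spirit of Evidential Deep Learning. All names (`CredalAttentionHead`, the softplus evidence activation, the `K / S` vacuity formula) are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of a single Credal Attention head (PyTorch).
# Assumption: evidence = softplus(scores), alpha = evidence + 1, as in
# common Evidential Deep Learning formulations; not the paper's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CredalAttentionHead(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.einsum("bid,bjd->bij", q, k) * self.scale

        # Re-interpret scores as non-negative evidence masses; alpha then
        # parameterizes a Dirichlet over the attention weights for each query.
        evidence = F.softplus(scores)                  # (batch, seq, seq), >= 0
        alpha = evidence + 1.0                         # Dirichlet concentration
        strength = alpha.sum(dim=-1, keepdim=True)     # total evidence S

        # Expected attention distribution: the mean of the Dirichlet.
        attn = alpha / strength

        # Vacuity (epistemic uncertainty): large when evidence is scarce,
        # i.e. when the credal set of plausible attention vectors is wide.
        num_keys = scores.size(-1)
        vacuity = num_keys / strength.squeeze(-1)      # (batch, seq)

        out = torch.einsum("bij,bjd->bid", attn, v)
        return out, vacuity

# Usage: high vacuity flags queries whose attention is under-determined,
# which downstream layers or a decoding rule could use to abstain.
head = CredalAttentionHead(d_model=64, d_head=32)
output, uncertainty = head(torch.randn(2, 10, 64))
```

Under this sketch, abundant evidence drives the Dirichlet mean toward a sharp distribution that behaves like standard softmax attention, while scarce evidence yields a diffuse mean and high vacuity, mirroring the behavior the abstract attributes to CAM.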