Query-Only Attention for Trustworthy Continual Adaptation
Abstract
Foundation models deployed in dynamic environments face continual distribution shifts and evolving data conditions, where failure to adapt can erode reliability and fairness. We propose a Query-Only Attention mechanism that discards keys and values while preserving the inductive bias of full-attention architectures. In continual learning scenarios, this simplified mechanism significantly mitigates both loss of plasticity and catastrophic forgetting, outperforming baselines such as selective re-initialization. Query-Only Attention achieves performance competitive with full attention while being more compute-efficient. We establish a conceptual link between query-only attention, full transformer attention, and model-agnostic meta-learning (MAML), framing all three as instances of meta-learning. Finally, through Hessian spectrum analysis, we show that models maintaining higher curvature rank across tasks exhibit sustained adaptability, improving trustworthiness under distribution shift. These findings highlight principles relevant to real-world continual learning systems that demand reliability, fairness, and accountability.
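To make the architectural simplification concrete, the following is a minimal sketch of one plausible query-only attention layer. The abstract does not specify the exact formulation, so the single-projection design, the class name, and the dimensions below are illustrative assumptions rather than the paper's definition.

```python
# A minimal sketch of a query-only attention layer (assumption: attention scores
# and the mixed representations are both derived from a single query projection,
# with no separate key or value projections as in full attention).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryOnlyAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)    # only projection kept; no W_K or W_V
        self.out_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q = self.q_proj(x)
        # Attention scores computed from queries alone.
        attn = F.softmax(q @ q.transpose(-2, -1) * self.scale, dim=-1)
        # Mix the query representations themselves instead of a value projection.
        return self.out_proj(attn @ q)

# Usage: a drop-in stand-in for a full-attention block in a small transformer.
x = torch.randn(2, 16, 64)
y = QueryOnlyAttention(64)(x)
print(y.shape)  # torch.Size([2, 16, 64])
```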