ELLA: Efficient Lifelong Learning for Adapters in Large Language Models
Shristi Das Biswas · Yue Zhang · Anwesan Pal · Radhika Bhargava · Kaushik Roy
Abstract
Continual Learning (CL) is a vital requirement for deploying large language models (LLMs) in today's dynamic world. Existing approaches seek to acquire task-specific knowledge via parameter-efficient fine-tuning (PEFT) with reduced compute overhead. However, sequential fine-tuning often sacrifices performance retention and forward transfer, especially under replay-free constraints. We introduce ELLA, a novel CL framework that regularizes low-rank adapter updates via cross-task subspace decorrelation. By learning a compact adapter per task and penalizing overlap between the representational subspaces of past and current adapter activations, ELLA encourages task specialization while preserving prior knowledge, without storing any past data. Across $3$ benchmarks, ELLA outperforms prior CL methods in both accuracy and forgetting metrics, providing a scalable solution for lifelong LLM learning.
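As a rough illustration of the decorrelation idea described in the abstract, the sketch below penalizes overlap between the dominant activation subspaces of a past and a current adapter. The SVD-based basis extraction, the rank `k`, and the weight `lam` are illustrative assumptions, not the paper's exact objective.

```python
# Minimal sketch of a cross-task subspace decorrelation penalty (assumed form,
# not the paper's exact loss).
import torch


def subspace_basis(acts: torch.Tensor, k: int) -> torch.Tensor:
    """Top-k left singular vectors of an activation matrix (tokens x hidden)."""
    # Economy SVD of the transposed activations; U[:, :k] spans the dominant
    # representational subspace in hidden-dimension space.
    U, _, _ = torch.linalg.svd(acts.T, full_matrices=False)  # hidden x min(...)
    return U[:, :k]                                           # hidden x k


def decorrelation_penalty(past_acts: torch.Tensor,
                          cur_acts: torch.Tensor,
                          k: int = 8) -> torch.Tensor:
    """Penalize overlap between past- and current-task adapter activation subspaces."""
    U_past = subspace_basis(past_acts, k)  # basis from the previous task (frozen)
    U_cur = subspace_basis(cur_acts, k)    # basis from the current adapter
    # Squared Frobenius norm of the cross-projection; zero when the subspaces
    # are orthogonal, k when they coincide. Normalized by k.
    return (U_past.T @ U_cur).pow(2).sum() / k


if __name__ == "__main__":
    # Hypothetical shapes: 256 cached token activations, hidden size 768.
    past = torch.randn(256, 768)                       # from the previous task's adapter
    cur = torch.randn(256, 768, requires_grad=True)    # from the current adapter
    lam = 0.1                                          # assumed regularization weight
    reg = lam * decorrelation_penalty(past, cur)       # added to the task loss in practice
    reg.backward()
    print(reg.item())
```

In a training loop, `reg` would be added to the current task's loss so the new adapter is steered away from the subspace occupied by earlier adapters' activations.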