
Workshop: Workshop on robustness of zero/few-shot learning in foundation models (R0-FoMo)

HART: Efficient Adaptation via Regularized Autoregressive Parameter Generation

Chen Liang · Nikos Karampatziakis · Tuo Zhao · Weizhu Chen


Fine-tuning is an effective approach for adapting a pre-trained language model to downstream tasks, but it incurs a high computational cost. To achieve extremely efficient task adaptation, Phang et al. (2022) proposed using an auxiliary hypernetwork to generate task-specific weights without any backpropagation. Given a few task-specific demonstration examples, a hypernetwork can generate weights for parameter-efficient fine-tuning (PEFT) modules, such as prefixes (Li and Liang, 2021) and LoRA modules (Hu et al., 2021), for any unseen task at the cost of a single forward pass. However, training the hypernetwork is challenging. First, it is sample-inefficient because it under-exploits the dependencies between PEFT weights across layers. Second, it is unstable because few-shot demonstration inputs are highly diverse. To address these limitations, we propose HART, a novel hypernetwork training approach. HART exploits layerwise dependencies by generating the weights of each layer autoregressively, and it stabilizes training by regularizing the consistency between weights generated from different demonstrations. We train the hypernetwork on a diverse collection of tasks (Wang et al., 2022; Sanh et al., 2021) and evaluate it on unseen tasks. HART notably outperforms the approach of Phang et al. (2022) on both T5-Large and T5-XL.
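The abstract names two mechanisms: generating PEFT weights layer by layer, with each layer conditioned on the previously generated ones, and a regularizer that encourages weights generated from different demonstrations of the same task to agree. The sketch below is a toy illustration of those two ideas under stated assumptions, not the authors' implementation. The mean-pooling encoder, the tanh linear generator, the class name `ToyAutoregressiveHypernet`, and the mean-squared-error form of `consistency_loss` are all hypothetical choices; a real hypernetwork would emit actual prefix or LoRA tensors for a pre-trained model.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyAutoregressiveHypernet:
    """Hypothetical toy hypernetwork: emits one flattened PEFT weight vector
    per layer, conditioning each layer on the task embedding and on the
    weights it generated for the previous layer."""

    def __init__(self, num_layers: int, task_dim: int, weight_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_layers = num_layers
        self.weight_dim = weight_dim
        # Per-layer projection of the task embedding (hypothetical parameters).
        self.task_proj = [random_matrix(weight_dim, task_dim, rng) for _ in range(num_layers)]
        # Shared projection of the previous layer's weights: the autoregressive link.
        self.prev_proj = random_matrix(weight_dim, weight_dim, rng)

    def encode(self, demos: List[Vector]) -> Vector:
        """Mean-pool demonstration embeddings into one task embedding (assumed encoder)."""
        return [sum(col) / len(demos) for col in zip(*demos)]

    def generate(self, demos: List[Vector]) -> List[Vector]:
        task = self.encode(demos)
        prev = [0.0] * self.weight_dim  # layer 0 has no predecessor
        layers = []
        for layer in range(self.num_layers):
            a = matvec(self.task_proj[layer], task)
            b = matvec(self.prev_proj, prev)
            w = [math.tanh(x + y) for x, y in zip(a, b)]
            layers.append(w)
            prev = w  # the next layer conditions on this one
        return layers


def consistency_loss(w1: List[Vector], w2: List[Vector]) -> float:
    """Mean squared difference between two sets of generated layer weights (assumed form)."""
    total, count = 0.0, 0
    for a, b in zip(w1, w2):
        for x, y in zip(a, b):
            total += (x - y) ** 2
            count += 1
    return total / count


rng = random.Random(42)
task_dim = 8
# Hypothetical embeddings of six few-shot demonstrations for one task.
demos = [[rng.gauss(0.0, 1.0) for _ in range(task_dim)] for _ in range(6)]
hypernet = ToyAutoregressiveHypernet(num_layers=4, task_dim=task_dim, weight_dim=5)

# Generate weights from two disjoint demonstration subsets of the same task.
weights_a = hypernet.generate(demos[:3])
weights_b = hypernet.generate(demos[3:])
reg = consistency_loss(weights_a, weights_b)
print(len(weights_a), len(weights_a[0]), round(reg, 4))
```

In a training loop, one would presumably add a weighted `consistency_loss` term to the task loss computed with the generated weights; the abstract does not specify the distance measure or the weighting, so both are left as assumptions here.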
