Less-forgetting Multi-lingual Fine-tuning
Yuren Mao · Yaobo Liang · Nan Duan · Haobo Wang · Kai Wang · Lu Chen · Yunjun Gao


Multi-lingual fine-tuning (MLF), which fine-tunes a multi-lingual language model (MLLM) with multiple source languages, aims to achieve good zero-shot performance on target languages. In MLF, the fine-tuned model tends to fit the source languages while forgetting the cross-lingual knowledge obtained during pre-training. This forgetting phenomenon degrades the zero-shot performance of MLF, yet it remains under-explored. To fill this gap, this paper proposes a multi-lingual fine-tuning method, dubbed Less-forgetting Multi-lingual Fine-tuning (LF-MLF). In LF-MLF, we cast multi-lingual fine-tuning as a constrained optimization problem, where the optimization objective is to minimize forgetting and the constraints require reducing the fine-tuning losses. The proposed method has superior zero-shot performance; furthermore, it can achieve Pareto stationarity. Extensive experiments on Named Entity Recognition, Question Answering and Natural Language Inference back up our theoretical analysis and validate the superiority of our proposals.
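The constrained formulation above can be illustrated with a minimal sketch. The paper's actual solver is not given on this page, so the following is a hypothetical GEM-style projection: we descend on a forgetting-loss gradient while projecting out any component that would increase a source-language fine-tuning loss. The function name `constrained_update` and the one-constraint-at-a-time projection are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def constrained_update(g_forget, task_grads):
    """Project the forgetting-loss gradient so that the resulting
    update direction does not increase any source-language loss.

    Hypothetical sketch (GEM-style projection), not the paper's
    exact constrained-optimization solver.
    """
    g = g_forget.astype(float).copy()
    for g_i in task_grads:
        dot = g @ g_i
        if dot < 0:  # update would conflict with this task's descent direction
            # project g onto the half-space where the task loss does not increase
            g = g - (dot / (g_i @ g_i)) * g_i
    return g
```

For example, with a forgetting gradient `[1, 0]` and a conflicting task gradient `[-1, 1]`, the projected update `[0.5, 0.5]` is orthogonal to the constraint, so following it leaves that task's loss unchanged to first order while still reducing forgetting.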

Author Information

Yuren Mao (Zhejiang University)

Yuren Mao received his PhD degree in computer science from the University of New South Wales, Australia, in 2022. He is currently an assistant professor with the School of Software Technology, Zhejiang University, China. His current research interests include multi-task learning and its applications. His research results have been published in leading venues such as ICML, NeurIPS, ACL, and TKDE.

Yaobo Liang (Microsoft)
Nan Duan (Microsoft Research Asia)
Haobo Wang (Zhejiang University)
Kai Wang (University of New South Wales)
Lu Chen (Zhejiang University)
Yunjun Gao (Zhejiang University)
