Poster
Robust Optimization for Multilingual Translation with Imbalanced Data
Xian Li · Hongyu Gong
Multilingual models are parameter-efficient and especially effective at improving low-resource languages by leveraging crosslingual transfer. Despite recent advances in massive multilingual translation with ever-growing models and data, how to train multilingual models effectively is still not well understood. In this paper, we show that a common situation in multilingual training, data imbalance among languages, creates optimization tension between high-resource and low-resource languages, where the multilingual solution found is often sub-optimal for low-resource languages. We show that the common practice of upsampling low-resource languages cannot robustly optimize the population loss, risking either underfitting high-resource languages or overfitting low-resource ones. Drawing on recent findings on the geometry of the loss landscape and its effect on generalization, we propose a principled optimization algorithm, Curvature Aware Task Scaling (CATS), which adaptively rescales gradients from different tasks with a meta objective of guiding multilingual training to low-curvature neighborhoods with uniformly low loss for all languages. We ran experiments on common benchmarks (TED, WMT, and OPUS-100) with varying degrees of data imbalance. CATS effectively improved multilingual optimization and consequently yielded consistent gains on low-resource languages ($+0.8$ to $+2.2$ BLEU) without hurting high-resource ones. In addition, CATS is robust to overparameterization and large-batch training, making it a promising training method for massive multilingual models that truly improve low-resource languages.
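The core idea, rescaling each language's gradient by a curvature signal so that no single task dominates the shared update, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' CATS implementation: the two synthetic tasks, the squared-gradient-norm curvature proxy, and the multiplicative weight update are all assumptions made here purely to show the general shape of adaptive per-task gradient rescaling.

```python
# Minimal sketch of adaptive per-task gradient rescaling (NOT the paper's
# CATS code). The curvature proxy (per-task gradient norm) and the
# multiplicative weight update below are illustrative assumptions only.
import torch

torch.manual_seed(0)

# Toy shared model and two synthetic "languages" of very different size.
model = torch.nn.Linear(8, 1)
params = list(model.parameters())
opt = torch.optim.SGD(params, lr=0.05)
mse = torch.nn.MSELoss()
tasks = [
    (torch.randn(1000, 8), torch.randn(1000, 1)),  # high-resource task
    (torch.randn(50, 8), torch.randn(50, 1)),      # low-resource task
]
weights = torch.ones(len(tasks))  # adaptive per-task gradient scalings

for step in range(200):
    # Per-task gradients w.r.t. the shared parameters.
    per_task_grads = []
    for x, y in tasks:
        grads = torch.autograd.grad(mse(model(x), y), params)
        per_task_grads.append(grads)

    # Curvature proxy: each task's gradient norm. Tasks with unusually
    # sharp gradients are downweighted so the shared update is steered
    # toward neighborhoods with uniformly low loss across tasks.
    norms = torch.stack([
        torch.sqrt(sum(g.pow(2).sum() for g in grads))
        for grads in per_task_grads
    ])
    weights = weights * torch.exp(-0.1 * (norms - norms.mean()))
    weights = len(tasks) * weights / weights.sum()  # keep the mean scaling at 1

    # Apply the combined, rescaled update.
    opt.zero_grad()
    for w, grads in zip(weights, per_task_grads):
        for p, g in zip(params, grads):
            p.grad = w * g if p.grad is None else p.grad + w * g
    opt.step()

    if step % 50 == 0:
        losses = [mse(model(x), y).item() for x, y in tasks]
        print(f"step {step}: per-task losses {losses}")
```

Note that the paper learns its task scalings against an explicit meta objective over loss-landscape curvature; the hand-chosen heuristic above only mimics that effect for illustration.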
Author Information
Xian Li (Meta AI)
Hongyu Gong (Facebook AI Research)
Hongyu is a research scientist at Facebook AI Research with a focus on speech and text translation. Her research interests span the areas of language representation learning and language generation. She obtained her PhD from the University of Illinois at Urbana-Champaign in 2020.
More from the Same Authors
- 2021 Spotlight: Multimodal and Multilingual Embeddings for Large-Scale Speech Mining
  Paul-Ambroise Duquenne · Hongyu Gong · Holger Schwenk
- 2021 Poster: Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling
  Hongyu Gong · Yun Tang · Juan Pino · Xian Li
- 2021 Poster: Multimodal and Multilingual Embeddings for Large-Scale Speech Mining
  Paul-Ambroise Duquenne · Hongyu Gong · Holger Schwenk
- 2020 Poster: Deep Transformers with Latent Depth
  Xian Li · Asa Cooper Stickland · Yuqing Tang · Xiang Kong
- 2020 Poster: Cross-lingual Retrieval for Iterative Self-Supervised Training
  Chau Tran · Yuqing Tang · Xian Li · Jiatao Gu
- 2020 Spotlight: Cross-lingual Retrieval for Iterative Self-Supervised Training
  Chau Tran · Yuqing Tang · Xian Li · Jiatao Gu
- 2019: Poster lightning round
  Yinhe Zheng · Anders Søgaard · Abdelrhman Saleh · Youngsoo Jang · Hongyu Gong · Omar U. Florez · Margaret Li · Andrea Madotto · The Tung Nguyen · Ilia Kulikov · Arash Einolghozati · Yiru Wang · Mihail Eric · Victor Petrén Bach Hansen · Nurul Lubis · Yen-Chen Wu