Poster
Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling
Hongyu Gong · Yun Tang · Juan Pino · Xian Li
Multi-head attention lets each attention head collect salient information from a different part of an input sequence, making it a powerful mechanism for sequence modeling. Multilingual and multi-domain learning are common scenarios for sequence modeling, where the key challenge is to maximize positive transfer and mitigate negative interference across languages and domains. In this paper, we find that non-selective attention sharing is sub-optimal for achieving good generalization across all languages and domains. We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling. Our approach automatically learns shared and specialized attention heads for different languages and domains. Evaluated on various tasks including speech recognition, text-to-text translation, and speech-to-text translation, the proposed attention sharing strategies consistently bring gains to sequence models built upon multi-head attention. For speech-to-text translation, our approach yields an average of $+2.0$ BLEU over $13$ language directions in the multilingual setting and $+2.0$ BLEU over $3$ domains in the multi-domain setting.
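The idea of shared versus specialized heads can be illustrated with a minimal sketch: standard multi-head self-attention where each head's output is scaled by a per-language (or per-domain) gate. A gate near 1 keeps the head shared for that language; a gate near 0 effectively specializes it away. This is an illustrative NumPy toy, not the paper's implementation; all names (`gated_multi_head_attention`, the gate values, weight shapes) are assumptions for the sketch, and in the paper the selection is learned during training.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_multi_head_attention(x, Wq, Wk, Wv, gates):
    """Multi-head self-attention with per-head gates in [0, 1].

    x:     (seq_len, d_model) input sequence
    Wq/Wk/Wv: (num_heads, d_model, d_head) projection weights
    gates: (num_heads,) hypothetical learned gate per head for one
           language/domain; 0 prunes the head, 1 keeps it shared.
    """
    num_heads, _, d_head = Wq.shape
    outputs = []
    for h in range(num_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]      # (seq_len, d_head)
        attn = softmax(q @ k.T / np.sqrt(d_head))       # (seq_len, seq_len)
        outputs.append(gates[h] * (attn @ v))           # gate scales head output
    return np.concatenate(outputs, axis=-1)             # (seq_len, d_model)

rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 8, 2, 4
d_head = d_model // num_heads
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(num_heads, d_model, d_head)) for _ in range(3))
# Hypothetical gates for one language: head 0 shared, head 1 specialized away.
gates = np.array([1.0, 0.0])
y = gated_multi_head_attention(x, Wq, Wk, Wv, gates)
```

With these gates, the columns of `y` produced by head 1 are exactly zero, mimicking a head that has been deselected for this language while head 0 remains shared.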
Author Information
Hongyu Gong (Facebook AI Research)
Hongyu is a research scientist at Facebook AI Research with a focus on speech and text translation. Her research interests span the areas of language representation learning and language generation. She obtained her PhD from the University of Illinois at Urbana-Champaign in 2020.
Yun Tang (Facebook)
Juan Pino (Meta)
Xian Li (Meta AI)
More from the Same Authors
-
2021 Spotlight: Multimodal and Multilingual Embeddings for Large-Scale Speech Mining »
Paul-Ambroise Duquenne · Hongyu Gong · Holger Schwenk -
2021 Poster: Robust Optimization for Multilingual Translation with Imbalanced Data »
Xian Li · Hongyu Gong -
2021 Poster: Multimodal and Multilingual Embeddings for Large-Scale Speech Mining »
Paul-Ambroise Duquenne · Hongyu Gong · Holger Schwenk -
2020 Poster: Deep Transformers with Latent Depth »
Xian Li · Asa Cooper Stickland · Yuqing Tang · Xiang Kong -
2020 Poster: Cross-lingual Retrieval for Iterative Self-Supervised Training »
Chau Tran · Yuqing Tang · Xian Li · Jiatao Gu -
2020 Spotlight: Cross-lingual Retrieval for Iterative Self-Supervised Training »
Chau Tran · Yuqing Tang · Xian Li · Jiatao Gu -
2019 : Poster lighting round »
Yinhe Zheng · Anders Søgaard · Abdelrhman Saleh · Youngsoo Jang · Hongyu Gong · Omar U. Florez · Margaret Li · Andrea Madotto · The Tung Nguyen · Ilia Kulikov · Arash Einolghozati · Yiru Wang · Mihail Eric · Victor Petrén Bach Hansen · Nurul Lubis · Yen-Chen Wu