We present an approach to encode a speech signal into a fixed-size representation which minimizes the cosine loss with the existing massively multilingual LASER text embedding space. Sentences are close in this embedding space, independently of their language and modality, either text or audio. Using a similarity metric in that multimodal embedding space, we perform mining of audio in German, French, Spanish and English from Librivox against billions of sentences from Common Crawl. This yielded more than twenty thousand hours of aligned speech translations. To evaluate the automatically mined speech/text corpora, we train neural speech translation systems for several language pairs. Adding the mined data achieves significant improvements in the BLEU score on the CoVoST2 and the MuST-C test sets with respect to a very competitive baseline. Our approach can also be used to directly perform speech-to-speech mining, without the need to first transcribe or translate the data. We obtain more than one thousand three hundred hours of aligned speech in French, German, Spanish and English. This speech corpus has the potential to boost research in speech-to-speech translation, which suffers from a scarcity of natural end-to-end training data. All the mined multimodal corpora will be made freely available.
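As a rough illustration of the two ideas in the abstract, the sketch below shows a cosine loss between a speech embedding and a text embedding, and a toy mining step over a shared embedding space. The abstract does not say which similarity metric or search procedure is used, so plain cosine similarity, the mutual-nearest-neighbour check, and the `threshold` value are assumptions made for this example. The vectors are hand-written stand-ins, not outputs of LASER or of any speech encoder, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_loss(speech_vec, text_vec):
    """Loss that is 0 when the two embeddings point in the same direction."""
    return 1.0 - cosine(speech_vec, text_vec)


def mine_pairs(speech_embs, text_embs, threshold=0.8):
    """Return (speech_idx, text_idx, score) for mutual nearest neighbours.

    Assumption: a pair is kept only if each side is the other's best match
    and the similarity reaches `threshold`. This is a brute-force search;
    mining against billions of sentences would need an approximate index.
    """
    sims = [[cosine(s, t) for t in text_embs] for s in speech_embs]
    pairs = []
    for i, row in enumerate(sims):
        # Best text candidate for speech segment i.
        j = max(range(len(row)), key=row.__getitem__)
        # Best speech candidate for that text sentence.
        i_back = max(range(len(sims)), key=lambda k: sims[k][j])
        if i_back == i and row[j] >= threshold:
            pairs.append((i, j, row[j]))
    return pairs


# Toy embeddings (hypothetical values, 3 dimensions for readability).
speech_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0]]
text_embs = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.5, 0.5, 0.5]]
mined = mine_pairs(speech_embs, text_embs)
```

On this toy data, speech segment 0 pairs with text sentence 1 and speech segment 1 pairs with text sentence 0, while the third text sentence is left unmatched. Speech-to-speech mining, as described in the abstract, would use the same procedure with speech embeddings on both sides.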
Paul-Ambroise Duquenne (Facebook)
Hongyu Gong (Facebook AI Research)
Hongyu is a research scientist at Facebook AI Research with a focus on speech and text translation. Her research interests span the areas of language representation learning and language generation. She obtained her PhD from the University of Illinois at Urbana-Champaign in 2020.
Holger Schwenk (University of Le Mans)
Related Events (a corresponding poster, oral, or spotlight)
2021 Spotlight: Multimodal and Multilingual Embeddings for Large-Scale Speech Mining »
More from the Same Authors
2021 Poster: Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling »
Hongyu Gong · Yun Tang · Juan Pino · Xian Li
2021 Poster: Robust Optimization for Multilingual Translation with Imbalanced Data »
Xian Li · Hongyu Gong
2019 : Poster lighting round »
Yinhe Zheng · Anders Søgaard · Abdelrhman Saleh · Youngsoo Jang · Hongyu Gong · Omar U. Florez · Margaret Li · Andrea Madotto · The Tung Nguyen · Ilia Kulikov · Arash Einolghozati · Yiru Wang · Mihail Eric · Victor Petrén Bach Hansen · Nurul Lubis · Yen-Chen Wu