Despite rapid progress in the recent past, current speech recognition systems still require labeled training data, which limits this technology to a small fraction of the languages spoken around the globe. This paper describes wav2vec-U, short for wav2vec Unsupervised, a method to train speech recognition models without any labeled data. We leverage self-supervised speech representations to segment unlabeled audio and learn a mapping from these representations to phonemes via adversarial training. The right representations are key to the success of our method. Compared to the best previous unsupervised work, wav2vec-U reduces the phone error rate on the TIMIT benchmark from 26.1 to 11.3. On the larger English Librispeech benchmark, wav2vec-U achieves a word error rate of 5.9 on test-other, rivaling some of the best published systems trained on 960 hours of labeled data from only two years ago. We also experiment on nine other languages, including low-resource languages such as Kyrgyz, Swahili and Tatar.
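Both headline numbers are edit-distance error rates. The phone error rate (PER) and the word error rate (WER) count the substitutions, deletions and insertions needed to turn the system output into the reference transcription, divided by the number of reference tokens. The Python sketch below shows that standard computation. It is an illustration under that assumption, not the scoring script used for the reported results; the function names `edit_distance` and `error_rate` and the example phone symbols are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp.

    Standard Levenshtein dynamic program, keeping only one previous row.
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp (i deletions)
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete ref token r
                curr[j - 1] + 1,         # insert hyp token h
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Error rate in percent: edit distance normalised by the reference length.

    Works for phones (PER) or words (WER), depending on the token lists passed in.
    """
    if not ref:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical phone sequences: one substitution (ae -> ah) and one insertion (s).
    print(error_rate(["k", "ae", "t"], ["k", "ah", "t", "s"]))
    # Word-level example: one deleted word out of three reference words.
    print(error_rate("the cat sat".split(), "the sat".split()))
```

In the first call, two edits against three reference phones give about 66.7. Read the same way, a PER of 11.3 corresponds to roughly 11.3 edits per 100 reference phones, and a WER of 5.9 to roughly 5.9 edits per 100 reference words.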
Author Information
Alexei Baevski (Facebook AI Research)
Wei-Ning Hsu (Facebook, Inc.)
Alexis Conneau (Facebook)
Michael Auli (Facebook AI Research)
Related Events (a corresponding poster, oral, or spotlight)
2021 Poster: Unsupervised Speech Recognition
Thu. Dec 9th 04:30 -- 06:00 PM Room
More from the Same Authors
2023 Poster: DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Alexander Liu · Heng-Jui Chang · Michael Auli · Wei-Ning Hsu · Jim Glass
2023 Poster: Textually Pretrained Speech Language Models
Michael Hassid · Tal Remez · Tu Anh Nguyen · Itai Gat · Alexis Conneau · Felix Kreuk · Jade Copet · Alexandre Defossez · Gabriel Synnaeve · Emmanuel Dupoux · Roy Schwartz · Yossi Adi
2022 Poster: u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu · Bowen Shi
2022 Poster: Masked Autoencoders that Listen
Po-Yao Huang · Hu Xu · Juncheng Li · Alexei Baevski · Michael Auli · Wojciech Galuba · Florian Metze · Christoph Feichtenhofer
2020: HUBERT: How much can a bad teacher benefit ASR pre-training?
Wei-Ning Hsu
2020: Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
2020 Poster: wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski · Yuhao Zhou · Abdelrahman Mohamed · Michael Auli
2019 Poster: Cross-lingual Language Model Pretraining
Alexis Conneau · Guillaume Lample
2019 Spotlight: Cross-lingual Language Model Pretraining
Alexis Conneau · Guillaume Lample
2018: Coffee break + posters 2
Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautam Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Benjamin Baer · Jahangir Alam · Jamin Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · Joao Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp
2018: Coffee break + posters 1
Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jamin Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Benjamin Baer · Abelino Jimenez · Joao Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautam Bhattacharya · Jahangir Alam · Mirco Ravanelli
2017 Poster: Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Wei-Ning Hsu · Yu Zhang · James Glass