Timezone: »
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data.
Author Information
Alexei Baevski (Facebook AI Research)
Yuhao Zhou (University of Toronto)
Abdelrahman Mohamed (Facebook AI Research (FAIR))
Michael Auli (Facebook AI Research)
More from the Same Authors
-
2022 Poster: Masked Autoencoders that Listen »
Po-Yao Huang · Hu Xu · Juncheng Li · Alexei Baevski · Michael Auli · Wojciech Galuba · Florian Metze · Christoph Feichtenhofer -
2021 Workshop: 2nd Workshop on Self-Supervised Learning: Theory and Practice »
Pengtao Xie · Ishan Misra · Pulkit Agrawal · Abdelrahman Mohamed · Shentong Mo · Youwei Liang · Jeannette Bohg · Kristina N Toutanova -
2021 Poster: Unsupervised Speech Recognition »
Alexei Baevski · Wei-Ning Hsu · Alexis CONNEAU · Michael Auli -
2021 Oral: Unsupervised Speech Recognition »
Alexei Baevski · Wei-Ning Hsu · Alexis CONNEAU · Michael Auli -
2020 Workshop: Self-Supervised Learning -- Theory and Practice »
Pengtao Xie · Shanghang Zhang · Pulkit Agrawal · Ishan Misra · Cynthia Rudin · Abdelrahman Mohamed · Wenzhen Yuan · Barret Zoph · Laurens van der Maaten · Xingyi Yang · Eric Xing -
2020 : Closing remark »
Abdelrahman Mohamed -
2020 Workshop: Self-Supervised Learning for Speech and Audio Processing »
Abdelrahman Mohamed · Hung-yi Lee · Shinji Watanabe · Shang-Wen Li · Tara Sainath · Karen Livescu