Neural networks have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices.
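The probing setup described in the abstract (frozen frame-level features from a given layer, fed to a separate classifier that predicts phone labels) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the features are synthetic stand-ins rather than activations from a real pre-trained speech model, the classifier is a plain linear softmax probe that may differ from the one used in the paper, and the names `layer_a`, `layer_b`, `fake_layer_features`, `train_softmax_probe`, and `probe_accuracy` are hypothetical.

```python
import numpy as np


def train_softmax_probe(X, y, num_classes, epochs=300, lr=0.1):
    """Fit a linear softmax classifier on frame features X (frames x dims)."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot phone labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, X, y):
    """Frame-level classification accuracy of the trained probe."""
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))


def fake_layer_features(centers, y, noise, rng):
    """Hypothetical stand-in for frozen layer activations: one cluster per phone."""
    return centers[y] + noise * rng.normal(size=(len(y), centers.shape[1]))


rng = np.random.default_rng(0)
num_phones = 5  # assumed toy label set, not a real phone inventory
y_train = rng.integers(0, num_phones, size=500)
y_test = rng.integers(0, num_phones, size=200)

# Two made-up "layers": one whose features separate phones cleanly, one noisier.
results = {}
for name, noise in [("layer_a", 0.3), ("layer_b", 3.0)]:
    centers = rng.normal(size=(num_phones, 16))
    X_train = fake_layer_features(centers, y_train, noise, rng)
    X_test = fake_layer_features(centers, y_test, noise, rng)
    W, b = train_softmax_probe(X_train, y_train, num_phones)
    results[name] = probe_accuracy(W, b, X_test, y_test)

for name, acc in results.items():
    print(f"{name}: frame accuracy = {acc:.3f}")
```

In this sketch, a higher probe accuracy for one layer is read as that layer's features carrying more linearly accessible phone information. In a real analysis, the synthetic features would be replaced by activations extracted from each layer of the pre-trained model, with the model's weights kept fixed while only the probe is trained.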
Author Information
Yonatan Belinkov (MIT)
Jim Glass (Massachusetts Institute of Technology)
More from the Same Authors
-
2021 Spotlight: PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition »
Cheng-I Jeff Lai · Yang Zhang · Alexander Liu · Shiyu Chang · Yi-Lun Liao · Yung-Sung Chuang · Kaizhi Qian · Sameer Khurana · David Cox · Jim Glass -
2022 : PCFG-based Natural Language Interface Improves Generalization for Controlled Text Generation »
Jingyu Zhang · Jim Glass · Tianxing He -
2022 Poster: Measures of Information Reflect Memorization Patterns »
Rachit Bansal · Danish Pruthi · Yonatan Belinkov -
2022 Poster: Locating and Editing Factual Associations in GPT »
Kevin Meng · David Bau · Alex Andonian · Yonatan Belinkov -
2021 Poster: PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition »
Cheng-I Jeff Lai · Yang Zhang · Alexander Liu · Shiyu Chang · Yi-Lun Liao · Yung-Sung Chuang · Kaizhi Qian · Sameer Khurana · David Cox · Jim Glass -
2018 : Coffee break + posters 2 »
Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautam Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Benjamin Baer · JAHANGIR ALAM · Jamin Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · Joao Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp -
2018 : Coffee break + posters 1 »
Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jamin Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Benjamin Baer · Abelino Jimenez · Joao Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautam Bhattacharya · JAHANGIR ALAM · Mirco Ravanelli -
2018 Workshop: The second Conversational AI workshop – today's practice and tomorrow's potential »
Alborz Geramifard · Jason Williams · Larry Heck · Jim Glass · Milica Gasic · Dilek Hakkani-Tur · Steve Young · Lazaros Polymenakos · Y-Lan Boureau · Maxine Eskenazi -
2018 Poster: Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces »
Yu-An Chung · Wei-Hung Weng · Schrasing Tong · Jim Glass -
2018 Spotlight: Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces »
Yu-An Chung · Wei-Hung Weng · Schrasing Tong · Jim Glass -
2017 : Learning Word Embeddings from Speech »
Jim Glass · Yu-An Chung -
2017 Workshop: Conversational AI - today's practice and tomorrow's potential »
Alborz Geramifard · Jason Williams · Larry Heck · Jim Glass · Antoine Bordes · Steve Young · Gerald Tesauro