Poster
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Yonatan Belinkov · Jim Glass

Mon Dec 04 06:30 PM -- 10:30 PM (PST) @ Pacific Ballroom #85

Neural networks have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices.
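The layer-wise probing setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the frame features are synthetic stand-ins for activations from a pre-trained model, the layer names and noise levels are hypothetical, and a linear softmax classifier is swapped in for simplicity because the abstract does not specify the classifier architecture.

```python
import numpy as np


def train_softmax_probe(X, y, num_classes, epochs=300, lr=0.1):
    """Fit a linear softmax classifier with full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot phone labels
    for _ in range(epochs):
        logits = X @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, X, y):
    """Fraction of frames whose predicted phone matches the label."""
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))


def synthetic_frames(centroids, y, noise, rng):
    """Hypothetical stand-in for one layer's frame-level features:
    a per-phone centroid plus Gaussian noise."""
    return centroids[y] + noise * rng.normal(size=(len(y), centroids.shape[1]))


rng = np.random.default_rng(0)
num_phones, dim = 5, 16  # assumed sizes, chosen only for the demo
y_train = rng.integers(0, num_phones, size=1000)
y_test = rng.integers(0, num_phones, size=500)

# Two hypothetical "layers": one encodes phone identity cleanly, one noisily.
results = {}
for layer_name, noise in [("layer_low_noise", 0.5), ("layer_high_noise", 3.0)]:
    centroids = rng.normal(size=(num_phones, dim))
    X_train = synthetic_frames(centroids, y_train, noise, rng)
    X_test = synthetic_frames(centroids, y_test, noise, rng)

    # Standardize with training statistics only.
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    X_train = (X_train - mean) / std
    X_test = (X_test - mean) / std

    W, b = train_softmax_probe(X_train, y_train, num_phones)
    results[layer_name] = probe_accuracy(W, b, X_test, y_test)

for layer_name, acc in results.items():
    print(f"{layer_name}: frame accuracy = {acc:.3f}")
```

In a real experiment, `synthetic_frames` would be replaced by activations extracted from each layer of the pre-trained model, with the end-to-end model's weights kept fixed, so that the probe's accuracy reflects how much phone information each layer's representation carries.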

Author Information

Yonatan Belinkov (MIT)
Jim Glass (Massachusetts Institute of Technology)

More from the Same Authors

  • 2021 Spotlight: PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition »
    Cheng-I Jeff Lai · Yang Zhang · Alexander Liu · Shiyu Chang · Yi-Lun Liao · Yung-Sung Chuang · Kaizhi Qian · Sameer Khurana · David Cox · Jim Glass
  • 2021 Poster: PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition »
    Cheng-I Jeff Lai · Yang Zhang · Alexander Liu · Shiyu Chang · Yi-Lun Liao · Yung-Sung Chuang · Kaizhi Qian · Sameer Khurana · David Cox · Jim Glass
  • 2018 : Coffee break + posters 2 »
    Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautamb85 Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Ben Baer · JAHANGIR ALAM · Jay Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · João Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp
  • 2018 : Coffee break + posters 1 »
    Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jay Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Ben Baer · Abelino Jimenez · João Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautamb85 Bhattacharya · JAHANGIR ALAM · Mirco Ravanelli
  • 2018 Workshop: The second Conversational AI workshop – today's practice and tomorrow's potential »
    Alborz Geramifard · Jason Williams · Larry Heck · Jim Glass · Milica Gasic · Dilek Hakkani-Tur · Steve Young · Lazaros Polymenakos · Y-Lan Boureau · Maxine Eskenazi
  • 2018 Poster: Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces »
    Yu-An Chung · Wei-Hung Weng · Schrasing Tong · Jim Glass
  • 2018 Spotlight: Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces »
    Yu-An Chung · Wei-Hung Weng · Schrasing Tong · Jim Glass
  • 2017 : Learning Word Embeddings from Speech »
    Jim Glass · Yu-An Chung
  • 2017 Workshop: Conversational AI - today's practice and tomorrow's potential »
    Alborz Geramifard · Jason Williams · Larry Heck · Jim Glass · Antoine Bordes · Steve Young · Gerald Tesauro