Poster
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Wei-Ning Hsu · Yu Zhang · James Glass

Tue Dec 05 06:30 PM -- 10:30 PM (PST) @ Pacific Ballroom #115

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors on different sets of latent variables. The model is evaluated on two speech corpora. Qualitatively, we demonstrate its ability to transform speakers or linguistic content by manipulating different sets of latent variables. Quantitatively, it outperforms an i-vector baseline for speaker verification and reduces the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
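
To make the distinction between the two kinds of prior concrete, here is a minimal sketch of how latent variables for one sequence might be sampled. The abstract does not specify the distributions, so the Gaussian forms, the names `mu2`, `z1`, and `z2`, the function `sample_latents`, and the standard deviations are all assumptions chosen for illustration, not the authors' implementation.

```python
import random


def sample_latents(num_segments, dim, seed=0,
                   sigma_mu=1.0, sigma_z2=0.5, sigma_z1=1.0):
    """Sample latents for one sequence under a two-level prior (illustrative only)."""
    rng = random.Random(seed)

    # Hypothetical sequence-level mean, drawn once per sequence.
    mu2 = [rng.gauss(0.0, sigma_mu) for _ in range(dim)]

    segments = []
    for _ in range(num_segments):
        # Sequence-dependent prior: centered on this sequence's own mean,
        # so segments from the same sequence share a common offset.
        z2 = [rng.gauss(m, sigma_z2) for m in mu2]
        # Sequence-independent prior: the same zero-mean Gaussian
        # regardless of which sequence the segment comes from.
        z1 = [rng.gauss(0.0, sigma_z1) for _ in range(dim)]
        segments.append((z1, z2))
    return mu2, segments


if __name__ == "__main__":
    mu2, segments = sample_latents(num_segments=4, dim=3)
    print("sequence-level mean:", mu2)
    for z1, z2 in segments:
        print("z1:", z1, "z2:", z2)
```

Under these assumptions, `z2` values cluster around a per-sequence mean and so could carry sequence-level attributes such as speaker identity, while `z1` values are free to vary from segment to segment, as linguistic content does.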

Author Information

Wei-Ning Hsu (Massachusetts Institute of Technology)
Yu Zhang (Google Brain)
James Glass (MIT CSAIL)

More from the Same Authors

  • 2022 Poster: u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
    Wei-Ning Hsu · Bowen Shi
  • 2021 Poster: Unsupervised Speech Recognition
    Alexei Baevski · Wei-Ning Hsu · Alexis CONNEAU · Michael Auli
  • 2021 Oral: Unsupervised Speech Recognition
    Alexei Baevski · Wei-Ning Hsu · Alexis CONNEAU · Michael Auli
  • 2020 : HUBERT: How much can a bad teacher benefit ASR pre-training?
    Wei-Ning Hsu
  • 2020 : Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
    Wei-Ning Hsu
  • 2018 : Coffee break + posters 2
    Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautam Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Benjamin Baer · JAHANGIR ALAM · Jamin Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · Joao Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp
  • 2018 : Coffee break + posters 1
    Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jamin Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Benjamin Baer · Abelino Jimenez · Joao Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautam Bhattacharya · JAHANGIR ALAM · Mirco Ravanelli
  • 2016 Poster: Unsupervised Learning of Spoken Language with Visual Context
    David Harwath · Antonio Torralba · James Glass