Interpretability and Robustness in Audio, Speech, and Language
Mirco Ravanelli · Dmitriy Serdyuk · Ehsan Variani · Bhuvana Ramabhadran

Sat Dec 08 05:00 AM -- 03:30 PM (PST) @ Room 513DEF
Event URL: https://irasl.gitlab.io »

Domains of natural and spoken language processing have a rich history deeply rooted in information theory, statistics, digital signal processing and machine learning. With the rapid rise of deep learning (the “deep learning revolution”), many of these systematic approaches have been replaced by variants of deep neural methods, which often achieve unprecedented performance levels in many fields. With more and more of the spoken language processing pipeline being replaced by sophisticated neural layers, feature extraction, adaptation, and noise robustness are learnt inherently within the network. More recently, end-to-end frameworks that learn a mapping from speech (audio) to target labels (words, phones, graphemes, sub-word units, etc.) have become increasingly popular across speech processing, in tasks including speech recognition, speaker identification, language/dialect identification, multilingual speech processing, code switching, natural language processing, speech synthesis and many more.

A key aspect behind the success of deep learning lies in the discovered low- and high-level representations, which can potentially capture relevant underlying structure in the training data. In the NLP domain, for instance, researchers have mapped word and sentence embeddings to semantic and syntactic similarity and argued that the models capture latent representations of meaning. Nevertheless, some recent works on adversarial examples have shown that it is possible to easily fool a neural network (such as a speech recognizer or a speaker verification system) by just adding a small amount of specially constructed noise. Such remarkable sensitivity to adversarial attacks highlights how superficial the discovered representations could be, raising crucial concerns about the actual robustness, security, and interpretability of modern deep neural networks. This weakness naturally leads researchers to ask crucial questions about what these models are really learning, how we can interpret what they have learned, and how the representations provided by current neural networks can be revealed or explained in a way that allows modeling power to be enhanced further. These open questions have recently raised interest in the interpretability of deep models, as witnessed by the numerous works recently published on this topic in all the major machine learning conferences. Moreover, some workshops at NIPS 2016, NIPS 2017 and Interspeech 2017 have promoted research and discussion around this important issue.
With our initiative, we wish to further foster progress on the interpretability and robustness of modern deep learning techniques, with a particular focus on audio, speech and NLP technologies. The workshop will also analyze the connection between deep learning and models developed earlier for machine learning, linguistic analysis, signal processing, and speech recognition. In this way, we hope to encourage a discussion amongst experts and practitioners in these areas, with the expectation of understanding these models better and building upon the existing collective expertise.

The workshop will feature invited talks, panel discussions, as well as oral and poster contributed presentations. We welcome papers that specifically address one or more of the leading questions listed below:
1. Is there a theoretical/linguistic motivation/analysis that can explain how nets encapsulate the structure of the training data they learn from?
2. Does the visualization of this information (MDS, t-SNE) offer any insights to creating a better model?
3. How can we design more powerful networks with simpler architectures?
4. How can we exploit adversarial examples to improve system robustness?
5. Do alternative methods offer any complementary modeling power to what the networks can memorize?
6. Can we explain the path of inference?
7. How do we analyze data requirements for a given model? How does multilingual data improve learning power?

Author Information

Mirco Ravanelli (Montreal Institute for Learning Algorithms)

I received my master's degree in Telecommunications Engineering (full marks and honours) from the University of Trento, Italy in 2011. I then joined the SHINE research group (led by Prof. Maurizio Omologo) of the Bruno Kessler Foundation (FBK), contributing to some projects on distant-talking speech recognition in noisy and reverberant environments, such as DIRHA and DOMHOS. In 2013 I was a visiting researcher at the International Computer Science Institute (University of California, Berkeley), working on deep neural networks for large-vocabulary speech recognition in the context of the IARPA BABEL project (led by Prof. Nelson Morgan). I received my PhD (with cum laude distinction) in Information and Communication Technology from the University of Trento in December 2017. During my PhD I worked on “deep learning for distant speech recognition”, with a particular focus on recurrent and cooperative neural networks (see my PhD thesis here). In the context of my PhD I recently spent 6 months in the MILA lab led by Prof. Yoshua Bengio. I'm currently a post-doc researcher at the University of Montreal, working on deep learning for speech recognition in the MILA Lab.

Dmitriy Serdyuk (MILA)

Ehsan Variani (Google)

I am a Staff Research Scientist at Google. My research centers on statistical and machine learning and information theory, with a focus on speech and language recognition.

Bhuvana Ramabhadran (Google)

Bhuvana Ramabhadran (IEEE Fellow 2017, ISCA Fellow 2017) currently leads a team of researchers at Google, focussing on multilingual speech recognition and synthesis. Previously, she was a Distinguished Research Staff Member and Manager in IBM Research AI, at the IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, where she led a team of researchers in the Speech Technologies Group and coordinated activities across IBM's worldwide laboratories in the areas of speech recognition, synthesis, and spoken term detection. She was the elected Chair of the IEEE SLTC (2014–2016), Area Chair for ICASSP (2011–2018) and Interspeech (2012–2016), was on the editorial board of the IEEE Transactions on Audio, Speech, and Language Processing (2011–2015), and is currently an ISCA board member. She has published over 150 papers and been granted over 40 U.S. patents. Her research interests include speech recognition and synthesis algorithms, statistical modeling, signal processing, and machine learning.

More from the Same Authors

  • 2021 : Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers »
    Loren Lugosch · Piyush Papreja · Mirco Ravanelli · Abdelwahab HEBA · Titouan Parcollet
  • 2020 : Invited talk - Towards robust self-supervised learning of speech representations »
    Mirco Ravanelli
  • 2020 : Invited talk - A Broad Perspective into Self Supervised Learning for Speech Recognition »
    Bhuvana Ramabhadran
  • 2019 : Poster Session »
    Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Samuel Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie
  • 2018 : Coffee break + posters 2 »
    Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautam Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Benjamin Baer · JAHANGIR ALAM · Jamin Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · Joao Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp
  • 2018 : Mirco Ravanelli, "Interpretable convolutional filters with SincNet" »
    Mirco Ravanelli
  • 2018 : Coffee break + posters 1 »
    Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jamin Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Benjamin Baer · Abelino Jimenez · Joao Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautam Bhattacharya · JAHANGIR ALAM · Mirco Ravanelli
  • 2018 : Workshop Opening »
    Mirco Ravanelli · Dmitriy Serdyuk · Ehsan Variani · Bhuvana Ramabhadran