Timezone: »
Deep learning is currently playing a crucial role toward higher levels of artificial intelligence. This paradigm allows neural networks to learn complex and abstract representations, that are progressively obtained by combining simpler ones. Nevertheless, the internal "black-box" representations automatically discovered by current neural architectures often suffer from a lack of interpretability, making of primary interest the study of explainable machine learning techniques. This paper summarizes our recent efforts to develop a more interpretable neural model for directly processing speech from the raw waveform. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover more meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only low and high cutoff frequencies of band-pass filters are directly learned from data. This inductive bias offers a very compact way to derive a customized filter-bank front-end, that only depends on some parameters with a clear physical meaning. Our experiments, conducted on both speaker and speech recognition, show that the proposed architecture converges faster, performs better, and is more interpretable than standard CNNs.
Author Information
Mirco Ravanelli (Montreal Istitute for Learning Algorithms)
I received my master's degree in Telecommunications Engineering (full marks and honours) from the University of Trento, Italy in 2011. I then joined the SHINE research group (led by Prof. Maurizio Omologo) of the Bruno Kessler Foundation (FBK), contributing to some projects on distant-talking speech recognition in noisy and reverberant environments, such as DIRHA and DOMHOS. In 2013 I was visiting researcher at the International Computer Science Institute (University of California, Berkeley) working on deep neural networks for large-vocabulary speech recognition in the context of the IARPA BABEL project (led by Prof. Nelson Morgan). I received my PhD (with cum laude distinction) in Information and Communication Technology from the University of Trento in December 2017. During my PhD I worked on “deep learning for distant speech recognition”, with a particular focus on recurrent and cooperative neural networks (see my PhD thesis here). In the context of my PhD I recently spent 6 months in the MILA lab led by Prof. Yoshua Bengio. I'm currently a post-doc researcher at the University of Montreal, working on deep learning for speech recognition in the MILA Lab.
More from the Same Authors
-
2021 : Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers »
Loren Lugosch · Piyush Papreja · Mirco Ravanelli · Abdelwahab HEBA · Titouan Parcollet -
2020 : Invited talk - Towards robust self-supervised learning of speech representations »
Mirco Ravanelli -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2018 : Coffee break + posters 2 »
Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautam Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Benjamin Baer · JAHANGIR ALAM · Jamin Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · Joao Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp -
2018 : Coffee break + posters 1 »
Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jamin Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Benjamin Baer · Abelino Jimenez · Joao Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautam Bhattacharya · JAHANGIR ALAM · Mirco Ravanelli -
2018 : Workshop Opening »
Mirco Ravanelli · Dmitriy Serdyuk · Ehsan Variani · Bhuvana Ramabhadran -
2018 Workshop: Interpretability and Robustness in Audio, Speech, and Language »
Mirco Ravanelli · Dmitriy Serdyuk · Ehsan Variani · Bhuvana Ramabhadran