The recently proposed Conformer model has become the de facto backbone for a variety of downstream speech tasks, owing to its hybrid attention-convolution architecture that captures both local and global features. However, through a series of systematic studies, we find that the design choices of the Conformer architecture are not optimal. After re-examining both the macro- and micro-architecture of Conformer, we propose Squeezeformer, which consistently outperforms state-of-the-art ASR models under the same training schemes. In particular, for the macro-architecture, Squeezeformer incorporates (i) the Temporal U-Net structure, which reduces the cost of the multi-head attention modules on long sequences, and (ii) a simpler block structure in which a multi-head attention or convolution module is followed by a feed-forward module, instead of the Macaron structure proposed in Conformer. For the micro-architecture, Squeezeformer (i) simplifies the activations in the convolutional block, (ii) removes redundant Layer Normalization operations, and (iii) incorporates an efficient depthwise down-sampling layer to sub-sample the input signal. Squeezeformer achieves state-of-the-art word-error-rates (WER) of 7.5%, 6.5%, and 6.0% on LibriSpeech test-other without external language models, which are 3.1%, 1.4%, and 0.6% better than Conformer-CTC with the same number of FLOPs. Our code is open-sourced and available online.
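To make the simplified block structure in the abstract concrete, below is a minimal PyTorch sketch written for illustration only; it is not the authors' open-sourced implementation. It shows an attention sub-block and a convolution sub-block each followed by a feed-forward module (rather than Conformer's Macaron ordering). The module internals here, such as the kernel size, expansion factor, activation choice, and post-LayerNorm placement, are illustrative assumptions, and the Temporal U-Net down-sampling is omitted.

```python
# Illustrative sketch of a Squeezeformer-style block ordering (MHA -> FFN -> Conv -> FFN),
# not the authors' released code. Hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.SiLU(),                       # Swish activation, as in Conformer
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x):
        return self.net(x)


class ConvModule(nn.Module):
    """Depthwise-separable convolution over the time axis (simplified)."""
    def __init__(self, dim, kernel_size=31):
        super().__init__()
        self.pointwise_in = nn.Conv1d(dim, dim, 1)
        self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.act = nn.SiLU()                 # single activation, in the spirit of the simplification
        self.pointwise_out = nn.Conv1d(dim, dim, 1)

    def forward(self, x):                    # x: (batch, time, dim)
        x = x.transpose(1, 2)                # -> (batch, dim, time) for Conv1d
        x = self.pointwise_out(self.act(self.depthwise(self.pointwise_in(x))))
        return x.transpose(1, 2)             # back to (batch, time, dim)


class SqueezeformerishBlock(nn.Module):
    """Attention and convolution sub-blocks, each followed by a feed-forward module."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn1 = FeedForward(dim)
        self.conv = ConvModule(dim)
        self.ffn2 = FeedForward(dim)
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])

    def forward(self, x):
        # Residual sub-blocks in the order MHA -> FFN -> Conv -> FFN.
        x = self.norms[0](x + self.attn(x, x, x, need_weights=False)[0])
        x = self.norms[1](x + self.ffn1(x))
        x = self.norms[2](x + self.conv(x))
        x = self.norms[3](x + self.ffn2(x))
        return x


if __name__ == "__main__":
    block = SqueezeformerishBlock(dim=144)
    feats = torch.randn(2, 100, 144)         # (batch, time, dim) acoustic features
    print(block(feats).shape)                # torch.Size([2, 100, 144])
```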
Author Information
Sehoon Kim (University of California Berkeley)
Amir Gholami (University of California, Berkeley)
Albert Shaw (Google)
Nicholas Lee (University of California, Berkeley)
Karttikeya Mangalam (UC Berkeley (BAIR))
Jitendra Malik (University of California at Berkeley)
Michael Mahoney (UC Berkeley)
Kurt Keutzer (EECS, UC Berkeley)
More from the Same Authors
- 2021 Spotlight: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update »
  Michal Derezinski · Jonathan Lacotte · Mert Pilanci · Michael Mahoney
- 2022 : Multi-skill Mobile Manipulation for Object Rearrangement »
  Jiayuan Gu · Devendra Singh Chaplot · Hao Su · Jitendra Malik
- 2022 : A Fast, Fisher Based Pruning of Transformers without Retraining »
  Amir Gholami
- 2022 Poster: K-LITE: Learning Transferable Visual Models with External Knowledge »
  Sheng Shen · Chunyuan Li · Xiaowei Hu · Yujia Xie · Jianwei Yang · Pengchuan Zhang · Zhe Gan · Lijuan Wang · Lu Yuan · Ce Liu · Kurt Keutzer · Trevor Darrell · Anna Rohrbach · Jianfeng Gao
- 2022 Poster: A Fast Post-Training Pruning Framework for Transformers »
  Woosuk Kwon · Sehoon Kim · Michael Mahoney · Joseph Hassoun · Kurt Keutzer · Amir Gholami
- 2022 Poster: Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens »
  Elad Ben Avraham · Roei Herzig · Karttikeya Mangalam · Amir Bar · Anna Rohrbach · Leonid Karlinsky · Trevor Darrell · Amir Globerson
- 2022 Poster: LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data »
  Ali Eshragh · Fred Roosta · Asef Nazari · Michael Mahoney
- 2021 : Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
  Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · Noah Maestre · Mustafa Mukadam · Oleksandr Maksymets · Aaron Gokaslan · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra
- 2021 : Q&A with Michael Mahoney »
  Michael Mahoney
- 2021 : Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization, Michael Mahoney »
  Michael Mahoney
- 2021 Poster: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update »
  Michal Derezinski · Jonathan Lacotte · Mert Pilanci · Michael Mahoney
- 2021 Poster: Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs »
  Taebum Kim · Eunji Jeong · Geon-Woo Kim · Yunmo Koo · Sehoon Kim · Gyeongin Yu · Byung-Gon Chun
- 2021 Poster: Noisy Recurrent Neural Networks »
  Soon Hoe Lim · N. Benjamin Erichson · Liam Hodgkinson · Michael Mahoney
- 2021 Poster: Hessian Eigenspectra of More Realistic Nonlinear Models »
  Zhenyu Liao · Michael Mahoney
- 2021 Poster: Characterizing possible failure modes in physics-informed neural networks »
  Aditi Krishnapriyan · Amir Gholami · Shandian Zhe · Robert Kirby · Michael Mahoney
- 2021 Poster: Taxonomizing local versus global structure in neural network loss landscapes »
  Yaoqing Yang · Liam Hodgkinson · Ryan Theisen · Joe Zou · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney
- 2021 Poster: Stateful ODE-Nets using Basis Function Expansions »
  Alejandro Queiruga · N. Benjamin Erichson · Liam Hodgkinson · Michael Mahoney
- 2021 Oral: Hessian Eigenspectra of More Realistic Nonlinear Models »
  Zhenyu Liao · Michael Mahoney
- 2020 : QA: Jitendra Malik »
  Jitendra Malik
- 2020 : Invited Talk: Jitendra Malik »
  Jitendra Malik
- 2020 Poster: Boundary thickness and robustness in learning models »
  Yaoqing Yang · Rajiv Khanna · Yaodong Yu · Amir Gholami · Kurt Keutzer · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney
- 2020 Poster: Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization »
  Michal Derezinski · Burak Bartan · Mert Pilanci · Michael Mahoney
- 2020 Poster: HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks »
  Zhen Dong · Zhewei Yao · Daiyaan Arfeen · Amir Gholami · Michael Mahoney · Kurt Keutzer
- 2020 Poster: Exact expressions for double descent and implicit regularization via surrogate random design »
  Michal Derezinski · Feynman Liang · Michael Mahoney
- 2020 Poster: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method »
  Michal Derezinski · Rajiv Khanna · Michael Mahoney
- 2020 Poster: Precise expressions for random projections: Low-rank approximation and randomized Newton »
  Michal Derezinski · Feynman Liang · Zhenyu Liao · Michael Mahoney
- 2020 Oral: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method »
  Michal Derezinski · Rajiv Khanna · Michael Mahoney
- 2020 Poster: A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent »
  Zhenyu Liao · Romain Couillet · Michael Mahoney
- 2020 Poster: A Statistical Framework for Low-bitwidth Training of Deep Neural Networks »
  Jianfei Chen · Yu Gai · Zhewei Yao · Michael Mahoney · Joseph Gonzalez
- 2020 Poster: 3D Shape Reconstruction from Vision and Touch »
  Edward Smith · Roberto Calandra · Adriana Romero · Georgia Gkioxari · David Meger · Jitendra Malik · Michal Drozdzal
- 2019 : Final remarks »
  Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
- 2019 Workshop: Beyond first order methods in machine learning systems »
  Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
- 2019 : Opening Remarks »
  Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
- 2019 Poster: ANODEV2: A Coupled Neural ODE Framework »
  Tianjun Zhang · Zhewei Yao · Amir Gholami · Joseph Gonzalez · Kurt Keutzer · Michael Mahoney · George Biros
- 2019 Poster: Distributed estimation of the inverse Hessian by determinantal averaging »
  Michal Derezinski · Michael Mahoney
- 2019 Poster: Multi-source Domain Adaptation for Semantic Segmentation »
  Sicheng Zhao · Bo Li · Xiangyu Yue · Yang Gu · Pengfei Xu · Runbo Hu · Hua Chai · Kurt Keutzer
- 2019 Poster: Approximate Feature Collisions in Neural Nets »
  Ke Li · Tianhao Zhang · Jitendra Malik
- 2018 : Talk 3: Jitendra Malik - Linking Perception and Action »
  Jitendra Malik
- 2018 : Prof. Kurt Keutzer »
  Kurt Keutzer
- 2018 Poster: GIANT: Globally Improved Approximate Newton Method for Distributed Optimization »
  Shusen Wang · Fred Roosta · Peng Xu · Michael Mahoney
- 2018 Poster: Visual Memory for Robust Path Following »
  Ashish Kumar · Saurabh Gupta · David Fouhey · Sergey Levine · Jitendra Malik
- 2018 Oral: Visual Memory for Robust Path Following »
  Ashish Kumar · Saurabh Gupta · David Fouhey · Sergey Levine · Jitendra Malik
- 2018 Poster: Hessian-based Analysis of Large Batch Training and Robustness to Adversaries »
  Zhewei Yao · Amir Gholami · Qi Lei · Kurt Keutzer · Michael Mahoney
- 2017 : Poster Session (encompasses coffee break) »
  Beidi Chen · Borja Balle · Daniel Lee · iuri frosio · Jitendra Malik · Jan Kautz · Ke Li · Masashi Sugiyama · Miguel A. Carreira-Perpinan · Ramin Raziperchikolaei · Theja Tulabandhula · Yung-Kyun Noh · Adams Wei Yu
- 2017 Poster: Learning a Multi-View Stereo Machine »
  Abhishek Kar · Christian Häne · Jitendra Malik
- 2016 : Kurt Keutzer: High-Performance Deep Learning »
  Kurt Keutzer
- 2016 Poster: Feature-distributed sparse regression: a screen-and-clean approach »
  Jiyan Yang · Michael Mahoney · Michael Saunders · Yuekai Sun
- 2016 Poster: Sub-sampled Newton Methods with Non-uniform Sampling »
  Peng Xu · Jiyan Yang · Farbod Roosta-Khorasani · Christopher Ré · Michael Mahoney
- 2015 : Challenges in Multiresolution Methods for Graph-based Learning »
  Michael Mahoney
- 2015 : Using Local Spectral Methods in Theory and in Practice »
  Michael Mahoney
- 2015 Poster: Fast Randomized Kernel Ridge Regression with Statistical Guarantees »
  Ahmed Alaoui · Michael Mahoney
- 2013 Workshop: Large Scale Matrix Analysis and Inference »
  Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht