Timezone: »
The recently proposed Conformer model has become the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture that captures both local and global features. However, through a series of systematic studies, we find that the Conformer architecture’s design choices are not optimal. After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes. In particular, for the macro-architecture, Squeezeformer incorporates (i) the Temporal U-Net structure which reduces the cost of the multi-head attention modules on long sequences, and (ii) a simpler block structure of multi-head attention or convolution modules followed up by feed-forward module instead of the Macaron structure proposed in Conformer. Furthermore, for the micro-architecture, Squeezeformer (i) simplifies the activations in the convolutional block, (ii) removes redundant Layer Normalization operations, and (iii) incorporates an efficient depthwise down-sampling layer to efficiently sub-sample the input signal. Squeezeformer achieves state-of-the-art results of 7.5%, 6.5%, and 6.0% word-error-rate (WER) on LibriSpeech test-other without external language models, which are 3.1%, 1.4%, and 0.6% better than Conformer-CTC with the same number of FLOPs. Our code is open-sourced and available online.
Author Information
Sehoon Kim (University of California Berkeley)
Amir Gholami (University of California, Berkeley)
Albert Shaw (Google)
Nicholas Lee (University of California, Berkeley)
Karttikeya Mangalam (UC Berkeley (BAIR))
I’m a first year PhD student in Computer Science at the Department of Electrical Engineering & Computer Sciences (EECS) at University of California, Berkeley where I’m jointly advised by Prof. Jitendra Malik and Prof. Yi Ma.
Jitendra Malik (University of California at Berkley)
Michael Mahoney (UC Berkeley)
Kurt Keutzer (EECS, UC Berkeley)
More from the Same Authors
-
2021 Spotlight: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update »
Michal Derezinski · Jonathan Lacotte · Mert Pilanci · Michael Mahoney -
2022 : Multi-skill Mobile Manipulation for Object Rearrangement »
Jiayuan Gu · Devendra Singh Chaplot · Hao Su · Jitendra Malik -
2023 Poster: Characterizing Scaling and Transfer Learning of Neural Networks for Scientific Machine Learning »
Shashank Subramanian · Peter Harrington · Kurt Keutzer · Wahid Bhimji · Dmitriy Morozov · Michael Mahoney · Amir Gholami -
2023 Poster: Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training »
Yefan Zhou · TIANYU PANG · Keqin Liu · charles martin · Michael Mahoney · Yaoqing Yang -
2023 Poster: When are ensembles really effective? »
Ryan Theisen · Hyunsuk Kim · Yaoqing Yang · Liam Hodgkinson · Michael Mahoney -
2023 Poster: A Heavy-Tailed Algebra for Probabilistic Programming »
Feynman Liang · Liam Hodgkinson · Michael Mahoney -
2023 Poster: MAViL: Masked Audio-Video Learners »
Po-Yao Huang · Vasu Sharma · Hu Xu · Chaitanya Ryali · haoqi fan · Yanghao Li · Shang-Wen Li · Gargi Ghosh · Jitendra Malik · Christoph Feichtenhofer -
2023 Poster: Big Little Transformer Decoder »
Sehoon Kim · Karttikeya Mangalam · Suhong Moon · John Canny · Jitendra Malik · Michael Mahoney · Amir Gholami · Kurt Keutzer -
2023 Poster: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? »
Arjun Majumdar · Karmesh Yadav · Sergio Arnaud · Jason Yecheng Ma · Claire Chen · Sneha Silwal · Aryan Jain · Vincent-Pierre Berges · Tingfan Wu · Jay Vakil · Pieter Abbeel · Jitendra Malik · Dhruv Batra · Yixin Lin · Oleksandr Maksymets · Aravind Rajeswaran · Franziska Meier -
2023 Poster: Language Models are Visual Reasoning Coordinators »
Liangyu Chen · Bo Li · Sheng Shen · Jingkang Yang · Chunyuan Li · Kurt Keutzer · Trevor Darrell · Ziwei Liu -
2023 Poster: EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding »
Karttikeya Mangalam · Raiymbek Akshulakov · Jitendra Malik -
2023 Workshop: Touch Processing: a new Sensing Modality for AI »
Roberto Calandra · Haozhi Qi · Mike Lambeta · Perla Maiolino · Yasemin Bekiroglu · Jitendra Malik -
2023 Workshop: Heavy Tails in ML: Structure, Stability, Dynamics »
Mert Gurbuzbalaban · Stefanie Jegelka · Michael Mahoney · Umut Simsekli -
2022 : A Fast, Fisher Based Pruning of Transformers without Retraining »
Amir Gholami -
2022 Poster: K-LITE: Learning Transferable Visual Models with External Knowledge »
Sheng Shen · Chunyuan Li · Xiaowei Hu · Yujia Xie · Jianwei Yang · Pengchuan Zhang · Zhe Gan · Lijuan Wang · Lu Yuan · Ce Liu · Kurt Keutzer · Trevor Darrell · Anna Rohrbach · Jianfeng Gao -
2022 Poster: A Fast Post-Training Pruning Framework for Transformers »
Woosuk Kwon · Sehoon Kim · Michael Mahoney · Joseph Hassoun · Kurt Keutzer · Amir Gholami -
2022 Poster: Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens »
Elad Ben Avraham · Roei Herzig · Karttikeya Mangalam · Amir Bar · Anna Rohrbach · Leonid Karlinsky · Trevor Darrell · Amir Globerson -
2022 Poster: LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data »
Ali Eshragh · Fred Roosta · Asef Nazari · Michael Mahoney -
2021 : Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · Noah Maestre · Mustafa Mukadam · Oleksandr Maksymets · Aaron Gokaslan · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : Habitat 2.0: Training Home Assistants to Rearrange their Habitat »
Andrew Szot · Alexander Clegg · Eric Undersander · Erik Wijmans · Yili Zhao · Noah Maestre · Mustafa Mukadam · Oleksandr Maksymets · Aaron Gokaslan · Sameer Dharur · Franziska Meier · Wojciech Galuba · Angel Chang · Zsolt Kira · Vladlen Koltun · Jitendra Malik · Manolis Savva · Dhruv Batra -
2021 : Q&A with Michael Mahoney »
Michael Mahoney -
2021 : Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization, Michael Mahoney »
Michael Mahoney -
2021 Poster: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update »
Michal Derezinski · Jonathan Lacotte · Mert Pilanci · Michael Mahoney -
2021 Poster: Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs »
Taebum Kim · Eunji Jeong · Geon-Woo Kim · Yunmo Koo · Sehoon Kim · Gyeongin Yu · Byung-Gon Chun -
2021 Poster: Noisy Recurrent Neural Networks »
Soon Hoe Lim · N. Benjamin Erichson · Liam Hodgkinson · Michael Mahoney -
2021 Poster: Hessian Eigenspectra of More Realistic Nonlinear Models »
Zhenyu Liao · Michael Mahoney -
2021 Poster: Characterizing possible failure modes in physics-informed neural networks »
Aditi Krishnapriyan · Amir Gholami · Shandian Zhe · Robert Kirby · Michael Mahoney -
2021 Poster: Taxonomizing local versus global structure in neural network loss landscapes »
Yaoqing Yang · Liam Hodgkinson · Ryan Theisen · Joe Zou · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney -
2021 Poster: Stateful ODE-Nets using Basis Function Expansions »
Alejandro Queiruga · N. Benjamin Erichson · Liam Hodgkinson · Michael Mahoney -
2021 Oral: Hessian Eigenspectra of More Realistic Nonlinear Models »
Zhenyu Liao · Michael Mahoney -
2020 : QA: Jitendra Malik »
Jitendra Malik -
2020 : Invited Talk: Jitendra Malik »
Jitendra Malik -
2020 Poster: Boundary thickness and robustness in learning models »
Yaoqing Yang · Rajiv Khanna · Yaodong Yu · Amir Gholami · Kurt Keutzer · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney -
2020 Poster: Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization »
Michal Derezinski · Burak Bartan · Mert Pilanci · Michael Mahoney -
2020 Poster: HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks »
Zhen Dong · Zhewei Yao · Daiyaan Arfeen · Amir Gholami · Michael Mahoney · Kurt Keutzer -
2020 Poster: Exact expressions for double descent and implicit regularization via surrogate random design »
Michal Derezinski · Feynman Liang · Michael Mahoney -
2020 Poster: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method »
Michal Derezinski · Rajiv Khanna · Michael Mahoney -
2020 Poster: Precise expressions for random projections: Low-rank approximation and randomized Newton »
Michal Derezinski · Feynman Liang · Zhenyu Liao · Michael Mahoney -
2020 Oral: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method »
Michal Derezinski · Rajiv Khanna · Michael Mahoney -
2020 Poster: A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent »
Zhenyu Liao · Romain Couillet · Michael Mahoney -
2020 Poster: A Statistical Framework for Low-bitwidth Training of Deep Neural Networks »
Jianfei Chen · Yu Gai · Zhewei Yao · Michael Mahoney · Joseph Gonzalez -
2020 Poster: 3D Shape Reconstruction from Vision and Touch »
Edward Smith · Roberto Calandra · Adriana Romero · Georgia Gkioxari · David Meger · Jitendra Malik · Michal Drozdzal -
2019 : Final remarks »
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney -
2019 Workshop: Beyond first order methods in machine learning systems »
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney -
2019 : Opening Remarks »
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney -
2019 Poster: ANODEV2: A Coupled Neural ODE Framework »
Tianjun Zhang · Zhewei Yao · Amir Gholami · Joseph Gonzalez · Kurt Keutzer · Michael Mahoney · George Biros -
2019 Poster: Distributed estimation of the inverse Hessian by determinantal averaging »
Michal Derezinski · Michael Mahoney -
2019 Poster: Multi-source Domain Adaptation for Semantic Segmentation »
Sicheng Zhao · Bo Li · Xiangyu Yue · Yang Gu · Pengfei Xu · Runbo Hu · Hua Chai · Kurt Keutzer -
2019 Poster: Approximate Feature Collisions in Neural Nets »
Ke Li · Tianhao Zhang · Jitendra Malik -
2018 : Talk 3: Jitendra Malik - Linking Perception and Action »
Jitendra Malik -
2018 : Prof. Kurt Keutzer »
Kurt Keutzer -
2018 Poster: GIANT: Globally Improved Approximate Newton Method for Distributed Optimization »
Shusen Wang · Fred Roosta · Peng Xu · Michael Mahoney -
2018 Poster: Visual Memory for Robust Path Following »
Ashish Kumar · Saurabh Gupta · David Fouhey · Sergey Levine · Jitendra Malik -
2018 Oral: Visual Memory for Robust Path Following »
Ashish Kumar · Saurabh Gupta · David Fouhey · Sergey Levine · Jitendra Malik -
2018 Poster: Hessian-based Analysis of Large Batch Training and Robustness to Adversaries »
Zhewei Yao · Amir Gholami · Qi Lei · Kurt Keutzer · Michael Mahoney -
2017 : Poster Session (encompasses coffee break) »
Beidi Chen · Borja Balle · Daniel Lee · iuri frosio · Jitendra Malik · Jan Kautz · Ke Li · Masashi Sugiyama · Miguel A. Carreira-Perpinan · Ramin Raziperchikolaei · Theja Tulabandhula · Yung-Kyun Noh · Adams Wei Yu -
2017 Poster: Learning a Multi-View Stereo Machine »
Abhishek Kar · Christian Häne · Jitendra Malik -
2017 Poster: Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction »
Kristofer Bouchard · Alejandro Bujan · Farbod Roosta-Khorasani · Shashanka Ubaru · Mr. Prabhat · Antoine Snijders · Jian-Hua Mao · Edward Chang · Michael W Mahoney · Sharmodeep Bhattacharya -
2016 : Kurt Keutzer: High-Performance Deep Learning »
Kurt Keutzer -
2016 Poster: Feature-distributed sparse regression: a screen-and-clean approach »
Jiyan Yang · Michael Mahoney · Michael Saunders · Yuekai Sun -
2016 Poster: Sub-sampled Newton Methods with Non-uniform Sampling »
Peng Xu · Jiyan Yang · Farbod Roosta-Khorasani · Christopher Ré · Michael Mahoney -
2015 : Challenges in Multiresolution Methods for Graph-based Learning »
Michael Mahoney -
2015 : Using Local Spectral Methods in Theory and in Practice »
Michael Mahoney -
2015 Poster: Fast Randomized Kernel Ridge Regression with Statistical Guarantees »
Ahmed Alaoui · Michael Mahoney -
2013 Workshop: Large Scale Matrix Analysis and Inference »
Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht -
2012 Poster: Semi-supervised Eigenvectors for Locally-biased Learning »
Toke Jansen Hansen · Michael W Mahoney -
2011 Workshop: Sparse Representation and Low-rank Approximation »
Ameet S Talwalkar · Lester W Mackey · Mehryar Mohri · Michael W Mahoney · Francis Bach · Mike Davies · Remi Gribonval · Guillaume R Obozinski -
2011 Poster: Regularized Laplacian Estimation and Fast Eigenvector Approximation »
Patrick O Perry · Michael W Mahoney -
2010 Workshop: Low-rank Methods for Large-scale Machine Learning »
Arthur Gretton · Michael W Mahoney · Mehryar Mohri · Ameet S Talwalkar -
2010 Poster: CUR from a Sparse Optimization Viewpoint »
Jacob Bien · Ya Xu · Michael W Mahoney -
2009 Poster: Unsupervised Feature Selection for the $k$-means Clustering Problem »
Christos Boutsidis · Michael W Mahoney · Petros Drineas