Adversarial robustness is a key desirable property of neural networks. It has been empirically shown to depend on model size, with larger networks typically being more robust. Recently, \citet{bubeck2021universal} proved a lower bound on the Lipschitz constant of functions that fit the training data in terms of their number of parameters. This raises an interesting open question: do, and can, functions with more parameters, but not necessarily more computational cost, have better robustness? We study this question for sparse Mixture of Experts models (MoEs), which make it possible to scale up the model size at a roughly constant computational cost. We theoretically show that, under certain conditions on the routing and the structure of the data, MoEs can have significantly smaller Lipschitz constants than their dense counterparts. Conversely, the robustness of MoEs can suffer when the highest-weighted experts for an input implement sufficiently different functions. We next empirically evaluate the robustness of MoEs on ImageNet using adversarial attacks and show that they are indeed more robust than dense models with the same computational cost. Finally, we make key observations showing that MoEs are robust to the choice of experts, highlighting the redundancy of experts in models trained in practice.
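The abstract relies on the sparse MoE mechanism: a learned router selects only the k highest-weighted experts per input, so the parameter count grows with the number of experts while the per-example computational cost stays roughly constant. The NumPy sketch below is a minimal illustration of this top-k routing under simplifying assumptions (single ReLU-layer experts, one input at a time); the names (`top_k_moe`, `gate_w`, `expert_ws`) are hypothetical and this is not the authors' implementation.

```python
# Minimal sketch of a sparse top-k MoE layer (illustrative only, not the paper's code).
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def top_k_moe(x, gate_w, expert_ws, k=2):
    """x: (d,) input; gate_w: (d, E) router; expert_ws: list of E (d, d) expert weights."""
    scores = softmax(x @ gate_w)                   # routing probabilities over E experts
    top = np.argsort(scores)[-k:]                  # indices of the k highest-weighted experts
    top_scores = scores[top] / scores[top].sum()   # renormalize over the selected experts
    # Only the k selected experts are evaluated, so compute stays roughly constant
    # even as the total number of experts (and hence parameters) grows.
    return sum(w * np.maximum(x @ expert_ws[i], 0.0)   # ReLU expert
               for w, i in zip(top_scores, top))

# Usage: 8 experts, 16-dimensional input, only 2 experts run per example.
rng = np.random.default_rng(0)
d, E = 16, 8
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, E))
expert_ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(E)]
print(top_k_moe(x, gate_w, expert_ws, k=2).shape)  # (16,)
```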
Author Information
Joan Puigcerver (Google Research)
Rodolphe Jenatton (Amazon Research)
Carlos Riquelme (Google Brain)
Pranjal Awasthi (Google)
Srinadh Bhojanapalli (Google Research)
More from the Same Authors
- 2021 Spotlight: On the Existence of The Adversarial Bayes Classifier
  Pranjal Awasthi · Natalie Frank · Mehryar Mohri
- 2021 Spotlight: Calibration and Consistency of Adversarial Surrogate Losses
  Pranjal Awasthi · Natalie Frank · Anqi Mao · Mehryar Mohri · Yutao Zhong
- 2022: A Theory of Learning with Competing Objectives and User Feedback
  Pranjal Awasthi · Corinna Cortes · Yishay Mansour · Mehryar Mohri
- 2022: Theory and Algorithm for Batch Distribution Drift Problems
  Pranjal Awasthi · Corinna Cortes · Christopher Mohri
- 2022 Poster: Trimmed Maximum Likelihood Estimation for Robust Generalized Linear Model
  Pranjal Awasthi · Abhimanyu Das · Weihao Kong · Rajat Sen
- 2022 Poster: Multi-Class $H$-Consistency Bounds
  Pranjal Awasthi · Anqi Mao · Mehryar Mohri · Yutao Zhong
- 2022 Poster: Semi-supervised Active Linear Regression
  Nived Rajaraman · Fnu Devvrit · Pranjal Awasthi
- 2022 Poster: Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
  Basil Mustafa · Carlos Riquelme · Joan Puigcerver · Rodolphe Jenatton · Neil Houlsby
- 2021 Poster: On the Existence of The Adversarial Bayes Classifier
  Pranjal Awasthi · Natalie Frank · Mehryar Mohri
- 2021 Poster: Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations
  Pranjal Awasthi · Alex Tang · Aravindan Vijayaraghavan
- 2021 Poster: Neural Active Learning with Performance Guarantees
  Zhilei Wang · Pranjal Awasthi · Christoph Dann · Ayush Sekhari · Claudio Gentile
- 2021 Poster: A Convergence Analysis of Gradient Descent on Graph Neural Networks
  Pranjal Awasthi · Abhimanyu Das · Sreenivas Gollapudi
- 2021 Poster: Scaling Vision with Sparse Mixture of Experts
  Carlos Riquelme · Joan Puigcerver · Basil Mustafa · Maxim Neumann · Rodolphe Jenatton · André Susano Pinto · Daniel Keysers · Neil Houlsby
- 2021 Poster: Calibration and Consistency of Adversarial Surrogate Losses
  Pranjal Awasthi · Natalie Frank · Anqi Mao · Mehryar Mohri · Yutao Zhong
- 2020 Poster: An efficient nonconvex reformulation of stagewise convex optimization problems
  Rudy Bunel · Oliver Hinder · Srinadh Bhojanapalli · Krishnamurthy Dvijotham
- 2020 Poster: O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
  Chulhee Yun · Yin-Wen Chang · Srinadh Bhojanapalli · Ankit Singh Rawat · Sashank Reddi · Sanjiv Kumar
- 2020 Session: Orals & Spotlights Track 13: Deep Learning/Theory
  Stanislaw Jastrzebski · Srinadh Bhojanapalli
- 2019 Poster: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
  Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu
- 2019 Poster: Practical and Consistent Estimation of f-Divergences
  Paul Rubenstein · Olivier Bousquet · Josip Djolonga · Carlos Riquelme · Ilya Tolstikhin
- 2018 Poster: Scalable Hyperparameter Transfer Learning
  Valerio Perrone · Rodolphe Jenatton · Matthias W Seeger · Cedric Archambeau
- 2017 Poster: Exploring Generalization in Deep Learning
  Behnam Neyshabur · Srinadh Bhojanapalli · David Mcallester · Nati Srebro
- 2017 Poster: Implicit Regularization in Matrix Factorization
  Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro
- 2017 Spotlight: Implicit Regularization in Matrix Factorization
  Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro
- 2016 Poster: Single Pass PCA of Matrix Products
  Shanshan Wu · Srinadh Bhojanapalli · Sujay Sanghavi · Alex Dimakis
- 2016 Poster: Global Optimality of Local Search for Low Rank Matrix Recovery
  Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro
- 2013 Poster: Convex Relaxations for Permutation Problems
  Fajwel Fogel · Rodolphe Jenatton · Francis Bach · Alexandre d'Aspremont