Adversarial training is a powerful defense against adversarial examples. Previous empirical results suggest that adversarial training requires wider networks for better performance. However, how neural network width affects model robustness remains elusive. In this paper, we carefully examine the relationship between network width and model robustness. Specifically, we show that model robustness is closely related to the tradeoff between natural accuracy and perturbation stability, which is controlled by the robust regularization parameter λ. With the same λ, wider networks can achieve better natural accuracy but worse perturbation stability, leading to potentially worse overall model robustness. To understand the origin of this phenomenon, we further relate perturbation stability to the network's local Lipschitzness. Leveraging recent results on neural tangent kernels, we theoretically show that wider networks tend to have worse perturbation stability. Our analyses suggest that: 1) the common strategy of first fine-tuning λ on a small network and then directly using it for wide model training could lead to deteriorated model robustness; 2) one needs to properly enlarge λ to fully unleash the robustness potential of wider models. Finally, we propose a new Width Adjusted Regularization (WAR) method that adaptively enlarges λ on wide models and significantly reduces tuning time.
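The tradeoff described above can be sketched as a TRADES-style robust objective: a natural-accuracy term plus a λ-weighted perturbation-stability term. The sketch below is not the paper's code; the toy linear model, the one-step perturbation, and all names (`robust_loss`, `eps`) are illustrative assumptions used only to show how λ trades off the two terms.

```python
# Minimal sketch (assumed, not the paper's implementation) of a
# TRADES-style objective: natural cross-entropy + lam * stability,
# where stability is KL(f(x) || f(x + delta)).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def robust_loss(W, x, y, lam, eps=0.1):
    """Natural loss plus lam-weighted perturbation stability."""
    p = softmax(x @ W)  # clean predictions of a toy linear classifier
    natural = -np.log(p[np.arange(len(y)), y]).mean()
    # One-step (FGSM-like) perturbation along the sign of the
    # cross-entropy gradient w.r.t. the input.
    grad_x = (p - np.eye(W.shape[1])[y]) @ W.T
    x_adv = x + eps * np.sign(grad_x)
    q = softmax(x_adv @ W)  # predictions on the perturbed input
    # Perturbation stability term: KL divergence between clean and
    # perturbed predictive distributions (always >= 0).
    stability = np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()
    return natural + lam * stability

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = rng.integers(0, 3, size=8)
W = rng.normal(size=(4, 3))
# A larger lam puts more weight on perturbation stability, which is
# exactly the knob the paper's WAR method enlarges for wider models.
print(robust_loss(W, x, y, lam=1.0), robust_loss(W, x, y, lam=6.0))
```

Because the stability term is a non-negative KL divergence, increasing λ monotonically increases the weight on stability relative to natural accuracy, which mirrors the tradeoff the abstract attributes to λ.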
Author Information
Boxi Wu (Zhejiang University)
Jinghui Chen (University of Virginia)
Deng Cai (Zhejiang University)
Xiaofei He (Zhejiang University)
Quanquan Gu (UCLA)
More from the Same Authors
- 2021 : Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium
  Chris Junchi Li · Dongruo Zhou · Quanquan Gu · Michael Jordan
- 2021 Poster: The Benefits of Implicit Regularization from SGD in Least Squares Problems
  Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Dean Foster · Sham Kakade
- 2021 Poster: Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation
  Jiafan He · Dongruo Zhou · Quanquan Gu
- 2021 Poster: Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent
  Spencer Frei · Quanquan Gu
- 2021 Poster: Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures
  Yuan Cao · Quanquan Gu · Mikhail Belkin
- 2021 Poster: Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
  Jiafan He · Dongruo Zhou · Quanquan Gu
- 2021 Poster: Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
  Weitong ZHANG · Dongruo Zhou · Quanquan Gu
- 2021 Poster: Variance-Aware Off-Policy Evaluation with Linear Function Approximation
  Yifei Min · Tianhao Wang · Dongruo Zhou · Quanquan Gu
- 2021 Poster: Iterative Teacher-Aware Learning
  Luyao Yuan · Dongruo Zhou · Junhong Shen · Jingdong Gao · Jeffrey L Chen · Quanquan Gu · Ying Nian Wu · Song-Chun Zhu
- 2021 Poster: Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints
  Tianhao Wang · Dongruo Zhou · Quanquan Gu
- 2021 Poster: Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks
  Hanxun Huang · Yisen Wang · Sarah Erfani · Quanquan Gu · James Bailey · Xingjun Ma
- 2021 Poster: Pure Exploration in Kernel and Neural Bandits
  Yinglun Zhu · Dongruo Zhou · Ruoxi Jiang · Quanquan Gu · Rebecca Willett · Robert Nowak
- 2019 Poster: Region Mutual Information Loss for Semantic Segmentation
  Shuai Zhao · Yang Wang · Zheng Yang · Deng Cai
- 2018 Poster: Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima
  Yaodong Yu · Pan Xu · Quanquan Gu
- 2018 Poster: Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
  Pan Xu · Jinghui Chen · Difan Zou · Quanquan Gu
- 2018 Spotlight: Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
  Pan Xu · Jinghui Chen · Difan Zou · Quanquan Gu
- 2018 Poster: Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization
  Dongruo Zhou · Pan Xu · Quanquan Gu
- 2018 Spotlight: Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization
  Dongruo Zhou · Pan Xu · Quanquan Gu
- 2018 Poster: Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization
  Bargav Jayaraman · Lingxiao Wang · David Evans · Quanquan Gu
- 2018 Poster: MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models
  Boyuan Pan · Yazheng Yang · Hao Li · Zhou Zhao · Yueting Zhuang · Deng Cai · Xiaofei He
- 2017 Poster: Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization
  Pan Xu · Jian Ma · Quanquan Gu
- 2016 Poster: Semiparametric Differential Graph Models
  Pan Xu · Quanquan Gu
- 2015 Poster: High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality
  Zhaoran Wang · Quanquan Gu · Yang Ning · Han Liu
- 2014 Poster: Sparse PCA with Oracle Property
  Quanquan Gu · Zhaoran Wang · Han Liu
- 2014 Poster: Robust Tensor Decomposition with Gross Corruption
  Quanquan Gu · Huan Gui · Jiawei Han
- 2012 Poster: Selective Labeling via Error Bound Minimization
  Quanquan Gu · Tong Zhang · Chris Ding · Jiawei Han
- 2012 Poster: Multi-task Vector Field Learning
  Binbin Lin · Sen Yang · Chiyuan Zhang · Jieping Ye · Xiaofei He
- 2011 Poster: Semi-supervised Regression via Parallel Field Regularization
  Binbin Lin · Chiyuan Zhang · Xiaofei He