Modern deep neural networks (DNNs) have achieved state-of-the-art performance but are typically over-parameterized. Without customized training strategies, this over-parameterization may result in an undesirably large generalization error. Recently, a line of research under the name of Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. However, SAM-like methods incur a two-fold computational overhead over the given base optimizer (e.g., SGD) to approximate the sharpness measure. In this paper, we propose Sharpness-Aware Training for Free, or SAF, which mitigates the sharp landscape at almost zero additional computational cost over the base optimizer. Intuitively, SAF achieves this by avoiding sudden drops in the loss in sharp local minima throughout the trajectory of the weight updates. Specifically, we suggest a novel trajectory loss, based on the KL-divergence between the outputs of DNNs with the current weights and with past weights, as a replacement for SAM's sharpness measure. This loss captures the rate of change of the training loss along the model's update trajectory. By minimizing it, SAF ensures convergence to a flat minimum with improved generalization capabilities. Extensive empirical results show that SAF minimizes the sharpness in the same way that SAM does, yielding better results on the ImageNet dataset with essentially the same computational cost as the base optimizer.
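The trajectory loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax temperature, the weighting coefficient, and the direction of the divergence (past outputs as the reference distribution) are all assumptions made for illustration. The sketch only shows the general idea of penalizing the KL-divergence between outputs recorded at an earlier point in training and the outputs of the current weights, added on top of the ordinary training loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def trajectory_loss(past_logits, current_logits, temperature=1.0):
    """Mean KL(past || current) over a batch.

    past_logits: outputs stored from an earlier point in training
                 (assumed here to be kept in memory, so no extra
                 forward or backward pass is needed).
    current_logits: outputs of the model with its current weights.
    """
    total = 0.0
    for past, current in zip(past_logits, current_logits):
        p = softmax(past, temperature)
        q = softmax(current, temperature)
        total += kl_divergence(p, q)
    return total / len(past_logits)


def total_loss(base_loss, past_logits, current_logits,
               weight=0.3, temperature=1.0):
    """Base training loss plus the weighted trajectory term.

    `weight` and `temperature` are hypothetical hyperparameters.
    """
    return base_loss + weight * trajectory_loss(
        past_logits, current_logits, temperature
    )


if __name__ == "__main__":
    past = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
    current = [[2.5, 0.0, -1.5], [0.0, 0.4, 0.2]]
    print(total_loss(1.25, past, current))
```

Because the past outputs are reused rather than recomputed, the extra term costs only a softmax and a sum per example, in contrast to the additional gradient evaluation that SAM-like methods need per step.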
Author Information
Jiawei Du (Centre for Frontier AI Research (CFAR), A*STAR; National University of Singapore)
Daquan Zhou (National University of Singapore)
Jiashi Feng (UC Berkeley)
Vincent Tan (National University of Singapore)
Joey Tianyi Zhou (IHPC, A*STAR)
More from the Same Authors
- 2021: Architecture Personalization in Resource-constrained Federated Learning
  Mi Luo · Fei Chen · Zhenguo Li · Jiashi Feng
- 2022 Poster: Multi-Scale Adaptive Network for Single Image Denoising
  Yuanbiao Gou · Peng Hu · Jiancheng Lv · Joey Tianyi Zhou · Xi Peng
- 2022 Spotlight: Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
  Dongze Lian · Daquan Zhou · Jiashi Feng · Xinchao Wang
- 2022 Spotlight: Lightning Talks 6A-1
  Ziyi Wang · Nian Liu · Yaming Yang · Qilong Wang · Yuanxin Liu · Zongxin Yang · Yizhao Gao · Yanchen Deng · Dongze Lian · Nanyi Fei · Ziyu Guan · Xiao Wang · Shufeng Kong · Xumin Yu · Daquan Zhou · Yi Yang · Fandong Meng · Mingze Gao · Caihua Liu · Yongming Rao · Zheng Lin · Haoyu Lu · Zhe Wang · Jiashi Feng · Zhaolin Zhang · Deyu Bo · Xinchao Wang · Chuan Shi · Jiangnan Li · Jiangtao Xie · Jie Zhou · Zhiwu Lu · Wei Zhao · Bo An · Jiwen Lu · Peihua Li · Jian Pei · Hao Jiang · Cai Xu · Peng Fu · Qinghua Hu · Yijie Li · Weigang Lu · Yanan Cao · Jianbin Huang · Weiping Wang · Zhao Cao · Jie Zhou
- 2022 Poster: Minimax Optimal Fixed-Budget Best Arm Identification in Linear Bandits
  Junwen Yang · Vincent Tan
- 2022 Poster: Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL
  Fengzhuo Zhang · Boyi Liu · Kaixin Wang · Vincent Tan · Zhuoran Yang · Zhaoran Wang
- 2022 Poster: Deep Model Reassembly
  Xingyi Yang · Daquan Zhou · Songhua Liu · Jingwen Ye · Xinchao Wang
- 2022 Poster: Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
  Dongze Lian · Daquan Zhou · Jiashi Feng · Xinchao Wang
- 2022 Poster: Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition
  Yifan Zhang · Bryan Hooi · Lanqing Hong · Jiashi Feng
- 2021: Contributed Talk 3: Architecture Personalization in Resource-constrained Federated Learning
  Mi Luo · Fei Chen · Zhenguo Li · Jiashi Feng
- 2021 Poster: Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions
  Huan Ma · Zongbo Han · Changqing Zhang · Huazhu Fu · Joey Tianyi Zhou · Qinghua Hu
- 2021 Poster: Robustifying Algorithms of Learning Latent Trees with Vector Variables
  Fengzhuo Zhang · Vincent Tan
- 2021 Poster: All Tokens Matter: Token Labeling for Training Better Vision Transformers
  Zi-Hang Jiang · Qibin Hou · Li Yuan · Daquan Zhou · Yujun Shi · Xiaojie Jin · Anran Wang · Jiashi Feng
- 2020 Poster: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
  Yunqiu Xu · Meng Fang · Ling Chen · Yali Du · Joey Tianyi Zhou · Chengqi Zhang
- 2020 Poster: Partially View-aligned Clustering
  Zhenyu Huang · Peng Hu · Joey Tianyi Zhou · Jiancheng Lv · Xi Peng
- 2020 Oral: Partially View-aligned Clustering
  Zhenyu Huang · Peng Hu · Joey Tianyi Zhou · Jiancheng Lv · Xi Peng
- 2020 Poster: ConvBERT: Improving BERT with Span-based Dynamic Convolution
  Zi-Hang Jiang · Weihao Yu · Daquan Zhou · Yunpeng Chen · Jiashi Feng · Shuicheng Yan
- 2020 Spotlight: ConvBERT: Improving BERT with Span-based Dynamic Convolution
  Zi-Hang Jiang · Weihao Yu · Daquan Zhou · Yunpeng Chen · Jiashi Feng · Shuicheng Yan
- 2019 Poster: CPM-Nets: Cross Partial Multi-View Networks
  Changqing Zhang · Zongbo Han · yajie cui · Huazhu Fu · Joey Tianyi Zhou · Qinghua Hu
- 2019 Spotlight: CPM-Nets: Cross Partial Multi-View Networks
  Changqing Zhang · Zongbo Han · yajie cui · Huazhu Fu · Joey Tianyi Zhou · Qinghua Hu