Workshop: OPT 2023: Optimization for Machine Learning

Revisiting Random Weight Perturbation for Efficiently Improving Generalization

Tao Li · Weihao weihao · Qinghua Tao · Zehao Lei · Yingwen Wu · Kun Fang · Mingzhen He · Xiaolin Huang


Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental problem in machine learning. Two branches of methods seek flat minima to improve generalization: one, led by sharpness-aware minimization (SAM), minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP); the other minimizes an expected Bayes objective with random weight perturbation (RWP). Although RWP is cheaper to train and is closely linked to AWP mathematically, its empirical performance consistently lags behind that of AWP. In this paper, we revisit RWP and analyze its convergence properties. We find that RWP requires a much larger perturbation magnitude than AWP, which leads to convergence issues. To resolve this, we propose m-RWP, which incorporates the original loss objective to aid convergence, significantly lifting the performance of RWP. Compared with SAM, m-RWP is more efficient, since it allows the two gradient steps to be computed in parallel and converges faster, while achieving comparable or even better performance. We will release the code for reproducibility.
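The abstract's description of m-RWP suggests an update that mixes the gradient of the original loss with the gradient at randomly perturbed weights. The following is a minimal toy sketch of that idea on a quadratic loss, assuming a simple convex combination of the two gradients; the mixing weight `lam`, the noise scale `sigma`, and all hyperparameter values here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def loss(w):
    # Toy quadratic loss: L(w) = ||w||^2 / 2 (stand-in for a training loss).
    return 0.5 * np.dot(w, w)

def grad(w):
    # Gradient of the toy loss above.
    return w

def m_rwp_step(w, rng, lr=0.1, sigma=0.5, lam=0.5):
    """One mixed-RWP update (illustrative sketch, not the paper's exact
    algorithm): combine the gradient at the original weights with the
    gradient at randomly perturbed weights. Because the two gradients are
    evaluated at different points independently, they can be computed in
    parallel, as the abstract notes."""
    eps = rng.normal(0.0, sigma, size=w.shape)       # random weight perturbation
    g = lam * grad(w) + (1.0 - lam) * grad(w + eps)  # mix original and perturbed gradients
    return w - lr * g

rng = np.random.default_rng(42)
w0 = np.array([2.0, -1.5])
w = w0.copy()
for _ in range(50):
    w = m_rwp_step(w, rng)
print(loss(w0), "->", loss(w))
```

The `lam * grad(w)` term plays the role of the original loss objective that the paper adds to stabilize convergence; with `lam = 0` the update reduces to plain RWP, where only the noisy gradient drives training.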
