Anderson mixing (AM) is an acceleration method for fixed-point iterations. Despite its success and wide use in scientific computing, the convergence theory of AM remains unclear, and its applications to machine learning are not well explored. In this paper, by introducing damped projection and adaptive regularization to the classical AM, we propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems. Under mild assumptions, we establish the convergence theory of SAM, including the almost sure convergence to stationary points and the worst-case iteration complexity. Moreover, the complexity bound can be improved when a randomly chosen iterate is returned as the output. To further accelerate convergence, we incorporate a variance reduction technique into the proposed SAM. We also propose a preconditioned mixing strategy for SAM, which empirically achieves faster convergence or better generalization. Finally, we apply the SAM method to train various neural networks, including a vanilla CNN, ResNets, WideResNet, ResNeXt, DenseNet, and an LSTM. Experimental results on image classification and language modeling demonstrate the advantages of our method.
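For readers unfamiliar with the base method, below is a minimal sketch of classical Anderson mixing applied to a deterministic fixed-point iteration x = g(x). It is not the paper's SAM algorithm (which adds damped projection, adaptive regularization, and support for stochastic gradients); the function name `anderson_mixing`, the window size `m`, and the damping parameter `beta` are illustrative choices, not names from the paper.

```python
import numpy as np

def anderson_mixing(g, x0, m=5, beta=1.0, tol=1e-10, max_iter=100):
    """Classical Anderson mixing for the fixed-point iteration x = g(x).

    m    : history window (number of past residual differences used)
    beta : mixing (damping) parameter for the fixed-point step
    """
    x = np.asarray(x0, dtype=float)
    X_hist, F_hist = [], []              # iterate and residual histories
    for _ in range(max_iter):
        gx = g(x)
        f = gx - x                       # residual of the fixed-point map
        if np.linalg.norm(f) < tol:
            break
        X_hist.append(x.copy())
        F_hist.append(f.copy())
        if len(X_hist) > m + 1:          # keep at most m difference pairs
            X_hist.pop(0)
            F_hist.pop(0)
        if len(F_hist) > 1:
            # Differences of residuals and iterates over the window
            dF = np.stack([F_hist[i + 1] - F_hist[i]
                           for i in range(len(F_hist) - 1)], axis=1)
            dX = np.stack([X_hist[i + 1] - X_hist[i]
                           for i in range(len(X_hist) - 1)], axis=1)
            # Least-squares mixing coefficients: argmin_gamma ||f - dF @ gamma||
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = x + beta * f - (dX + beta * dF) @ gamma
        else:
            x = x + beta * f             # plain (damped) fixed-point step
    return x

# Usage: accelerate the contraction g(x) = cos(x) toward its fixed point (~0.739).
print(anderson_mixing(np.cos, np.array([1.0])))
```

The least-squares step is what distinguishes AM from plain damped iteration: the new iterate extrapolates along the recent history so that the predicted residual is minimized, which is the mechanism SAM regularizes and adapts to the stochastic setting.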
Author Information
Fuchao Wei (Tsinghua University)
Chenglong Bao (Tsinghua University)
Yang Liu (Tsinghua University)
More from the Same Authors
- 2022 Poster: A Variant of Anderson Mixing with Minimal Memory Size
  Fuchao Wei · Chenglong Bao · Yang Liu · Guangwen Yang
- 2023 Poster: Crystal Structure Prediction by Joint Equivariant Diffusion
  Rui Jiao · Wenbing Huang · Peijia Lin · Jiaqi Han · Pin Chen · Yutong Lu · Yang Liu
- 2022 Poster: Molecule Generation by Principal Subgraph Mining and Assembling
  Xiangzhe Kong · Wenbing Huang · Zhixing Tan · Yang Liu
- 2022 Poster: A Closer Look at the Adversarial Robustness of Deep Equilibrium Models
  Zonghan Yang · Tianyu Pang · Yang Liu
- 2021 Poster: AFEC: Active Forgetting of Negative Transfer in Continual Learning
  Liyuan Wang · Mingtian Zhang · Zhongfan Jia · Qian Li · Chenglong Bao · Kaisheng Ma · Jun Zhu · Yi Zhong
- 2020 Poster: Task-Oriented Feature Distillation
  Linfeng Zhang · Yukang Shi · Zuoqiang Shi · Kaisheng Ma · Chenglong Bao
- 2019 Poster: SCAN: A Scalable Neural Networks Framework Towards Compact and Efficient Models
  Linfeng Zhang · Zhanhong Tan · Jiebo Song · Jingwei Chen · Chenglong Bao · Kaisheng Ma