In this paper, we comprehensively characterize the learning dynamics of normalized neural networks trained with Stochastic Gradient Descent (with momentum) and Weight Decay (WD), which we name Spherical Motion Dynamics (SMD). Most related works study the behavior of the "effective learning rate" in the "equilibrium" state, i.e., assuming the weight norm remains unchanged. However, their discussion of why this equilibrium can be reached is either absent or unconvincing. Our work directly explores the cause of equilibrium as a special state of SMD. Specifically, 1) we introduce the assumptions that lead to the equilibrium state in SMD, and prove that equilibrium can be reached in a linear rate regime under these assumptions; 2) we propose the "angular update" as a substitute for the effective learning rate to describe the state of SMD, and derive the theoretical value of the angular update in the equilibrium state; 3) we verify our assumptions and theoretical results on various large-scale computer vision tasks, including ImageNet and MSCOCO, with standard settings. Experimental results show that our theoretical findings agree well with empirical observations. We also show that the behavior of the angular update in SMD can have interesting effects on the optimization of neural networks in practice.
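As a rough illustration of the "angular update" mentioned in the abstract, the sketch below measures the angle between a weight vector before and after one optimizer step. The abstract does not define the quantity, so reading it as the angle between consecutive weight vectors is an assumption here; the function name `angular_update`, the helper `sgd_wd_step`, and the toy numbers are all hypothetical and not taken from the paper.

```python
import math


def angular_update(w_prev, w_next):
    """Angle in radians between two weight vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(w_prev, w_next))
    norm_prev = math.sqrt(sum(a * a for a in w_prev))
    norm_next = math.sqrt(sum(b * b for b in w_next))
    cos = dot / (norm_prev * norm_next)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos)))


def sgd_wd_step(w, grad, lr, wd):
    """One plain SGD step with weight decay (no momentum); hypothetical helper."""
    return [wi - lr * (gi + wd * wi) for wi, gi in zip(w, grad)]


# Toy example: the gradient is orthogonal to the weights, as is the case
# for a scale-invariant (normalized) weight vector.
w = [1.0, 0.0]
grad = [0.0, 1.0]
w_new = sgd_wd_step(w, grad, lr=0.1, wd=0.0)
angle = angular_update(w, w_new)
```

Because the measure depends only on direction, rescaling either vector leaves it unchanged, which is the property that makes it a natural descriptor for normalized weights.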
Author Information
Ruosi Wan (Megvii Technology)
Zhanxing Zhu (Peking University)
Xiangyu Zhang (MEGVII Technology)
Jian Sun (Megvii, Face++)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay
  Thu. Dec 9th 08:30 -- 10:00 AM
More from the Same Authors
- 2022 Poster: Unifying Voxel-based Representation with Transformer for 3D Object Detection
  Yanwei Li · Yilun Chen · Xiaojuan Qi · Zeming Li · Jian Sun · Jiaya Jia
- 2023 Poster: Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration
  Longlin Yu · Tianyu Xie · Yu Zhu · Tong Yang · Xiangyu Zhang · Cheng Zhang
- 2023 Poster: Slot-guided Volumetric Object Radiance Fields
  DI QI · Tong Yang · Xiangyu Zhang
- 2023 Poster: Neural Lad: A Neural Latent Dynamics Framework for Times Series Modeling
  ting li · Jianguo Li · Zhanxing Zhu
- 2023 Poster: RevColV2: Exploring Disentangled Representations in Masked Image Modeling
  Qi Han · Yuxuan Cai · Xiangyu Zhang
- 2023 Poster: Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network
  Bochen Lv · Zhanxing Zhu
- 2022 Poster: Self-Supervised Visual Representation Learning with Semantic Grouping
  Xin Wen · Bingchen Zhao · Anlin Zheng · Xiangyu Zhang · Xiaojuan Qi
- 2021 Poster: Dynamic Grained Encoder for Vision Transformers
  Lin Song · Songyang Zhang · Songtao Liu · Zeming Li · Xuming He · Hongbin Sun · Jian Sun · Nanning Zheng
- 2021 Poster: Instance-Conditional Knowledge Distillation for Object Detection
  Zijian Kang · Peizhen Zhang · Xiangyu Zhang · Jian Sun · Nanning Zheng
- 2021 Poster: SOLQ: Segmenting Objects by Learning Queries
  Bin Dong · Fangao Zeng · Tiancai Wang · Xiangyu Zhang · Yichen Wei
- 2020 Poster: Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework
  Dinghuai Zhang · Mao Ye · Chengyue Gong · Zhanxing Zhu · Qiang Liu
- 2020 Poster: Rethinking Learnable Tree Filter for Generic Feature Transform
  Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Xiangyu Zhang · Hongbin Sun · Jian Sun · Nanning Zheng
- 2020 Poster: Fine-Grained Dynamic Head for Object Detection
  Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Hongbin Sun · Jian Sun · Nanning Zheng
- 2020 Poster: Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
  Guangda Ji · Zhanxing Zhu
- 2019 Poster: Learnable Tree Filter for Structure-preserving Feature Transform
  Lin Song · Yanwei Li · Zeming Li · Gang Yu · Hongbin Sun · Jian Sun · Nanning Zheng
- 2019 Poster: DetNAS: Backbone Search for Object Detection
  Yukang Chen · Tong Yang · Xiangyu Zhang · GAOFENG MENG · Xinyu Xiao · Jian Sun
- 2019 Poster: You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle
  Dinghuai Zhang · Tianyuan Zhang · Yiping Lu · Zhanxing Zhu · Bin Dong
- 2018 Poster: Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning
  Rui Luo · Jianhong Wang · Yaodong Yang · Jun WANG · Zhanxing Zhu
- 2018 Poster: Reinforced Continual Learning
  Ju Xu · Zhanxing Zhu
- 2018 Poster: MetaAnchor: Learning to Detect Objects with Customized Anchors
  Tong Yang · Xiangyu Zhang · Zeming Li · Wenqiang Zhang · Jian Sun
- 2018 Poster: Bayesian Adversarial Learning
  Nanyang Ye · Zhanxing Zhu
- 2017 Poster: Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
  Nanyang Ye · Zhanxing Zhu · Rafal Mantiuk