The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of this algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by SHB: \emph{(i)} stability after dropping out part of the neurons, \emph{(ii)} connectivity along a low-loss path, and \emph{(iii)} convergence to the global optimum. To achieve this goal, we take a mean-field view and relate the SHB dynamics to a certain partial differential equation in the limit of large network widths. This mean-field perspective has inspired a recent line of work focusing on SGD; in contrast, our paper considers an algorithm with momentum. More specifically, after proving existence and uniqueness of the limiting differential equations, we show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network. Armed with this last bound, we are able to establish the dropout stability and connectivity of SHB solutions.
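For readers less familiar with the update rule, the following is a minimal sketch of the SHB (Polyak momentum) iteration applied to a two-layer network with the 1/N mean-field scaling discussed above. The network, loss, data, and hyperparameters here are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def shb_step(theta, velocity, grad, lr=0.1, beta=0.9):
    """One stochastic heavy ball (Polyak momentum) update:
    velocity_{t+1} = beta * velocity_t - lr * grad_t
    theta_{t+1}    = theta_t + velocity_{t+1}
    """
    velocity = beta * velocity - lr * grad
    theta = theta + velocity
    return theta, velocity

# Toy two-layer network f(x) = (1/N) * sum_i a_i * relu(w_i . x),
# trained on a synthetic regression target with squared loss.
rng = np.random.default_rng(0)
N, d = 512, 10                      # width and input dimension
w = rng.standard_normal((N, d))     # first-layer weights
a = rng.standard_normal(N)          # second-layer weights
vw, va = np.zeros_like(w), np.zeros_like(a)

for _ in range(100):
    x = rng.standard_normal(d)
    y = np.sin(x[0])                # toy target
    pre = w @ x                     # pre-activations, shape (N,)
    act = np.maximum(pre, 0.0)      # ReLU activations
    pred = a @ act / N              # mean-field 1/N output scaling
    err = pred - y                  # squared-loss residual
    grad_a = err * act / N
    grad_w = err * (a[:, None] * (pre > 0)[:, None] * x[None, :]) / N
    a, va = shb_step(a, va, grad_a)
    w, vw = shb_step(w, vw, grad_w)
```

With beta = 0 the update reduces to plain SGD, which is the regime covered by the earlier mean-field line of work mentioned above.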
Author Information
Diyuan Wu (ISTA)
Vyacheslav Kungurtsev (Czech Technical University in Prague)
Marco Mondelli (IST Austria)
More from the Same Authors
- 2022: Poster Session 1
  Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li
- 2022 Poster: The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation?
  Jean Barbier · TianQi Hou · Marco Mondelli · Manuel Saenz
- 2022 Poster: Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
  Simone Bombari · Mohammad Hossein Amani · Marco Mondelli
- 2021 Poster: When Are Solutions Connected in Deep Networks?
  Quynh Nguyen · Pierre Bréchet · Marco Mondelli
- 2021 Poster: PCA Initialization for Approximate Message Passing in Rotationally Invariant Models
  Marco Mondelli · Ramji Venkataramanan
- 2020 Poster: Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology
  Quynh Nguyen · Marco Mondelli