Second-order optimization algorithms hold the potential to speed up learning in neural networks, but are notoriously hard to compute due to the enormous size of the curvature matrix. This problem has inspired approximations of the curvature matrix that allow for efficient computation, most prominently the Kronecker-factored curvature approximation (KFAC). Indeed, KFAC shows significant speed-ups for optimization compared to standard baselines. In this context, we challenge two common beliefs. First, we show that, when subsampling the curvature matrix (in our case the Fisher Information), second-order updates can be computed efficiently and exactly: the PyTorch implementation of our method requires less than twice as much amortised wall-clock time per parameter update as SGD. Second, through a careful set of experiments, we demonstrate that KFAC does not owe its performance to approximating the curvature matrix, but rather is closely linked to a new, simple first-order optimization algorithm. We propose and analyse this first-order optimizer and demonstrate that it outperforms KFAC both in terms of computation cost and optimization progress per parameter update.
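To illustrate why a subsampled Fisher can admit an exact solve, here is a minimal NumPy sketch. It is not taken from the authors' PyTorch implementation. It assumes the subsampled Fisher is built from n per-sample gradients stacked in a matrix `G` (n rows, d parameters, with n much smaller than d) and that a damping term is added before inversion. The names `G`, `g`, `damping`, and `subsampled_natural_gradient` are illustrative assumptions. Under these assumptions, the standard Woodbury identity replaces the d x d solve with an n x n solve.

```python
import numpy as np


def subsampled_natural_gradient(G, g, damping):
    """Solve (G^T G / n + damping * I) x = g via the Woodbury identity.

    G: (n, d) matrix of per-sample gradients (assumed layout).
    g: (d,) gradient to precondition.
    damping: positive scalar added to the diagonal (assumption).
    """
    n = G.shape[0]
    # Small n x n system instead of the d x d Fisher.
    K = G @ G.T + n * damping * np.eye(n)
    coeffs = np.linalg.solve(K, G @ g)
    return (g - G.T @ coeffs) / damping


# Toy check against the direct d x d solve (random data, illustration only).
rng = np.random.default_rng(0)
n, d = 8, 50
G = rng.standard_normal((n, d))
g = G.mean(axis=0)
damping = 0.1

update = subsampled_natural_gradient(G, g, damping)
F = G.T @ G / n  # explicit subsampled Fisher, only feasible for small d
reference = np.linalg.solve(F + damping * np.eye(d), g)
```

In this sketch the cost of the solve scales with the number of subsampled gradients rather than with the number of parameters, which is the general reason such an update can be exact without forming the full curvature matrix.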
Author Information
Frederik Benzing (ETH Zurich)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 : Fast, Exact Subsampled Natural Gradients and First-Order KFAC
More from the Same Authors
- 2022 : Random initialisations performing above chance and how to find them
  Frederik Benzing · Simon Schug · Robert Meier · Johannes von Oswald · Yassir Akram · Nicolas Zucchet · Laurence Aitchison · Angelika Steger
- 2021 : Poster Session 1 (gather.town)
  Hamed Jalali · Robert Hönig · Maximus Mutschler · Manuel Madeira · Abdurakhmon Sadiev · Egor Shulgin · Alasdair Paren · Pascal Esser · Simon Roburin · Julius Kunze · Agnieszka Słowik · Frederik Benzing · Futong Liu · Hongyi Li · Ryotaro Mitsuboshi · Grigory Malinovsky · Jayadev Naram · Zhize Li · Igor Sokolov · Sharan Vaswani
- 2021 : Contributed Talks in Session 1 (Zoom)
  Sebastian Stich · Futong Liu · Abdurakhmon Sadiev · Frederik Benzing · Simon Roburin