

Spotlight in Workshop: OPT 2021: Optimization for Machine Learning

Fast, Exact Subsampled Natural Gradients and First-Order KFAC

Frederik Benzing


Abstract:

Second-order optimization algorithms hold the potential to speed up learning in neural networks, but are notoriously hard to compute due to the enormous size of the curvature matrix. This problem has inspired approximations of the curvature matrix that allow for efficient computation, most prominently the Kronecker-factored curvature approximation (KFAC). Indeed, KFAC shows significant speed-ups for optimization compared to standard baselines. In this context, we challenge two common beliefs. First, we show that, when subsampling the curvature matrix (in our case the Fisher Information), second-order updates can be computed efficiently and exactly: the PyTorch implementation of our method requires less than twice as much amortised wall-clock time per parameter update as SGD. Second, through a careful set of experiments, we demonstrate that KFAC does not owe its performance to approximating the curvature matrix, but rather is closely linked to a new, simple first-order optimization algorithm. We propose and analyse this first-order optimizer and demonstrate that it outperforms KFAC both in terms of computational cost and optimization progress per parameter update.
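
To illustrate why subsampling makes exact second-order updates tractable, the sketch below computes a damped natural-gradient step with a Fisher estimated from a small number m of sampled per-example gradients, inverted exactly via the Woodbury identity rather than through any Kronecker factorisation. This is a minimal illustration of the general low-rank trick under stated assumptions, not the authors' implementation: the function name, the damping parameter, and the way the sampled gradients U are assembled are hypothetical choices for the example.

```python
# Minimal sketch (not the paper's code): exact damped natural-gradient step
# with a Fisher estimated from m subsampled gradients, using the Woodbury
# identity. Cost is O(d*m^2 + m^3) rather than O(d^3), cheap when m << d.
import torch

def subsampled_natural_gradient(grad, U, damping=1e-3):
    """grad: flattened gradient of the loss, shape (d,).
    U: matrix whose m columns are per-example gradients, shape (d, m),
       e.g. gradients for labels sampled from the model's predictive
       distribution, so that F ~= (1/m) U U^T estimates the Fisher.
    Returns (F + damping * I)^{-1} grad without forming the d x d Fisher.
    """
    d, m = U.shape
    # Solve the small m x m system (U^T U + m * damping * I) x = U^T grad.
    gram = U.T @ U + m * damping * torch.eye(m, dtype=U.dtype, device=U.device)
    x = torch.linalg.solve(gram, U.T @ grad)
    # Woodbury: (U U^T / m + damping * I)^{-1} grad = (grad - U x) / damping.
    return (grad - U @ x) / damping
```

Because the subsampled Fisher has rank at most m, the only matrix ever inverted is m x m, which is what keeps the exact update within a small constant factor of SGD's per-step cost.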