
Workshop: OPT 2023: Optimization for Machine Learning

Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets

Wu Lin · Felix Dangel · Runa Eschenhagen · Kirill Neklyudov · Agustinus Kristiadi · Richard Turner · Alireza Makhzani


Second-order methods for deep learning, such as KFAC, can be useful for neural network training. However, they are often memory-inefficient, and they are numerically unstable in low-precision training. Both problems arise because their preconditioning Kronecker factors are dense and require high-precision matrix inversion or decomposition. As a result, such methods are not widely used for training large neural networks such as transformer-based models. We address these two issues by (i) formulating an inverse-free update of KFAC and (ii) imposing structures on each of the Kronecker factors. We term the resulting method structured inverse-free natural gradient descent (SINGD). On large modern neural networks, we show that, in contrast to KFAC, SINGD is memory-efficient and numerically robust.
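The two costs named in the abstract, dense Kronecker factors and their inversion, can be made concrete with a minimal NumPy sketch of plain KFAC for a single linear layer. This is standard damped KFAC, not SINGD's inverse-free update. The helper names (`kfac_factors`, `kfac_preconditioned_step`, `factor_memory`), the damping value, the layer sizes, and the use of a diagonal structure as the memory comparison are all illustrative assumptions.

```python
import numpy as np


def kfac_factors(a, g):
    """Estimate the two KFAC Kronecker factors of a linear layer y = W a.

    a: layer inputs, shape (batch, d_in)
    g: gradients w.r.t. the layer outputs, shape (batch, d_out)
    """
    n = a.shape[0]
    A = a.T @ a / n  # dense (d_in, d_in) input factor
    G = g.T @ g / n  # dense (d_out, d_out) output-gradient factor
    return A, G


def kfac_preconditioned_step(grad_w, A, G, damping=1e-3):
    """Compute G_d^{-1} grad_w A_d^{-1}, where X_d denotes the damped factor X + damping * I.

    This equals (A_d kron G_d)^{-1} vec(grad_w) with column-major vec.
    Each call explicitly inverts two dense matrices, which is the step
    that is sensitive to numerical precision.
    """
    d_out, d_in = grad_w.shape
    A_inv = np.linalg.inv(A + damping * np.eye(d_in))
    G_inv = np.linalg.inv(G + damping * np.eye(d_out))
    return G_inv @ grad_w @ A_inv


def factor_memory(d_in, d_out, structure="dense"):
    """Number of stored entries for the two Kronecker factors (illustrative)."""
    if structure == "dense":
        return d_in * d_in + d_out * d_out
    if structure == "diagonal":
        return d_in + d_out
    raise ValueError(f"unknown structure: {structure}")


rng = np.random.default_rng(0)
batch, d_in, d_out = 32, 8, 4
a = rng.standard_normal((batch, d_in))
g = rng.standard_normal((batch, d_out))
grad_w = g.T @ a / batch  # mini-batch weight gradient, shape (d_out, d_in)

A, G = kfac_factors(a, g)
step = kfac_preconditioned_step(grad_w, A, G)
print(step.shape)  # (4, 8)

# Stored entries for the factors of a hypothetical 4096 x 4096 layer
print(factor_memory(4096, 4096, "dense"))     # 33554432
print(factor_memory(4096, 4096, "diagonal"))  # 8192
```

The dense factors grow quadratically with layer width, while a diagonal structure grows linearly. That gap is the kind of saving that imposing structure on the factors targets. The sketch does not show how SINGD avoids the explicit `np.linalg.inv` calls.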
