Fishy: Layerwise Fisher Approximation for Higher-order Neural Network Optimization
Abel Peirson · Ehsan Amid · Yatong Chen · Vladimir Feinberg · Manfred Warmuth · Rohan Anil
Event URL: https://openreview.net/forum?id=cScb-RrBQC

We introduce Fishy, a local approximation of the Fisher information matrix at each layer for natural gradient descent training of deep neural networks. Computing the true Fisher for deep networks involves sampling labels from the model's predictive distribution at the output layer and performing a full backward pass; Fishy instead defines a Bregman exponential family distribution at each layer and performs the sampling locally. Local sampling allows for model parallelism when forming the preconditioner and removes the need for the extra backward pass. We demonstrate our approach through the Shampoo optimizer, replacing its preconditioner gradients with our locally sampled gradients. Our training results on deep autoencoder and VGG16 image classification models indicate the efficacy of our construction.
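To make the contrast concrete, below is a minimal JAX sketch (not the authors' code) of the two constructions the abstract compares: standard Fisher-style gradients, which sample labels at the output layer and backpropagate through the whole network, versus a Fishy-style alternative that samples locally at each layer. The per-layer distribution here is a unit-variance Gaussian (squared loss), chosen purely for illustration since the abstract does not specify the paper's exact Bregman exponential family; the network, function names, and shapes are hypothetical.

import jax
import jax.numpy as jnp


def forward(params, x):
    # Two-layer MLP; returns logits and each layer's linear output.
    h1 = x @ params["W1"]
    a1 = jax.nn.relu(h1)
    h2 = a1 @ params["W2"]
    return h2, (h1, h2)


def fisher_grads_output_sampling(params, x, key):
    # Standard Fisher-style gradients: sample labels from the model's
    # predictive distribution at the output layer, then run a full
    # backward pass through the whole network.
    logits, _ = forward(params, x)
    y = jax.random.categorical(key, logits)

    def sampled_loss(p):
        lg, _ = forward(p, x)
        logp = jax.nn.log_softmax(lg)
        return -jnp.mean(logp[jnp.arange(x.shape[0]), y])

    return jax.grad(sampled_loss)(params)


def local_layerwise_grads(params, x, key):
    # Fishy-style idea (sketch only): sample a target locally at each layer
    # from a per-layer exponential-family distribution centered at the
    # layer's output, and form that layer's gradient from locally available
    # tensors, avoiding the extra full backward pass. A unit-variance
    # Gaussian stands in for the paper's Bregman exponential family.
    _, (h1, h2) = forward(params, x)
    k1, k2 = jax.random.split(key)
    t1 = h1 + jax.random.normal(k1, h1.shape)   # local sample, layer 1
    t2 = h2 + jax.random.normal(k2, h2.shape)   # local sample, layer 2
    a1 = jax.nn.relu(h1)
    n = x.shape[0]
    g1 = x.T @ (h1 - t1) / n    # grad of 0.5*||h1 - t1||^2 w.r.t. W1
    g2 = a1.T @ (h2 - t2) / n   # grad of 0.5*||h2 - t2||^2 w.r.t. W2
    return {"W1": g1, "W2": g2}


if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    kx, k1, k2, ks = jax.random.split(key, 4)
    x = jax.random.normal(kx, (32, 8))
    params = {"W1": 0.1 * jax.random.normal(k1, (8, 16)),
              "W2": 0.1 * jax.random.normal(k2, (16, 4))}
    g_full = fisher_grads_output_sampling(params, x, ks)
    g_local = local_layerwise_grads(params, x, ks)
    print({k: v.shape for k, v in g_local.items()})

As the abstract describes, gradients of the local form are what replace the gradients Shampoo uses to build its preconditioner; the paper's actual per-layer construction is given at the linked forum page.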

Author Information

Abel Peirson (Stanford University)
Ehsan Amid (Google)
Yatong Chen (UC Santa Cruz, Google Brain)
Vladimir Feinberg
Manfred Warmuth (University of California, Santa Cruz)
Rohan Anil (Google Research, Brain)

More from the Same Authors