We introduce Fishy, a local approximation of the Fisher information matrix at each layer for natural gradient descent training of deep neural networks. Approximating the true Fisher for deep networks involves sampling labels from the model's predictive distribution at the output layer and performing a full backward pass. Fishy instead defines a Bregman exponential family distribution at each layer and performs the sampling locally. Local sampling allows for model parallelism when forming the preconditioner and removes the need for the extra backward pass. We demonstrate our approach with the Shampoo optimizer, replacing its preconditioner gradients with our locally sampled gradients. Our training results on deep autoencoder and VGG16 image classification models indicate the efficacy of our construction.
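To make the baseline concrete, the following is a minimal sketch of the standard output-layer procedure the abstract contrasts against, not of Fishy itself. It assumes a softmax output and estimates the Fisher with respect to the logits only: labels are sampled from the model's own predictive distribution, and the outer products of the resulting log-likelihood gradients are averaged. The function names (`softmax`, `exact_fisher`, `sampled_fisher`) and the example logits are hypothetical, chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_fisher(logits):
    """Closed-form Fisher w.r.t. the logits of a softmax: diag(p) - p p^T."""
    p = softmax(logits)
    k = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]


def sampled_fisher(logits, num_samples, rng):
    """Monte Carlo Fisher: sample labels from the model, average g g^T."""
    p = softmax(logits)
    k = len(p)
    # Labels come from the model's predictive distribution, not from data.
    labels = rng.choices(range(k), weights=p, k=num_samples)
    fisher = [[0.0] * k for _ in range(k)]
    for y in labels:
        # Gradient of -log p(y | logits) w.r.t. the logits: p - onehot(y).
        g = [p[i] - (1.0 if i == y else 0.0) for i in range(k)]
        for i in range(k):
            for j in range(k):
                fisher[i][j] += g[i] * g[j] / num_samples
    return fisher


rng = random.Random(0)
logits = [1.0, 0.0, -1.0]  # illustrative values only
f_exact = exact_fisher(logits)
f_mc = sampled_fisher(logits, 50_000, rng)
max_err = max(abs(f_mc[i][j] - f_exact[i][j])
              for i in range(3) for j in range(3))
```

In a deep network, each sampled gradient would have to be backpropagated through every layer to reach the weights, which is the extra backward pass the abstract refers to. The sketch stops at the logits, where the sampled estimate can be checked against the closed form.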
Author Information
Abel Peirson (Stanford University)
Ehsan Amid (Google)
Yatong Chen (UC Santa Cruz, Google Brain)
Vladimir Feinberg
Manfred Warmuth (University of California, Santa Cruz)
Rohan Anil (Google Research, Brain)
More from the Same Authors
- 2021 Spotlight: Efficiently Identifying Task Groupings for Multi-Task Learning
  Chris Fifty · Ehsan Amid · Zhe Zhao · Tianhe Yu · Rohan Anil · Chelsea Finn
- 2021 : A deep ensemble approach to X-ray polarimetry
  Abel Peirson
- 2022 : Tier Balancing: Towards Dynamic Fairness over Underlying Causal Factors
  Zeyu Tang · Yatong Chen · Yang Liu · Kun Zhang
- 2022 : Fast Implicit Constrained Optimization of Non-decomposable Objectives for Deep Networks
  Yatong Chen · Abhishek Kumar · Yang Liu · Ehsan Amid
- 2023 Poster: Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
  Vladimir Feinberg · Xinyi Chen · Y. Jennifer Sun · Rohan Anil · Elad Hazan
- 2023 Poster: A Computationally Efficient Sparsified Online Newton Method
  Fnu Devvrit · Sai Surya Duvvuri · Rohan Anil · Vineet Gupta · Cho-Jui Hsieh · Inderjit Dhillon
- 2023 Poster: Boosting with Tempered Exponential Measures
  Richard Nock · Ehsan Amid · Manfred Warmuth
- 2022 Poster: Fairness Transferability Subject to Bounded Distribution Shift
  Yatong Chen · Reilly Raab · Jialu Wang · Yang Liu
- 2021 : Bounded Fairness Transferability subject to Distribution Shift
  Reilly Raab · Yatong Chen · Yang Liu
- 2021 Poster: Efficiently Identifying Task Groupings for Multi-Task Learning
  Chris Fifty · Ehsan Amid · Zhe Zhao · Tianhe Yu · Rohan Anil · Chelsea Finn
- 2020 : Contributed Talk 4: Strategic Recourse in Linear Classification
  Yatong Chen · Yang Liu
- 2020 Poster: Stochastic Optimization with Laggard Data Pipelines
  Naman Agarwal · Rohan Anil · Tomer Koren · Kunal Talwar · Cyril Zhang
- 2019 : Spotlight talks
  Damien Scieur · Konstantin Mishchenko · Rohan Anil
- 2019 : Poster Session
  Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma
- 2019 Poster: Memory Efficient Adaptive Optimization
  Rohan Anil · Vineet Gupta · Tomer Koren · Yoram Singer
- 2019 Poster: Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
  Ehsan Amid · Manfred K. Warmuth · Rohan Anil · Tomer Koren