Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. It consists of tracking a diagonal variance, not in parameter coordinates, but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements over KFAC in optimization speed for several deep network architectures.
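The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of one reading of it, for a single fully connected layer. It assumes the Kronecker factors are the second moments of the layer inputs `a` and of the back-propagated output gradients `g`, and that the "diagonal variance" is the second moment of per-example gradients projected into the eigenbasis of those factors. The function names `kfe_scalings` and `precondition` and the `damping` constant are hypothetical and introduced here for illustration.

```python
import numpy as np


def kfe_scalings(a, g):
    """Diagonal scalings in the Kronecker-factored eigenbasis (illustrative sketch).

    a: (n, d_in) layer inputs, one row per example.
    g: (n, d_out) back-propagated gradients w.r.t. the layer outputs.
    The per-example gradient of the weight matrix is the outer product g_n a_n^T.
    """
    n = a.shape[0]
    A = a.T @ a / n                      # input second-moment factor, (d_in, d_in)
    G = g.T @ g / n                      # output-gradient factor, (d_out, d_out)
    s_A, U_A = np.linalg.eigh(A)         # eigenbasis of each Kronecker factor
    s_G, U_G = np.linalg.eigh(G)

    # Diagonal implied by the Kronecker product of the two factors:
    # products of the factor eigenvalues.
    kfac_scale = np.outer(s_G, s_A)      # (d_out, d_in)

    # Tracked diagonal: second moment of the per-example gradients once
    # projected into the eigenbasis. Since U_G^T (g_n a_n^T) U_A is itself an
    # outer product, its elementwise square factorizes as below.
    a_proj = a @ U_A                     # rows are U_A^T a_n
    g_proj = g @ U_G                     # rows are U_G^T g_n
    tracked_scale = (g_proj ** 2).T @ (a_proj ** 2) / n   # (d_out, d_in)
    return U_A, U_G, kfac_scale, tracked_scale


def precondition(grad_W, U_A, U_G, scale, damping=1e-3):
    """Rescale a (d_out, d_in) gradient in the eigenbasis, then rotate back.

    `damping` is an assumed constant added for numerical stability.
    """
    projected = U_G.T @ grad_W @ U_A
    return U_G @ (projected / (scale + damping)) @ U_A.T
```

Under these assumptions both diagonals share the same basis, and the tracked one equals the exact diagonal of the empirical second-moment matrix in that basis. It therefore cannot have a larger Frobenius-norm approximation error than the eigenvalue products on the same samples, which is one way to read the "provably better" claim. Because the eigenvectors and the scalings are separate quantities, the scalings could in principle be refreshed without recomputing the eigendecomposition.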
Author Information
Thomas George (MILA, Université de Montréal)
César Laurent (Mila - Université de Montréal)
Xavier Bouthillier (Université de Montréal)
Nicolas Ballas (Facebook FAIR)
Pascal Vincent (Facebook and U. Montreal)
More from the Same Authors
- 2020 Poster: Adversarial Example Games
  Joey Bose · Gauthier Gidel · Hugo Berard · Andre Cianflone · Pascal Vincent · Simon Lacoste-Julien · Will Hamilton
- 2019: Catered Lunch and Poster Viewing (in Workshop Room)
  Gustavo Stolovitzky · Prabhu Pradhan · Pablo Duboue · Zhiwen Tang · Aleksei Natekin · Elizabeth Bondi-Kelly · Xavier Bouthillier · Stephanie Milani · Heimo Müller · Andreas T. Holzinger · Stefan Harrer · Ben Day · Andrey Ustyuzhanin · William Guss · Mahtab Mirmomeni
- 2019 Workshop: Retrospectives: A Venue for Self-Reflection in ML Research
  Ryan Lowe · Yoshua Bengio · Joelle Pineau · Michela Paganini · Jessica Forde · Shagun Sodhani · Abhishek Gupta · Joel Lehman · Peter Henderson · Kanika Madan · Koustuv Sinha · Xavier Bouthillier
- 2019 Poster: Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
  Mahmoud Assran · Joshua Romoff · Nicolas Ballas · Joelle Pineau · Mike Rabbat
- 2015 Poster: Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets
  Pascal Vincent · Alexandre de Brébisson · Xavier Bouthillier
- 2015 Oral: Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets
  Pascal Vincent · Alexandre de Brébisson · Xavier Bouthillier
- 2013 Poster: Generalized Denoising Auto-Encoders as Generative Models
  Yoshua Bengio · Li Yao · Guillaume Alain · Pascal Vincent
- 2011 Oral: The Manifold Tangent Classifier
  Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller
- 2011 Poster: The Manifold Tangent Classifier
  Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller