Timezone: »
This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computation efficiency both at training and test time. This memory layer allows us to tackle very large scale language modeling tasks. In our experiments we consider a dataset with up to 30 billion words, and we plug our memory layer in a state-of-the-art transformer-based architecture. In particular, we found that a memory augmented model with only 12 layers outperforms a baseline transformer model with 24 layers, while being twice faster at inference time. We release our code for reproducibility purposes.
Author Information
Guillaume Lample (Facebook AI Research)
Alexandre Sablayrolles (Facebook AI Research)
Marc'Aurelio Ranzato (Facebook AI Research)
Ludovic Denoyer (Facebook - FAIR)
Herve Jegou (Facebook AI Research)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Large Memory Layers with Product Keys »
Fri. Dec 13th 01:00 -- 03:00 AM Room East Exhibition Hall B + C #193
More from the Same Authors
-
2021 : Opacus: User-Friendly Differential Privacy Library in PyTorch »
Ashkan Yousefpour · Igor Shilov · Alexandre Sablayrolles · Karthik Prasad · Mani Malek Esmaeili · John Nguyen · Sayan Ghosh · Akash Bharadwaj · Jessica Zhao · Graham Cormode · Ilya Mironov -
2021 : ResNet strikes back: An improved training procedure in timm »
Ross Wightman · Hugo Touvron · Herve Jegou -
2021 : Learning a Subspace of Policies for Online Adaptation in Reinforcement Learning »
Jean-Baptiste Gaya · Laure Soulier · Ludovic Denoyer -
2022 : Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs »
Albert Jiang · Sean Welleck · Jin Peng Zhou · Timothee Lacroix · Jiacheng Liu · Wenda Li · Mateja Jamnik · Guillaume Lample · Yuhuai Wu -
2022 : Multi-step Planning for Automated Hyperparameter Optimization with OptFormer »
Lucio M Dery · Abram Friesen · Nando de Freitas · Marc'Aurelio Ranzato · Yutian Chen -
2022 Poster: Towards Learning Universal Hyperparameter Optimizers with Transformers »
Yutian Chen · Xingyou Song · Chansoo Lee · Zi Wang · Richard Zhang · David Dohan · Kazuya Kawakami · Greg Kochanski · Arnaud Doucet · Marc'Aurelio Ranzato · Sagi Perel · Nando de Freitas -
2021 Poster: XCiT: Cross-Covariance Image Transformers »
Alaaeldin Ali · Hugo Touvron · Mathilde Caron · Piotr Bojanowski · Matthijs Douze · Armand Joulin · Ivan Laptev · Natalia Neverova · Gabriel Synnaeve · Jakob Verbeek · Herve Jegou -
2021 Poster: DOBF: A Deobfuscation Pre-Training Objective for Programming Languages »
Marie-Anne Lachaux · Baptiste Roziere · Marc Szafraniec · Guillaume Lample -
2020 Poster: Unsupervised Translation of Programming Languages »
Baptiste Roziere · Marie-Anne Lachaux · Lowik Chanussot · Guillaume Lample -
2019 Poster: Fixing the train-test resolution discrepancy »
Hugo Touvron · Andrea Vedaldi · Matthijs Douze · Herve Jegou -
2019 Poster: Unsupervised Object Segmentation by Redrawing »
Mickaël Chen · Thierry Artières · Ludovic Denoyer -
2018 : Invited Speaker #3 Marc'Aurelio Ranzato »
Marc'Aurelio Ranzato -
2018 Tutorial: Unsupervised Deep Learning »
Alex Graves · Marc'Aurelio Ranzato -
2017 Poster: Fader Networks:Manipulating Images by Sliding Attributes »
Guillaume Lample · Neil Zeghidour · Nicolas Usunier · Antoine Bordes · Ludovic DENOYER · Marc'Aurelio Ranzato -
2017 Poster: Gradient Episodic Memory for Continual Learning »
David Lopez-Paz · Marc'Aurelio Ranzato -
2015 Symposium: Deep Learning Symposium »
Yoshua Bengio · Marc'Aurelio Ranzato · Honglak Lee · Max Welling · Andrew Y Ng -
2014 Session: Oral Session 4 »
Marc'Aurelio Ranzato -
2013 Poster: DeViSE: A Deep Visual-Semantic Embedding Model »
Andrea Frome · Greg Corrado · Jonathon Shlens · Samy Bengio · Jeff Dean · Marc'Aurelio Ranzato · Tomas Mikolov -
2013 Poster: Predicting Parameters in Deep Learning »
Misha Denil · Babak Shakibi · Laurent Dinh · Marc'Aurelio Ranzato · Nando de Freitas -
2012 Poster: Large Scale Distributed Deep Networks »
Jeff Dean · Greg Corrado · Rajat Monga · Kai Chen · Matthieu Devin · Quoc V Le · Mark Mao · Marc'Aurelio Ranzato · Andrew Senior · Paul Tucker · Ke Yang · Andrew Y Ng -
2011 Workshop: Challenges in Learning Hierarchical Models: Transfer Learning and Optimization »
Quoc V. Le · Marc'Aurelio Ranzato · Russ Salakhutdinov · Josh Tenenbaum · Andrew Y Ng -
2010 Workshop: Deep Learning and Unsupervised Feature Learning »
Honglak Lee · Marc'Aurelio Ranzato · Yoshua Bengio · Geoffrey E Hinton · Yann LeCun · Andrew Y Ng -
2010 Poster: Generating more realistic images using gated MRF's »
Marc'Aurelio Ranzato · Volodymyr Mnih · Geoffrey E Hinton -
2010 Poster: Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine »
George Dahl · Marc'Aurelio Ranzato · Abdel-rahman Mohamed · Geoffrey E Hinton -
2007 Poster: Sparse Feature Learning for Deep Belief Networks »
Marc'Aurelio Ranzato · Y-Lan Boureau · Yann LeCun -
2006 Poster: Efficient Learning of Sparse Representations with an Energy-Based Model »
Marc'Aurelio Ranzato · Christopher Poultney · Sumit Chopra · Yann LeCun -
2006 Spotlight: Efficient Learning of Sparse Representations with an Energy-Based Model »
Marc'Aurelio Ranzato · Christopher Poultney · Sumit Chopra · Yann LeCun