Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, achieving state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report the performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
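The idea behind asynchronous SGD with many model replicas can be illustrated with a toy, single-process sketch. This is not DistBelief or the authors' Downpour SGD implementation: the `ParameterServer` class, the `run_replica` and `train` functions, the linear-regression task, and all hyperparameters are assumptions made for illustration, and threads stand in for replicas running on separate machines. Each replica fetches the shared parameters, computes a gradient on its own data shard, and pushes the update without waiting for the other replicas.

```python
import random
import threading


class ParameterServer:
    """Hypothetical toy parameter server: stores weights, applies pushed gradients."""

    def __init__(self, dim, lr):
        self.w = [0.0] * dim
        self.lr = lr
        self._lock = threading.Lock()

    def fetch(self):
        # Return a copy; it may be stale by the time the replica pushes.
        with self._lock:
            return list(self.w)

    def push(self, grad):
        # Apply one plain SGD step with the gradient sent by a replica.
        with self._lock:
            self.w = [wi - self.lr * gi for wi, gi in zip(self.w, grad)]


def gradient(w, batch):
    """Gradient of the mean squared error of a linear model on one mini-batch."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / len(batch)
    return g


def mse(w, data):
    """Mean squared error of the linear model w over the whole dataset."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in data) / len(data)


def run_replica(server, shard, steps, batch_size, seed):
    """One model replica: fetch, compute a gradient locally, push. No barrier."""
    rng = random.Random(seed)
    for _ in range(steps):
        w = server.fetch()
        batch = rng.sample(shard, batch_size)
        server.push(gradient(w, batch))


def train(n_replicas=4, steps=200, batch_size=10, lr=0.05):
    # Synthetic noiseless regression data (an assumption for this sketch).
    rng = random.Random(0)
    true_w = [2.0, -3.0]
    data = []
    for _ in range(400):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        data.append((x, sum(t * xi for t, xi in zip(true_w, x))))
    # Each replica trains on its own disjoint shard of the data.
    shards = [data[i::n_replicas] for i in range(n_replicas)]
    server = ParameterServer(dim=len(true_w), lr=lr)
    threads = [
        threading.Thread(target=run_replica, args=(server, shards[i], steps, batch_size, i))
        for i in range(n_replicas)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.fetch(), data


if __name__ == "__main__":
    w, data = train()
    print("learned weights:", w, "mse:", mse(w, data))
```

Because replicas never synchronize with one another, a pushed gradient may have been computed from slightly outdated parameters; in this small convex example that staleness is harmless, but the sketch makes no claim about behavior at the scale the abstract describes.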
Author Information
Jeff Dean (Google Research)
Jeff joined Google in 1999 and is currently a Google Senior Fellow. He currently leads Google's Research and Health divisions, where he co-founded the Google Brain team. He has co-designed/implemented multiple generations of Google's distributed machine learning systems for neural network training and inference, as well as multiple generations of Google's crawling, indexing, and query serving systems, and major pieces of Google's initial advertising and AdSense for Content systems. He is also a co-designer and co-implementor of Google's distributed computing infrastructure, including the MapReduce, BigTable and Spanner systems, protocol buffers, LevelDB, systems infrastructure for statistical machine translation, and a variety of internal and external libraries and developer tools. He received a Ph.D. in Computer Science from the University of Washington in 1996, working with Craig Chambers on compiler techniques for object-oriented languages. He is a Fellow of the ACM, a Fellow of the AAAS, a member of the U.S. National Academy of Engineering, and a recipient of the Mark Weiser Award and the ACM Prize in Computing.
Greg Corrado (Google Health)
Rajat Monga (Google)
Kai Chen (Google Research)
Matthieu Devin
Quoc V Le (Stanford)
Mark Mao
Marc'Aurelio Ranzato (DeepMind)
Andrew Senior (DeepMind)
Paul Tucker
Ke Yang (Google Inc.)
Andrew Y Ng (DeepLearning.AI)
More from the Same Authors
2021 : Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs »
Dan Rosenbaum · Marta Garnelo · Michal Zielinski · Charles Beattie · Ellen Clancy · Andrea Huber · Pushmeet Kohli · Andrew Senior · John Jumper · Carl Doersch · S. M. Ali Eslami · Olaf Ronneberger · Jonas Adler -
2022 : Multi-step Planning for Automated Hyperparameter Optimization with OptFormer »
Lucio M Dery · Abram Friesen · Nando de Freitas · Marc'Aurelio Ranzato · Yutian Chen -
2022 : Jeff Dean - Invited Talk »
Jeff Dean -
2022 Poster: Towards Learning Universal Hyperparameter Optimizers with Transformers »
Yutian Chen · Xingyou Song · Chansoo Lee · Zi Wang · Richard Zhang · David Dohan · Kazuya Kawakami · Greg Kochanski · Arnaud Doucet · Marc'Aurelio Ranzato · Sagi Perel · Nando de Freitas -
2019 : Invited Speaker: Jeff Dean »
Jeff Dean -
2019 : Climate Change: A Grand Challenge for ML »
Yoshua Bengio · Carla Gomes · Andrew Ng · Jeff Dean · Lester Mackey -
2019 : Jeff Dean (Google AI) »
Jeff Dean -
2019 Poster: Large Memory Layers with Product Keys »
Guillaume Lample · Alexandre Sablayrolles · Marc'Aurelio Ranzato · Ludovic Denoyer · Herve Jegou -
2019 Spotlight: Large Memory Layers with Product Keys »
Guillaume Lample · Alexandre Sablayrolles · Marc'Aurelio Ranzato · Ludovic Denoyer · Herve Jegou -
2018 : Lunch provided and Open Source ML Systems Showcase (TensorFlow, PyTorch 1.0, MXNet, Keras, CoreML, Ray, Chainer) »
Rajat Monga · Soumith Chintala · Thierry Moreau · Francois Chollet · Daniel Crankshaw · Robert Nishihara · Seiya Tokui -
2018 : Invited Speaker #3 Marc'Aurelio Ranzato »
Marc'Aurelio Ranzato -
2018 Tutorial: Unsupervised Deep Learning »
Alex Graves · Marc'Aurelio Ranzato -
2017 : Future Hardware Directions »
Gregory Diamos · Jeff Dean · Simon Knowles · Michael James · Scott Gray -
2017 : On-Device ML Frameworks »
Jeff Gehlhaar · Yangqing Jia · Rajat Monga -
2017 : Data center to the edge: a journey with TensorFlow »
Rajat Monga -
2017 : Greg Corrado, Google »
Greg Corrado -
2017 : Invited Talk: Machine Learning for Systems and Systems for Machine Learning, Jeff Dean, Google Brain »
Jeff Dean -
2017 : Updates from Current ML Systems (TensorFlow, PyTorch, Caffe2, CNTK, MXNet, TVM, Clipper, MacroBase, ModelDB) »
Rajat Monga · Soumith Chintala · Cha Zhang · Tianqi Chen · Daniel Crankshaw · Kai Sheng Tai · Andrew Tulloch · Manasi Vartak -
2017 Poster: Fader Networks: Manipulating Images by Sliding Attributes »
Guillaume Lample · Neil Zeghidour · Nicolas Usunier · Antoine Bordes · Ludovic Denoyer · Marc'Aurelio Ranzato -
2017 Poster: Gradient Episodic Memory for Continual Learning »
David Lopez-Paz · Marc'Aurelio Ranzato -
2016 : Invited Talk: Scaling Machine Learning Using TensorFlow (Jeff Dean, Google Brain) »
Jeff Dean -
2016 : Jeff Dean – TensorFlow: Future Directions for Simplifying Large-Scale Machine Learning »
Jeff Dean -
2016 Poster: Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes »
Jack Rae · Jonathan J Hunt · Ivo Danihelka · Tim Harley · Andrew Senior · Gregory Wayne · Alex Graves · Timothy Lillicrap -
2015 : TensorFlow: A system for machine learning on heterogeneous systems »
Jeff Dean -
2015 Symposium: Deep Learning Symposium »
Yoshua Bengio · Marc'Aurelio Ranzato · Honglak Lee · Max Welling · Andrew Y Ng -
2015 Tutorial: Large-Scale Distributed Systems for Training Neural Networks »
Jeff Dean · Oriol Vinyals -
2014 Workshop: Deep Learning and Representation Learning »
Andrew Y Ng · Yoshua Bengio · Adam Coates · Roland Memisevic · Sharanyan Chetlur · Geoffrey E Hinton · Shamim Nemati · Bryan Catanzaro · Surya Ganguli · Herbert Jaeger · Phil Blunsom · Leon Bottou · Volodymyr Mnih · Chen-Yu Lee · Rich M Schwartz -
2014 Session: Oral Session 4 »
Marc'Aurelio Ranzato -
2013 Poster: DeViSE: A Deep Visual-Semantic Embedding Model »
Andrea Frome · Greg Corrado · Jonathon Shlens · Samy Bengio · Jeff Dean · Marc'Aurelio Ranzato · Tomas Mikolov -
2013 Demonstration: Easy Text Classification with Machine Learning »
Richard Socher · Romain Paulus · Bryan McCann · Andrew Y Ng -
2013 Demonstration: Distributed Representations of Words and Phrases and their Compositionality »
Tomas Mikolov · Kai Chen · Greg Corrado -
2013 Poster: Reasoning With Neural Tensor Networks for Knowledge Base Completion »
Richard Socher · Danqi Chen · Christopher D Manning · Andrew Y Ng -
2013 Poster: Zero-Shot Learning Through Cross-Modal Transfer »
Richard Socher · Milind Ganjoo · Christopher D Manning · Andrew Y Ng -
2013 Poster: Predicting Parameters in Deep Learning »
Misha Denil · Babak Shakibi · Laurent Dinh · Marc'Aurelio Ranzato · Nando de Freitas -
2013 Poster: Distributed Representations of Words and Phrases and their Compositionality »
Tomas Mikolov · Ilya Sutskever · Kai Chen · Greg Corrado · Jeff Dean -
2012 Poster: Recursive Deep Learning on 3D Point Clouds »
Richard Socher · Bharath Bhat · Brody Huval · Christopher D Manning · Andrew Y Ng -
2012 Poster: Deep Learning of invariant features via tracked video sequences »
Will Y Zou · Andrew Y Ng · Shenghuo Zhu · Kai Yu -
2012 Poster: Emergence of Object-Selective Features in Unsupervised Feature Learning »
Adam Coates · Andrej Karpathy · Andrew Y Ng -
2011 Workshop: Challenges in Learning Hierarchical Models: Transfer Learning and Optimization »
Quoc V. Le · Marc'Aurelio Ranzato · Russ Salakhutdinov · Josh Tenenbaum · Andrew Y Ng -
2011 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · Adam Coates · Yann LeCun · Nicolas Le Roux · Andrew Y Ng -
2011 Poster: ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning »
Quoc V. Le · Alexandre Karpenko · Jiquan Ngiam · Andrew Y Ng -
2011 Poster: Unfolding Recursive Autoencoders for Paraphrase Detection »
Richard Socher · Eric H Huang · Jeffrey Pennington · Andrew Y Ng · Christopher D Manning -
2011 Poster: Sparse Filtering »
Jiquan Ngiam · Pang Wei Koh · Zhenghao Chen · Sonia A Bhaskar · Andrew Y Ng -
2011 Spotlight: Sparse Filtering »
Jiquan Ngiam · Pang Wei Koh · Zhenghao Chen · Sonia A Bhaskar · Andrew Y Ng -
2011 Demonstration: Haptic Belt with Pedestrian Detection »
Jean Feng · Marc Rasi · Andrew Y Ng · Quoc V. Le · Morgan Quigley · Justin K Chen · Tiffany Low · Will Y Zou -
2011 Poster: Selecting Receptive Fields in Deep Networks »
Adam Coates · Andrew Y Ng -
2011 Poster: Unsupervised learning models of primary cortical receptive fields and receptive field plasticity »
Andrew M Saxe · Maneesh Bhand · Ritvik Mudur · Bipin Suresh · Andrew Y Ng -
2010 Workshop: Deep Learning and Unsupervised Feature Learning »
Honglak Lee · Marc'Aurelio Ranzato · Yoshua Bengio · Geoffrey E Hinton · Yann LeCun · Andrew Y Ng -
2010 Poster: Generating more realistic images using gated MRF's »
Marc'Aurelio Ranzato · Volodymyr Mnih · Geoffrey E Hinton -
2010 Poster: Tiled convolutional neural networks »
Quoc V. Le · Jiquan Ngiam · Zhenghao Chen · Daniel Jin hao Chia · Pang Wei Koh · Andrew Y Ng -
2010 Poster: Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine »
George Dahl · Marc'Aurelio Ranzato · Abdel-rahman Mohamed · Geoffrey E Hinton -
2010 Poster: Energy Disaggregation via Discriminative Sparse Coding »
J. Zico Kolter · Siddarth Batra · Andrew Y Ng -
2009 Mini Symposium: Machine Learning for Sustainability »
J. Zico Kolter · Thomas Dietterich · Andrew Y Ng -
2009 Poster: Measuring Invariances in Deep Networks »
Ian Goodfellow · Quoc V. Le · Andrew M Saxe · Andrew Y Ng -
2009 Poster: Unsupervised feature learning for audio classification using convolutional deep belief networks »
Honglak Lee · Peter Pham · Yan Largman · Andrew Y Ng -
2008 Poster: Tighter Bounds for Structured Estimation »
Olivier Chapelle · Chuong B Do · Quoc V Le · Alexander Smola · Choon Hui Teo -
2008 Demonstration: High-Accuracy 3D Sensing for Mobile Manipulators »
Stephen Gould · Morgan Quigley · Siddarth Batra · Ellen Klingbiel · Quoc V Le · Andrew Y Ng -
2007 Poster: Sparse deep belief net model for visual area V2 »
Honglak Lee · Chaitanya Ekanadham · Andrew Y Ng -
2007 Demonstration: Holistic Scene Understanding from Visual and Range Data »
Stephen Gould · Morgan Quigley · Andrew Y Ng · Daphne Koller -
2007 Spotlight: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion »
J. Zico Kolter · Pieter Abbeel · Andrew Y Ng -
2007 Spotlight: Bundle Methods for Machine Learning »
Alexander Smola · S V N Vishwanathan · Quoc V Le -
2007 Poster: COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking »
Markus Weimer · Alexandros Karatzoglou · Quoc V Le · Alexander Smola -
2007 Poster: Bundle Methods for Machine Learning »
Alexander Smola · S V N Vishwanathan · Quoc V Le -
2007 Poster: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion »
J. Zico Kolter · Pieter Abbeel · Andrew Y Ng -
2007 Spotlight: COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking »
Markus Weimer · Alexandros Karatzoglou · Quoc V Le · Alexander Smola -
2007 Demonstration: Building a 3-D Model From a Single Still Image »
Ashutosh Saxena · Min Sun · Andrew Y Ng -
2007 Poster: Efficient multiple hyperparameter learning for log-linear models »
Chuong B Do · Chuan-Sheng Foo · Andrew Y Ng -
2007 Poster: Sparse Feature Learning for Deep Belief Networks »
Marc'Aurelio Ranzato · Y-Lan Boureau · Yann LeCun -
2006 Poster: Efficient Learning of Sparse Representations with an Energy-Based Model »
Marc'Aurelio Ranzato · Christopher Poultney · Sumit Chopra · Yann LeCun -
2006 Demonstration: Peripheral-Foveal Vision for Real-time Object Recognition »
Benjamin Sapp · Stephen Gould · Adrian Kaehler · Gary R Bradski · Andrew Y Ng -
2006 Spotlight: Efficient Learning of Sparse Representations with an Energy-Based Model »
Marc'Aurelio Ranzato · Christopher Poultney · Sumit Chopra · Yann LeCun -
2006 Poster: Robotic Grasping of Novel Objects »
Ashutosh Saxena · Justin Driemeyer · Justin Kearns · Andrew Y Ng -
2006 Poster: Map-Reduce for Machine Learning on Multicore »
Cheng-Tao Chu · Sang Kyun Kim · Yi-An Lin · YuanYuan Yu · Gary R Bradski · Andrew Y Ng · Kunle Olukotun -
2006 Poster: An Application of Reinforcement Learning to Aerobatic Helicopter Flight »
Pieter Abbeel · Adam P Coates · Andrew Y Ng · Morgan Quigley -
2006 Talk: Map-Reduce for Machine Learning on Multicore »
Cheng-Tao Chu · Sang Kyun Kim · Yi-An Lin · YuanYuan Yu · Gary R Bradski · Andrew Y Ng · Kunle Olukotun -
2006 Spotlight: Robotic Grasping of Novel Objects »
Ashutosh Saxena · Justin Driemeyer · Justin Kearns · Andrew Y Ng -
2006 Talk: An Application of Reinforcement Learning to Aerobatic Helicopter Flight »
Pieter Abbeel · Adam P Coates · Andrew Y Ng · Morgan Quigley -
2006 Poster: Learning to Rank with Nonsmooth Cost Functions »
Chris J Burges · Quoc Le · Robert J Ragno -
2006 Poster: Efficient sparse coding algorithms, end-stopping and nCRF surround suppression »
Honglak Lee · Alexis Battle · Rajat Raina · Andrew Y Ng