We study the problem of stochastic optimization for deep learning in the parallel computing environment under communication constraints. A new algorithm, Elastic Averaging SGD (EASGD), is proposed in this setting, where the communication and coordination of work among concurrent processes (local workers) is based on an elastic force which links the parameters they compute with a center variable stored by the parameter server (master). The algorithm enables the local workers to perform more exploration, i.e., it allows the local variables to fluctuate further from the center variable by reducing the amount of communication between local workers and the master. We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to improved performance. We propose synchronous and asynchronous variants of the new algorithm. We provide a stability analysis of the asynchronous variant in the round-robin scheme and compare it with the more common parallelized method ADMM. We show that the stability of EASGD is guaranteed when a simple stability condition is satisfied, which is not the case for ADMM. We additionally propose a momentum-based version of our algorithm that can be applied in both synchronous and asynchronous settings. The asynchronous variant of the algorithm is applied to train convolutional neural networks for image classification on the CIFAR and ImageNet datasets. Experiments demonstrate that the new algorithm accelerates the training of deep architectures compared to DOWNPOUR and other common baseline approaches, and furthermore is very communication efficient.
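The elastic coupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs a synchronous variant on a scalar toy objective, and the names `easgd_sync`, `noisy_quadratic_grad`, `eta`, `rho`, and `tau`, as well as the quadratic objective and all default values, are assumptions made for illustration. Each worker takes a noisy gradient step and, every `tau` steps, is pulled toward the center variable while the center is pulled toward the workers by the same force. A larger `tau` means less communication and more room for the local variables to drift from the center.

```python
import random


def easgd_sync(grad, x0, workers=4, eta=0.05, rho=1.0, tau=1, steps=500, seed=0):
    """Toy synchronous elastic-averaging loop on a scalar parameter.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    alpha = eta * rho  # strength of the elastic pull per exchange
    # Start the workers at slightly different points around x0.
    local = [x0 + rng.uniform(-1.0, 1.0) for _ in range(workers)]
    center = x0
    for t in range(steps):
        if t % tau == 0:
            # Elastic differences are taken from the values at step t.
            diffs = [x - center for x in local]
            # Center moves toward the workers ...
            center = center + alpha * sum(diffs)
        else:
            # No communication on this step: no elastic force is applied.
            diffs = [0.0] * workers
        # ... and each worker takes a gradient step plus a pull toward the center.
        local = [x - eta * grad(x, rng) - alpha * d for x, d in zip(local, diffs)]
    return center, local


def noisy_quadratic_grad(x, rng):
    # Gradient of 0.5 * (x - 3)^2 with additive Gaussian noise (toy objective).
    return (x - 3.0) + rng.gauss(0.0, 0.1)


center, local = easgd_sync(noisy_quadratic_grad, x0=0.0)
```

On this toy problem the center variable and the local variables should all settle near the minimizer at 3. Raising `tau` in the call lets one observe how less frequent exchanges loosen the coupling between workers and center.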
Author Information
Sixin Zhang (New York University)
Anna Choromanska (Courant Institute, NYU)
Yann LeCun (New York University)
Yann LeCun is Director of AI Research at Facebook, and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University. He received the Electrical Engineer Diploma from ESIEE, Paris in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU as a professor in 2003, after a brief period as a Fellow of the NEC Research Institute in Princeton. From 2012 to 2014 he directed NYU's initiative in data science and became the founding director of the NYU Center for Data Science. He was named Director of AI Research at Facebook in late 2013 and retains a part-time position on the NYU faculty. His current interests include AI, machine learning, computer perception, mobile robotics, and computational neuroscience. He has published over 180 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and on dedicated circuits for computer perception.
More from the Same Authors
- 2022 Poster: The Effects of Regularization and Data Augmentation are Class Dependent
  Randall Balestriero · Leon Bottou · Yann LeCun
- 2022 Poster: VICRegL: Self-Supervised Learning of Local Visual Features
  Adrien Bardes · Jean Ponce · Yann LeCun
- 2022 Poster: Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
  Zi-Yi Dou · Aishwarya Kamath · Zhe Gan · Pengchuan Zhang · Jianfeng Wang · Linjie Li · Zicheng Liu · Ce Liu · Yann LeCun · Nanyun Peng · Jianfeng Gao · Lijuan Wang
- 2022 Poster: Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors
  Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson
- 2022 Poster: A Data-Augmentation Is Worth A Thousand Samples: Analytical Moments And Sampling-Free Training
  Randall Balestriero · Ishan Misra · Yann LeCun
- 2022 Poster: projUNN: efficient method for training deep networks with unitary matrices
  Bobak Kiani · Randall Balestriero · Yann LeCun · Seth Lloyd
- 2022 Poster: Contrastive and Non-Contrastive Self-Supervised Learning Recover Global and Local Spectral Embedding Methods
  Randall Balestriero · Yann LeCun
- 2020: Panel Discussion & Closing
  Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing
- 2020: QA: Yann LeCun
  Yann LeCun
- 2020: Invited Talk: Yann LeCun
  Yann LeCun
- 2019: TBD
  Yann LeCun
- 2017: Panel Session
  Neil Lawrence · Finale Doshi-Velez · Zoubin Ghahramani · Yann LeCun · Max Welling · Yee Whye Teh · Ole Winther
- 2017 Tutorial: Geometric Deep Learning on Graphs and Manifolds
  Michael Bronstein · Joan Bruna · arthur szlam · Xavier Bresson · Yann LeCun
- 2016: Discussion panel
  Ian Goodfellow · Soumith Chintala · Arthur Gretton · Sebastian Nowozin · Aaron Courville · Yann LeCun · Emily Denton
- 2016: Energy-Based Adversarial Training and Video Prediction
  Yann LeCun
- 2016 Workshop: Nonconvex Optimization for Machine Learning: Theory and Practice
  Hossein Mobahi · Anima Anandkumar · Percy Liang · Stefanie Jegelka · Anna Choromanska
- 2016 Symposium: Deep Learning Symposium
  Yoshua Bengio · Yann LeCun · Navdeep Jaitly · Roger Grosse
- 2015 Poster: Learning to Linearize Under Uncertainty
  Ross Goroshin · Michael Mathieu · Yann LeCun
- 2015 Poster: Logarithmic Time Online Multiclass prediction
  Anna Choromanska · John Langford
- 2015 Spotlight: Logarithmic Time Online Multiclass prediction
  Anna Choromanska · John Langford
- 2015 Poster: Character-level Convolutional Networks for Text Classification
  Xiang Zhang · Junbo (Jake) Zhao · Yann LeCun
- 2015 Poster: Deep learning with Elastic Averaging SGD
  Sixin Zhang · Anna Choromanska · Yann LeCun
- 2015 Tutorial: Deep Learning
  Geoffrey E Hinton · Yoshua Bengio · Yann LeCun