Timezone: »
We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines.
Author Information
Yunfei Teng (New York University)
Wenbo Gao (Columbia University)
François Chalus (Credit Suisse)
Anna Choromanska (NYU)
Donald Goldfarb (Columbia University)
Adrian Weller (Cambridge, Alan Turing Institute)
Adrian Weller is Programme Director for AI at The Alan Turing Institute, the UK national institute for data science and AI, where he is also a Turing Fellow leading work on safe and ethical AI. He is a Principal Research Fellow in Machine Learning at the University of Cambridge, and at the Leverhulme Centre for the Future of Intelligence where he is Programme Director for Trust and Society. His interests span AI, its commercial applications and helping to ensure beneficial outcomes for society. He serves on several boards including the Centre for Data Ethics and Innovation. Previously, Adrian held senior roles in finance.
More from the Same Authors
-
2020 : Continual learning with direction-constrained optimization »
Yunfei Teng -
2021 Spotlight: Tensor Normal Training for Deep Learning Models »
Yi Ren · Donald Goldfarb -
2022 Poster: Scalable Infomin Learning »
Yanzhi Chen · weihao sun · Yingzhen Li · Adrian Weller -
2022 : Conformal Prediction for Resource Prioritisation in Predicting Rare and Dangerous Outcomes »
Varun Babbar · Umang Bhatt · Miri Zilka · Adrian Weller -
2022 : Efficient Second-Order Stochastic Methods for Machine Learning »
Donald Goldfarb -
2022 Poster: Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off »
Mateo Espinosa Zarlenga · Pietro Barbiero · Gabriele Ciravegna · Giuseppe Marra · Francesco Giannini · Michelangelo Diligenti · Zohreh Shams · Frederic Precioso · Stefano Melacci · Adrian Weller · Pietro Lió · Mateja Jamnik -
2022 Poster: Chefs' Random Tables: Non-Trigonometric Random Features »
Valerii Likhosherstov · Krzysztof M Choromanski · Kumar Avinava Dubey · Frederick Liu · Tamas Sarlos · Adrian Weller -
2022 Poster: A Survey and Datasheet Repository of Publicly Available US Criminal Justice Datasets »
Miri Zilka · Bradley Butcher · Adrian Weller -
2021 Workshop: Privacy in Machine Learning (PriML) 2021 »
Yu-Xiang Wang · Borja Balle · Giovanni Cherubin · Kamalika Chaudhuri · Antti Honkela · Jonathan Lebensold · Casey Meehan · Mi Jung Park · Adrian Weller · Yuqing Zhu -
2021 Workshop: Human Centered AI »
Michael Muller · Plamen P Angelov · Shion Guha · Marina Kogan · Gina Neff · Nuria Oliver · Manuel Rodriguez · Adrian Weller -
2021 Workshop: AI for Science: Mind the Gaps »
Payal Chandak · Yuanqi Du · Tianfan Fu · Wenhao Gao · Kexin Huang · Shengchao Liu · Ziming Liu · Gabriel Spadon · Max Tegmark · Hanchen Wang · Adrian Weller · Max Welling · Marinka Zitnik -
2021 Poster: Tensor Normal Training for Deep Learning Models »
Yi Ren · Donald Goldfarb -
2020 : Introduction: Danielle Bassett »
Anna Choromanska -
2020 : Introduction: Yoshua Bengio »
Anna Choromanska -
2020 : Live Intro »
Mateusz Malinowski · Viorica Patraucean · Grzegorz Swirszcz · Sindy Löwe · Anna Choromanska · Marco Gori · Yanping Huang -
2020 : Invited speaker: Practical Kronecker-factored BFGS and L-BFGS methods for training deep neural networks, Donald Goldfarb »
Donald Goldfarb -
2020 Workshop: Privacy Preserving Machine Learning - PriML and PPML Joint Edition »
Borja Balle · James Bell · Aurélien Bellet · Kamalika Chaudhuri · Adria Gascon · Antti Honkela · Antti Koskela · Casey Meehan · Olga Ohrimenko · Mi Jung Park · Mariana Raykova · Mary Anne Smart · Yu-Xiang Wang · Adrian Weller -
2020 Poster: Practical Quasi-Newton Methods for Training Deep Neural Networks »
Donald Goldfarb · Yi Ren · Achraf Bahamou -
2020 Poster: Ode to an ODE »
Krzysztof Choromanski · Jared Quincy Davis · Valerii Likhosherstov · Xingyou Song · Jean-Jacques Slotine · Jacob Varley · Honglak Lee · Adrian Weller · Vikas Sindhwani -
2020 Spotlight: Practical Quasi-Newton Methods for Training Deep Neural Networks »
Donald Goldfarb · Yi Ren · Achraf Bahamou -
2019 Workshop: Privacy in Machine Learning (PriML) »
Borja Balle · Kamalika Chaudhuri · Antti Honkela · Antti Koskela · Casey Meehan · Mi Jung Park · Mary Anne Smart · Mary Anne Smart · Adrian Weller -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2019 : Economical use of second-order information in training machine learning models »
Donald Goldfarb -
2019 Workshop: Workshop on Human-Centric Machine Learning »
Plamen P Angelov · Nuria Oliver · Adrian Weller · Manuel Rodriguez · Isabel Valera · Silvia Chiappa · Hoda Heidari · Niki Kilbertus -
2018 Workshop: Privacy Preserving Machine Learning »
Adria Gascon · Aurélien Bellet · Niki Kilbertus · Olga Ohrimenko · Mariana Raykova · Adrian Weller -
2018 Poster: Geometrically Coupled Monte Carlo Sampling »
Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller -
2018 Spotlight: Geometrically Coupled Monte Carlo Sampling »
Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller -
2017 : Invited talk: Challenges for Transparency »
Adrian Weller -
2017 : Closing remarks »
Adrian Weller -
2017 Symposium: Kinds of intelligence: types, tests and meeting the needs of society »
José Hernández-Orallo · Zoubin Ghahramani · Tomaso Poggio · Adrian Weller · Matthew Crosby -
2017 Poster: From Parity to Preference-based Notions of Fairness in Classification »
Muhammad Bilal Zafar · Isabel Valera · Manuel Rodriguez · Krishna Gummadi · Adrian Weller -
2017 Poster: The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings »
Krzysztof Choromanski · Mark Rowland · Adrian Weller -
2017 Poster: Uprooting and Rerooting Higher-Order Graphical Models »
Mark Rowland · Adrian Weller -
2016 Workshop: Reliable Machine Learning in the Wild »
Dylan Hadfield-Menell · Adrian Weller · David Duvenaud · Jacob Steinhardt · Percy Liang -
2016 Symposium: Machine Learning and the Law »
Adrian Weller · Thomas D. Grant · Conrad McDonnell · Jatinder Singh -
2015 Symposium: Algorithms Among Us: the Societal Impacts of Machine Learning »
Michael A Osborne · Adrian Weller · Murray Shanahan -
2014 Poster: Clamping Variables and Approximate Inference »
Adrian Weller · Tony Jebara -
2014 Oral: Clamping Variables and Approximate Inference »
Adrian Weller · Tony Jebara -
2010 Poster: Sparse Inverse Covariance Selection via Alternating Linearization Methods »
Katya Scheinberg · Shiqian Ma · Donald Goldfarb