Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking. We present an update scheme called Hogwild! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates modify only small parts of the decision variable, Hogwild! achieves a nearly optimal rate of convergence. We demonstrate experimentally that Hogwild! outperforms alternative schemes that use locking by an order of magnitude.
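To make the update scheme concrete, below is a minimal sketch of Hogwild!-style lock-free SGD on a synthetic sparse least-squares problem. This is not the authors' implementation; all names and constants (N_FEATURES, N_WORKERS, STEP, the data generator) are illustrative assumptions. Each worker samples examples and writes unsynchronized updates to a shared weight vector, touching only the coordinates where its example is nonzero.

```python
# A minimal sketch (not the paper's code) of Hogwild!-style lock-free SGD
# on a synthetic sparse least-squares problem. Constants are illustrative.
import numpy as np
from multiprocessing import Process, RawArray

N_FEATURES, N_SAMPLES, N_WORKERS, STEP = 100, 10_000, 4, 0.01

def make_sparse_data(rng):
    # Each example touches only 3 coordinates, so concurrent updates
    # rarely collide -- the sparsity assumption Hogwild! relies on.
    X = np.zeros((N_SAMPLES, N_FEATURES))
    for i in range(N_SAMPLES):
        cols = rng.choice(N_FEATURES, size=3, replace=False)
        X[i, cols] = rng.normal(size=3)
    y = X @ rng.normal(size=N_FEATURES)
    return X, y

def worker(shared_w, X, y, seed):
    # View the shared buffer as a NumPy array. No lock is ever taken,
    # so writes from other workers may interleave with ours.
    w = np.frombuffer(shared_w)
    rng = np.random.default_rng(seed)
    for _ in range(N_SAMPLES):
        i = rng.integers(N_SAMPLES)
        xi = X[i]
        grad = (xi @ w - y[i]) * xi   # gradient of 0.5 * (x.w - y)^2
        nz = np.flatnonzero(xi)       # update only the nonzero coordinates
        w[nz] -= STEP * grad[nz]      # unsynchronized in-place write

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = make_sparse_data(rng)
    shared_w = RawArray("d", N_FEATURES)  # shared, lock-free weight vector
    procs = [Process(target=worker, args=(shared_w, X, y, s))
             for s in range(N_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    w = np.frombuffer(shared_w)
    print("final mean squared loss:", np.mean((X @ w - y) ** 2) / 2)
```

Because each example touches only a few coordinates, overwrites between workers are rare, which is why the unsynchronized writes do not derail convergence; this mirrors the sparsity condition stated in the abstract.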
Author Information
Benjamin Recht (UW-Madison)
Christopher Re (UW-Madison)
Stephen Wright (UW-Madison)
Steve Wright is a Professor of Computer Sciences at the University of Wisconsin-Madison. His research interests lie in computational optimization and its applications to science and engineering. Prior to joining UW-Madison in 2001, Wright was a Senior Computer Scientist (1997-2001) and Computer Scientist (1990-1997) at Argonne National Laboratory, and Professor of Computer Science at the University of Chicago (2000-2001). He is the past Chair of the Mathematical Optimization Society (formerly the Mathematical Programming Society), the leading professional society in optimization, and a member of the Board of the Society for Industrial and Applied Mathematics (SIAM). Wright is the author or co-author of four widely used books in numerical optimization, including "Primal-Dual Interior-Point Methods" (SIAM, 1997) and "Numerical Optimization" (with J. Nocedal, Second Edition, Springer, 2006). He has also authored over 85 refereed journal papers on optimization theory, algorithms, software, and applications. He is coauthor of widely used interior-point software for linear and quadratic optimization. His recent research includes algorithms, applications, and theory for sparse optimization (including applications in compressed sensing and machine learning).
Feng Niu (University of Wisconsin-Madison)
More from the Same Authors
- 2022: BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
  Mao Ye · Bo Liu · Stephen Wright · Peter Stone · Qiang Liu
- 2022 Poster: BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
  Bo Liu · Mao Ye · Stephen Wright · Peter Stone · Qiang Liu
- 2022 Poster: Coordinate Linear Variance Reduction for Generalized Linear Programming
  Chaobing Song · Cheuk Yin Lin · Stephen Wright · Jelena Diakonikolas
- 2020 Oral: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
  Benjamin Recht · Christopher Ré · Stephen Wright · Feng Niu
- 2019: Second-order methods for nonconvex optimization with complexity guarantees
  Stephen Wright
- 2018 Poster: ATOMO: Communication-efficient Learning via Atomic Sparsification
  Hongyi Wang · Scott Sievert · Shengchao Liu · Zachary Charles · Dimitris Papailiopoulos · Stephen Wright
- 2017 Poster: k-Support and Ordered Weighted Sparsity for Overlapping Groups: Hardness and Algorithms
  Cong Han Lim · Stephen Wright
- 2014 Poster: Beyond the Birkhoff Polytope: Convex Relaxations for Vector Permutation Problems
  Cong Han Lim · Stephen Wright
- 2013 Workshop: Large Scale Matrix Analysis and Inference
  Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht
- 2013 Poster: An Approximate, Efficient LP Solver for LP Rounding
  Srikrishna Sridhar · Stephen Wright · Christopher Re · Ji Liu · Victor Bittorf · Ce Zhang
- 2012 Workshop: Log-Linear Models
  Dimitri Kanevsky · Tony Jebara · Li Deng · Stephen Wright · Georg Heigold · Avishy Carmi
- 2012 Poster: Factoring nonnegative matrices with linear programs
  Benjamin Recht · Christopher Re · Joel A Tropp · Victor Bittorf
- 2012 Poster: Query Complexity of Derivative-Free Optimization
  Kevin G Jamieson · Rob Nowak · Benjamin Recht
- 2012 Spotlight: Factoring nonnegative matrices with linear programs
  Benjamin Recht · Christopher Re · Joel A Tropp · Victor Bittorf
- 2011 Workshop: Optimization for Machine Learning
  Suvrit Sra · Stephen Wright · Sebastian Nowozin
- 2010 Workshop: Optimization for Machine Learning
  Suvrit Sra · Sebastian Nowozin · Stephen Wright
- 2010 Poster: Transduction with Matrix Completion: Three Birds with One Stone
  Andrew B Goldberg · Jerry Zhu · Benjamin Recht · Junming Sui · Rob Nowak
- 2010 Poster: Practical Large-Scale Optimization for Max-norm Regularization
  Jason D Lee · Benjamin Recht · Russ Salakhutdinov · Nati Srebro · Joel A Tropp
- 2010 Tutorial: Optimization Algorithms in Machine Learning
  Stephen Wright
- 2009 Workshop: Optimization for Machine Learning
  Sebastian Nowozin · Suvrit Sra · S.V.N. Vishwanathan · Stephen Wright