For distributed computing environments, we consider the empirical risk minimization problem and propose a distributed and communication-efficient Newton-type optimization method. At every iteration, each worker locally finds an Approximate NewTon (ANT) direction, which it sends to the main driver. The main driver then averages all the ANT directions received from the workers to form a Globally Improved ANT (GIANT) direction. GIANT is highly communication-efficient and naturally exploits the trade-off between local computation and global communication: more local computation results in fewer overall rounds of communication. Theoretically, we show that GIANT enjoys an improved convergence rate compared with first-order methods and existing distributed Newton-type methods. Further, and in sharp contrast with many existing distributed Newton-type methods as well as popular first-order methods, a highly advantageous practical feature of GIANT is that it involves only one tuning parameter. We conduct large-scale experiments on a computer cluster and empirically demonstrate the superior performance of GIANT.
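The abstract describes one GIANT iteration only in words. The pure-Python sketch below illustrates the idea on a toy ridge-regression problem. It is a minimal sketch under several assumptions that the abstract does not state: the loss is quadratic, so each worker's local Hessian H_i = (1/s) X_i^T X_i + gamma I is fixed; every worker holds the same number s of rows; the driver first aggregates the global gradient g and broadcasts it; each worker's ANT direction is taken to be H_i^{-1} g, computed with conjugate gradient; and the driver applies the averaged direction with a unit step rather than a line search. All function names (`local_hessian`, `worker_gradient_sum`, `global_gradient`, `conjugate_gradient`, `giant`) and the synthetic data are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def local_hessian(X, gamma):
    """Hessian (1/s) X^T X + gamma*I of the ridge loss on one shard of s rows."""
    s, d = len(X), len(X[0])
    return [[sum(x[i] * x[j] for x in X) / s + (gamma if i == j else 0.0)
             for j in range(d)] for i in range(d)]


def worker_gradient_sum(X, y, w):
    """Worker side: unnormalised sum of (x^T w - y) x over this shard's rows."""
    g = [0.0] * len(w)
    for x, yi in zip(X, y):
        r = dot(x, w) - yi
        for k in range(len(w)):
            g[k] += r * x[k]
    return g


def global_gradient(shards, w, gamma):
    """Driver side: gradient of f(w) = (1/2n) sum (x^T w - y)^2 + (gamma/2)||w||^2."""
    n = sum(len(X) for X, _ in shards)
    sums = [worker_gradient_sum(X, y, w) for X, y in shards]
    return [sum(col) / n + gamma * wk for col, wk in zip(zip(*sums), w)]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A (relative residual tolerance)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0
    p = list(r)
    rs = dot(r, r)
    stop = tol * dot(b, b) ** 0.5
    for _ in range(max_iter):
        if rs ** 0.5 <= stop:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def giant(shards, gamma, iters=30):
    w = [0.0] * len(shards[0][0][0])
    # Quadratic loss: each local Hessian is fixed, so it is formed once per worker.
    hessians = [local_hessian(X, gamma) for X, _ in shards]
    for _ in range(iters):
        g = global_gradient(shards, w, gamma)                # broadcast to workers
        ants = [conjugate_gradient(H, g) for H in hessians]  # ANT: solve H_i p = g locally
        p = [sum(col) / len(ants) for col in zip(*ants)]     # GIANT: average on the driver
        w = [wk - pk for wk, pk in zip(w, p)]                # unit step, no line search
    return w


rng = random.Random(0)
w_true = [1.0, -2.0, 0.5]
shards = []
for _ in range(4):  # 4 workers, 50 rows each (equal shard sizes)
    X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
    y = [dot(x, w_true) + 0.1 * rng.gauss(0.0, 1.0) for x in X]
    shards.append((X, y))

w_hat = giant(shards, gamma=0.1)
print(w_hat)
```

Because the shards here have equal size, the global Hessian is exactly the average of the local Hessians, so the averaged direction should be close to the exact Newton direction and the unit step should converge quickly. With unequal shards or a non-quadratic loss, the local Hessians would change every iteration and a line search on the global objective would be the safer choice.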
Author Information
Shusen Wang (UC Berkeley)
Fred Roosta (University of Queensland)
Peng Xu (Stanford University)
Michael Mahoney (UC Berkeley)
More from the Same Authors
2021 Spotlight: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update
Michal Derezinski · Jonathan Lacotte · Mert Pilanci · Michael Mahoney
2023 Poster: Characterizing Scaling and Transfer Learning of Neural Networks for Scientific Machine Learning
Shashank Subramanian · Peter Harrington · Kurt Keutzer · Wahid Bhimji · Dmitriy Morozov · Michael Mahoney · Amir Gholami
2023 Poster: Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
Yefan Zhou · Tianyu Pang · Keqin Liu · Charles Martin · Michael Mahoney · Yaoqing Yang
2023 Poster: When are ensembles really effective?
Ryan Theisen · Hyunsuk Kim · Yaoqing Yang · Liam Hodgkinson · Michael Mahoney
2023 Poster: A Heavy-Tailed Algebra for Probabilistic Programming
Feynman Liang · Liam Hodgkinson · Michael Mahoney
2023 Poster: Big Little Transformer Decoder
Sehoon Kim · Karttikeya Mangalam · Suhong Moon · Jitendra Malik · Michael Mahoney · Amir Gholami · Kurt Keutzer
2023 Tutorial: Recent and Upcoming Developments in Randomized Numerical Linear Algebra for ML
Michal Derezinski · Michael Mahoney
2023 Workshop: Heavy Tails in ML: Structure, Stability, Dynamics
Mert Gurbuzbalaban · Stefanie Jegelka · Michael Mahoney · Umut Simsekli
2022 Poster: A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon · Sehoon Kim · Michael Mahoney · Joseph Hassoun · Kurt Keutzer · Amir Gholami
2022 Poster: Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim · Amir Gholami · Albert Shaw · Nicholas Lee · Karttikeya Mangalam · Jitendra Malik · Michael Mahoney · Kurt Keutzer
2022 Poster: LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data
Ali Eshragh · Fred Roosta · Asef Nazari · Michael Mahoney
2021: Q&A with Michael Mahoney
Michael Mahoney
2021: Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization, Michael Mahoney
Michael Mahoney
2021 Poster: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update
Michal Derezinski · Jonathan Lacotte · Mert Pilanci · Michael Mahoney
2021 Poster: Noisy Recurrent Neural Networks
Soon Hoe Lim · N. Benjamin Erichson · Liam Hodgkinson · Michael Mahoney
2021 Poster: Hessian Eigenspectra of More Realistic Nonlinear Models
Zhenyu Liao · Michael Mahoney
2021 Poster: Characterizing possible failure modes in physics-informed neural networks
Aditi Krishnapriyan · Amir Gholami · Shandian Zhe · Robert Kirby · Michael Mahoney
2021 Poster: Taxonomizing local versus global structure in neural network loss landscapes
Yaoqing Yang · Liam Hodgkinson · Ryan Theisen · Joe Zou · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney
2021 Poster: Stateful ODE-Nets using Basis Function Expansions
Alejandro Queiruga · N. Benjamin Erichson · Liam Hodgkinson · Michael Mahoney
2021 Oral: Hessian Eigenspectra of More Realistic Nonlinear Models
Zhenyu Liao · Michael Mahoney
2020 Poster: Boundary thickness and robustness in learning models
Yaoqing Yang · Rajiv Khanna · Yaodong Yu · Amir Gholami · Kurt Keutzer · Joseph Gonzalez · Kannan Ramchandran · Michael Mahoney
2020 Poster: Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
Michal Derezinski · Burak Bartan · Mert Pilanci · Michael Mahoney
2020 Poster: Exact expressions for double descent and implicit regularization via surrogate random design
Michal Derezinski · Feynman Liang · Michael Mahoney
2020 Poster: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method
Michal Derezinski · Rajiv Khanna · Michael Mahoney
2020 Poster: Precise expressions for random projections: Low-rank approximation and randomized Newton
Michal Derezinski · Feynman Liang · Zhenyu Liao · Michael Mahoney
2020 Oral: Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method
Michal Derezinski · Rajiv Khanna · Michael Mahoney
2020 Poster: A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent
Zhenyu Liao · Romain Couillet · Michael Mahoney
2020 Poster: A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
Jianfei Chen · Yu Gai · Zhewei Yao · Michael Mahoney · Joseph Gonzalez
2019: Final remarks
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
2019 Workshop: Beyond first order methods in machine learning systems
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
2019: Opening Remarks
Anastasios Kyrillidis · Albert Berahas · Fred Roosta · Michael Mahoney
2019 Poster: ANODEV2: A Coupled Neural ODE Framework
Tianjun Zhang · Zhewei Yao · Amir Gholami · Joseph Gonzalez · Kurt Keutzer · Michael Mahoney · George Biros
2019 Poster: DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization
Rixon Crane · Fred Roosta
2019 Poster: Distributed estimation of the inverse Hessian by determinantal averaging
Michal Derezinski · Michael Mahoney
2018 Poster: Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Zhewei Yao · Amir Gholami · Qi Lei · Kurt Keutzer · Michael Mahoney
2017 Poster: Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction
Kristofer Bouchard · Alejandro Bujan · Farbod Roosta-Khorasani · Shashanka Ubaru · Mr. Prabhat · Antoine Snijders · Jian-Hua Mao · Edward Chang · Michael W Mahoney · Sharmodeep Bhattacharya
2016 Poster: Feature-distributed sparse regression: a screen-and-clean approach
Jiyan Yang · Michael Mahoney · Michael Saunders · Yuekai Sun
2016 Poster: Sub-sampled Newton Methods with Non-uniform Sampling
Peng Xu · Jiyan Yang · Farbod Roosta-Khorasani · Christopher Ré · Michael Mahoney
2015: Challenges in Multiresolution Methods for Graph-based Learning
Michael Mahoney
2015: Using Local Spectral Methods in Theory and in Practice
Michael Mahoney
2015 Poster: Fast Randomized Kernel Ridge Regression with Statistical Guarantees
Ahmed Alaoui · Michael Mahoney
2013 Workshop: Large Scale Matrix Analysis and Inference
Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht
2012 Poster: Semi-supervised Eigenvectors for Locally-biased Learning
Toke Jansen Hansen · Michael W Mahoney
2012 Poster: A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
Shusen Wang · Zhihua Zhang
2011 Workshop: Sparse Representation and Low-rank Approximation
Ameet S Talwalkar · Lester W Mackey · Mehryar Mohri · Michael W Mahoney · Francis Bach · Mike Davies · Remi Gribonval · Guillaume R Obozinski
2011 Poster: Regularized Laplacian Estimation and Fast Eigenvector Approximation
Patrick O Perry · Michael W Mahoney
2010 Workshop: Low-rank Methods for Large-scale Machine Learning
Arthur Gretton · Michael W Mahoney · Mehryar Mohri · Ameet S Talwalkar
2010 Poster: CUR from a Sparse Optimization Viewpoint
Jacob Bien · Ya Xu · Michael W Mahoney
2009 Poster: Unsupervised Feature Selection for the $k$-means Clustering Problem
Christos Boutsidis · Michael W Mahoney · Petros Drineas