Poster
Sketching for Distributed Deep Learning: A Sharper Analysis
Mayank Shrivastava · Berivan Isik · Qiaobo Li · Sanmi Koyejo · Arindam Banerjee
West Ballroom A-D #6004
Thu 12 Dec 11 a.m. – 2 p.m. PST
Abstract:
The high communication cost between the server and the clients is a significant bottleneck in scaling distributed learning for modern overparameterized deep models. One popular approach to reducing this cost is linear sketching, where the sender projects the updates into a lower dimension before communication and the receiver desketches them before any subsequent computation. While sketched distributed learning is known to scale effectively in practice, existing theoretical analyses suggest that the convergence error depends on the ambient dimension, impacting scalability. This paper aims to shed light on this apparent mismatch between theory and practice. Our main result is a tighter analysis that eliminates the dimension dependence in sketching without imposing unrealistically restrictive assumptions on the distributed learning setup. Using the approximate restricted strong smoothness property of overparameterized deep models and the second-order geometry of the loss, we present optimization results for single-local-step and $K$-local-step distributed learning, together with the resulting bounds on communication complexity, with implications for analyzing and implementing distributed learning for overparameterized deep models.
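To make the sketch/desketch round trip described in the abstract concrete, below is a minimal NumPy sketch of one communication round using a Gaussian linear sketch. It is an illustration only, not the paper's specific construction or analysis; the dimensions, the shared seed, and the function names (make_sketch, sketch, desketch) are assumptions for the example.

```python
import numpy as np

def make_sketch(ambient_dim, sketch_dim, seed=0):
    # A shared random seed lets sender and receiver build the same matrix S.
    rng = np.random.default_rng(seed)
    # Gaussian sketching matrix S in R^{sketch_dim x ambient_dim}, scaled so
    # that E[S^T S] = I, making the sketch/desketch pair unbiased.
    return rng.standard_normal((sketch_dim, ambient_dim)) / np.sqrt(sketch_dim)

def sketch(S, update):
    # Sender side: project the d-dimensional update down to sketch_dim
    # before communicating it.
    return S @ update

def desketch(S, sketched_update):
    # Receiver side: map the compressed update back to the ambient dimension
    # before any subsequent computation.
    return S.T @ sketched_update

# Toy usage: one round of compressed communication of a gradient-like update.
d, b = 10_000, 500                                   # illustrative dimensions
S = make_sketch(d, b, seed=42)
g = np.random.default_rng(1).standard_normal(d)      # stand-in for a client update
g_hat = desketch(S, sketch(S, g))                    # unbiased estimate of g
print(np.linalg.norm(g_hat - g) / np.linalg.norm(g)) # relative desketching error
```

The relative error printed at the end reflects the variance introduced by compressing from d to b coordinates; how this error interacts with convergence, and whether it must scale with the ambient dimension d, is the question the paper's analysis addresses.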