Why (and When) does Local SGD Generalize Better than SGD?
Xinran Gu · Kaifeng Lyu · Longbo Huang · Sanjeev Arora
Thu 14:00
Variance Reduced ProxSkip: Algorithm, Theory and Application to Federated Learning
Grigory Malinovsky · Kai Yi · Peter Richtarik
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
Tomoya Murata · Taiji Suzuki