Workshop: OPT 2023: Optimization for Machine Learning

On the Convergence of Local SGD Under Third-Order Smoothness and Hessian Similarity

Ali Zindari · Ruichen Luo · Sebastian Stich


Local SGD (i.e., Federated Averaging without client sampling) is widely used for solving federated optimization problems in the presence of heterogeneous data. However, there is a gap between the existing convergence rates for Local SGD and its observed performance on real-world problems. It seems that current rates do not correctly capture the effectiveness of Local SGD. We first show that the existing rates for Local SGD in the heterogeneous setting cannot recover the correct rate when the global function is a quadratic. We then derive a new rate, depending on third-order smoothness and Hessian similarity, for the case where the global function is a general strongly convex function. These additional parameters allow us to capture the problem in a more refined way and to overcome some of the limitations of the previous worst-case results derived under the standard assumptions. Finally, we show a rate for Local SGD when all clients are non-convex quadratic functions with identical Hessians.
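
For readers unfamiliar with the algorithm, the following is a minimal sketch of Local SGD with full client participation. It is an illustration under stated assumptions, not the authors' implementation. Each hypothetical client i holds a one-dimensional quadratic f_i(x) = 0.5 * curvature * (x - b_i)^2, so all clients share the same Hessian but have different minimizers. The function name `local_sgd`, its parameters, and the Gaussian gradient noise are assumptions made for this example, and the curvature is taken to be positive here for simplicity.

```python
import random


def local_sgd(minimizers, curvature=1.0, lr=0.1, rounds=50,
              local_steps=5, noise_std=0.0, seed=0):
    """Sketch of Local SGD on 1-D quadratic clients (illustrative only).

    Client i has f_i(x) = 0.5 * curvature * (x - b_i)**2, where b_i is
    minimizers[i]. Every client participates in every round.
    """
    rng = random.Random(seed)
    x = 0.0  # shared server model
    for _ in range(rounds):
        local_models = []
        for b in minimizers:
            x_i = x  # each client starts from the current server model
            for _ in range(local_steps):
                # Stochastic gradient: exact gradient plus Gaussian noise
                # (the noise model is an assumption of this sketch).
                grad = curvature * (x_i - b) + rng.gauss(0.0, noise_std)
                x_i -= lr * grad
            local_models.append(x_i)
        # Communication step: the server averages the local models.
        x = sum(local_models) / len(local_models)
    return x


if __name__ == "__main__":
    b = [-1.0, 0.0, 4.0]
    print("global minimizer:", sum(b) / len(b))
    print("Local SGD output:", local_sgd(b))
```

In this toy setting with noise switched off, the average of the local iterates follows exactly the gradient descent recursion on the global function, because the client gradients are affine with a common slope. Heterogeneity in the minimizers therefore does not bias the averaged model, which is the kind of structure that Hessian similarity is meant to capture.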
