`

Timezone: »

On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics
Grigory Malinovsky · Konstantin Mishchenko · Peter Richtarik
We present a theoretical study of server-side optimization in federated learning. Our results are the first to show that the widely popular heuristic of scaling the client updates with an extra parameter is extremely useful in the context of Federated Averaging (FedAvg) with local passes over the client data. In particular, we prove that whenever the local stepsizes are small and the update direction is given by FedAvg over all clients, one can take a big leap in the obtained direction and improve the nonconvex rate of convergence from $\mathcal{O}(\varepsilon^{-3})$ to $\mathcal{O}(\varepsilon^{-2})$. In contrast, if the local stepsizes are large, we prove that the noise of client sampling can be controlled by using a small server-side stepsize. Together, our results on the advantage of large and small sever-side stepsizes give a formal theoretical justification for the practice of adaptive server-side optimization in federated learning. Moreover, we consider multiple strategies of client participation and cover the options of uniform client sampling, deterministic or adversarial permutation of clients, as well as random permutation of clients.