
On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data
Jianyu Wang

Fri Dec 02 01:30 PM -- 01:48 PM (PST)

Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm. In practice, however, the simple FedAvg algorithm converges very well. In this talk, we explain this seemingly unreasonable effectiveness of FedAvg, which contradicts previous theoretical predictions. We find that the key assumption of bounded gradient dissimilarity used in previous theoretical analyses is too pessimistic to characterize data heterogeneity in practical applications. For a simple quadratic problem, we demonstrate that there exist regimes where large gradient dissimilarity has no negative impact on the convergence of FedAvg. Motivated by this observation, we propose a new quantity, the average drift at optimum, to measure the effects of data heterogeneity, and we use it explicitly in a new theoretical analysis of FedAvg. We show that the average drift at optimum is nearly zero across many real-world federated training tasks, whereas the gradient dissimilarity can be large. Our new analysis further suggests that FedAvg can have identical convergence rates in homogeneous and heterogeneous data settings, which leads to a better understanding of its empirical success.
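The quadratic argument above can be illustrated with a small simulation. This is a minimal sketch of our own, not the talk's code, and it rests on stated assumptions: diagonal quadratic client objectives, full-gradient local steps, two equally weighted clients, and a reading of "average drift at optimum" as the norm of the averaged client update when every client starts local training from the global optimum x*. The talk's exact definition and normalization may differ, and all function names are hypothetical.

```python
# Minimal sketch (illustrative assumptions, not the talk's implementation):
# client i minimizes f_i(x) = 0.5 * sum_j a[i][j] * (x[j] - c[i][j])**2,
# and the global objective is F(x) = mean_i f_i(x) with equal client weights.

def local_gd(x, a, c, lr, steps):
    """Run `steps` full-gradient steps on one client's quadratic, starting from x."""
    x = list(x)
    for _ in range(steps):
        x = [xj - lr * aj * (xj - cj) for xj, aj, cj in zip(x, a, c)]
    return x

def global_optimum(A, C):
    """Minimizer of F: per coordinate, the curvature-weighted mean of client optima."""
    dim = len(C[0])
    return [sum(a[j] * c[j] for a, c in zip(A, C)) / sum(a[j] for a in A)
            for j in range(dim)]

def fedavg(A, C, x0, lr, local_steps, rounds):
    """Each round: every client runs local GD from x, then the server averages."""
    x = list(x0)
    for _ in range(rounds):
        local = [local_gd(x, a, c, lr, local_steps) for a, c in zip(A, C)]
        x = [sum(col) / len(local) for col in zip(*local)]
    return x

def avg_drift_at_optimum(A, C, lr, local_steps):
    """Norm of the averaged client update when all clients start from x* (assumed reading)."""
    x_star = global_optimum(A, C)
    updates = [[xi - xs for xi, xs in zip(local_gd(x_star, a, c, lr, local_steps), x_star)]
               for a, c in zip(A, C)]
    mean_update = [sum(col) / len(updates) for col in zip(*updates)]
    return sum(u * u for u in mean_update) ** 0.5

def grad_dissimilarity_at_optimum(A, C):
    """Mean squared norm of client gradients at x*, where the global gradient is zero."""
    x_star = global_optimum(A, C)
    total = 0.0
    for a, c in zip(A, C):
        g = [aj * (xs - cj) for aj, xs, cj in zip(a, x_star, c)]
        total += sum(gj * gj for gj in g)
    return total / len(A)

if __name__ == "__main__":
    C = [[-10.0, 5.0], [10.0, -5.0]]        # client optima pulled far apart
    A_shared = [[1.0, 2.0], [1.0, 2.0]]     # identical curvature on both clients
    A_mixed = [[1.0, 2.0], [3.0, 0.5]]      # different curvature per client
    print(grad_dissimilarity_at_optimum(A_shared, C))  # 200.0
    print(avg_drift_at_optimum(A_shared, C, 0.1, 5))   # approximately 0
    print(avg_drift_at_optimum(A_mixed, C, 0.1, 5))    # clearly nonzero
```

With shared curvature, one FedAvg round equals τ steps of centralized gradient descent on the global objective, so the drift at x* vanishes even though the gradient dissimilarity is 200. Giving the clients different curvature makes the drift nonzero, which is the kind of heterogeneity this quantity is meant to capture.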

Author Information

Jianyu Wang (Meta)
