We focus on federated learning in practical recommender systems and natural language processing scenarios. The global model for federated optimization typically contains a large and sparse embedding layer, while each client’s local data tend to involve only a subset of the features, so each client updates only a small submodel containing the embedding vectors of those features. We identify a new and important issue: distinct data features normally involve different numbers of clients, which divides features into hot and cold ones. We further reveal that the classical federated averaging algorithm (FedAvg) and its variants, which randomly select clients to participate and uniformly average their submodel updates, are severely slowed down, because different parameters of the global model are optimized at different speeds. More specifically, the model parameters related to hot (resp., cold) features are updated quickly (resp., slowly). We thus propose federated submodel averaging (FedSubAvg), which uses the number of clients involving a feature as the measure of that feature's heat and uses it to correct the aggregation of submodel updates. We prove that, due to the dispersion of feature heat, the global objective is ill-conditioned, and that FedSubAvg works as a suitable diagonal preconditioner. We also rigorously analyze FedSubAvg’s convergence rate to stationary points. Finally, we evaluate FedSubAvg on several public and industrial datasets. The evaluation results demonstrate that FedSubAvg significantly outperforms FedAvg and its variants.
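The difference between uniform averaging and heat-corrected aggregation can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each client's submodel update is a dictionary mapping an embedding row index to a delta vector, and that the correction rescales each row's summed update by `num_clients / (num_selected * heat[row])`, where `heat[row]` is the number of clients involving that feature. The function names, the data layout, and the exact scaling factor are all assumptions made for illustration.

```python
import numpy as np

def sum_sparse_updates(updates, num_rows, dim):
    """Sum sparse per-client updates into a dense (num_rows, dim) array."""
    total = np.zeros((num_rows, dim))
    for upd in updates:
        for row, delta in upd.items():
            total[row] += delta
    return total

def uniform_average(updates, num_rows, dim):
    """FedAvg-style aggregation: divide every row by the number of participants."""
    return sum_sparse_updates(updates, num_rows, dim) / len(updates)

def heat_corrected_average(updates, heat, num_clients, num_rows, dim):
    """Hypothetical heat-corrected aggregation (assumed scaling, see lead-in).

    Rows touched by few clients (cold features) get a larger per-row scale,
    so they are no longer diluted by participants that never touched them.
    """
    total = sum_sparse_updates(updates, num_rows, dim)
    scale = np.zeros(num_rows)
    seen = heat > 0  # rows that no client involves keep a zero update
    scale[seen] = num_clients / (len(updates) * heat[seen])
    return total * scale[:, None]

# Toy setup: 4 clients, 2 embedding rows of dimension 1.
# Row 0 is hot (all 4 clients involve it); row 1 is cold (only client 0).
heat = np.array([4, 1])
updates = [
    {0: np.array([1.0]), 1: np.array([1.0])},
    {0: np.array([1.0])},
    {0: np.array([1.0])},
    {0: np.array([1.0])},
]
uniform = uniform_average(updates, num_rows=2, dim=1)
corrected = heat_corrected_average(updates, heat, num_clients=4, num_rows=2, dim=1)
```

In this toy case, uniform averaging shrinks the cold row's update to a quarter of the hot row's, while the heat-corrected version gives both rows an update of the same magnitude. This matches the abstract's intuition that cold features are otherwise optimized more slowly.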
Yucheng Ding (Shanghai Jiao Tong University)
Chaoyue Niu (Shanghai Jiao Tong University)
I received my Ph.D. degree in the Department of Computer Science and Engineering, Shanghai Jiao Tong University, in 2021. I received my B.S. degree from the IEEE Honor Class, Shanghai Jiao Tong University, in 2017. My research interests include data sharing (e.g., cross-device federated learning and data trading), deep learning (e.g., recommender systems and distributed optimization), as well as security and privacy (e.g., privacy preservation and verifiable computation).
Fan Wu (Shanghai Jiao Tong University)
Shaojie Tang (University of Texas, Dallas)
Yanghe Feng (National University of Defense Technology)
Guihai Chen (Shanghai Jiao Tong University)
More from the Same Authors
2019 Poster: Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling
Qitian Wu · Zixuan Zhang · Xiaofeng Gao · Junchi Yan · Guihai Chen