Poster
CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression
Zhize Li · Peter Richtarik
Virtual
Keywords: [ Optimization ] [ Federated Learning ]
Abstract:
Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Moreover, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communication rounds (faster convergence), e.g., Nesterov's accelerated gradient descent [31, 32] and Adam [14]. In order to combine the benefits of communication compression and convergence acceleration, we propose a \emph{compressed and accelerated} gradient method based on ANITA [20] for distributed optimization, which we call CANITA. CANITA achieves the \emph{first accelerated rate} $O\Big(\sqrt{\big(1+\sqrt{\frac{\omega^3}{n}}\big)\frac{L}{\epsilon}} + \omega\big(\frac{1}{\epsilon}\big)^{\frac{1}{3}}\Big)$, which improves upon the state-of-the-art non-accelerated rate $O\big((1+\frac{\omega}{n})\frac{L}{\epsilon} + \frac{\omega^2+\omega}{\omega+n}\frac{1}{\epsilon}\big)$ of DIANA [12] for distributed general convex problems, where $\epsilon$ is the target error, $L$ is the smoothness parameter of the objective, $n$ is the number of machines/devices, and $\omega$ is the compression parameter (larger $\omega$ means more aggressive compression can be applied, and no compression corresponds to $\omega=0$). Our results show that as long as the number of devices $n$ is large (often the case in distributed/federated learning), or the compression $\omega$ is not very high, CANITA achieves the faster convergence rate $O\big(\sqrt{\frac{L}{\epsilon}}\big)$, i.e., the number of communication rounds is $O\big(\sqrt{\frac{L}{\epsilon}}\big)$ (vs. $O\big(\frac{L}{\epsilon}\big)$ achieved by previous works). As a result, CANITA enjoys the advantages of both compression (compressed communication in each round) and acceleration (far fewer communication rounds).
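As a concrete illustration of the compression parameter $\omega$ referenced above (not part of the paper's own material), the following is a minimal Python sketch of the standard Rand-$k$ sparsification compressor, which is unbiased and satisfies $\mathbb{E}\|C(x)-x\|^2 \le \omega \|x\|^2$ with $\omega = d/k - 1$; the function name `rand_k_compress` and the parameter choices are our own assumptions for illustration.

```python
import numpy as np

def rand_k_compress(x, k, rng=None):
    """Rand-k sparsification: keep k random coordinates, scale by d/k.

    This is an unbiased compressor, E[C(x)] = x, with variance bound
    E||C(x) - x||^2 = (d/k - 1) * ||x||^2, i.e., omega = d/k - 1.
    Choosing k = d gives omega = 0 (no compression).
    """
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    idx = rng.choice(d, size=k, replace=False)  # random subset of coordinates
    out = np.zeros_like(x)
    out[idx] = (d / k) * x[idx]                 # rescale to keep the estimate unbiased
    return out

# Example (hypothetical sizes): d = 1000, k = 100 gives omega = d/k - 1 = 9,
# and each device transmits only k coordinates per communication round.
x = np.random.default_rng(0).standard_normal(1000)
cx = rand_k_compress(x, k=100)
print(np.count_nonzero(cx))  # 100 transmitted entries
```

In this setting, a smaller $k$ means cheaper communication per round but a larger $\omega$, which is exactly the trade-off governed by the rates stated in the abstract.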