Oral Poster
The Sample-Communication Complexity Trade-off in Federated Q-Learning
Sudeep Salgia · Yuejie Chi
West Ballroom A-D #7008
Oral presentation:
Oral Session 2B: Reinforcement Learning
Wed 11 Dec 3:30 p.m. PST — 4:30 p.m. PST
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST
Abstract:
We consider the problem of Federated Q-learning, where $M$ agents aim to collaboratively learn the optimal Q-function of an unknown infinite-horizon Markov Decision Process with finite state and action spaces. We investigate the trade-off between sample and communication complexity for the widely used class of intermittent communication algorithms. We first establish a converse result, showing that any Federated Q-learning algorithm that offers a linear speedup with respect to the number of agents in sample complexity must incur a communication cost of at least $\Omega(\frac{1}{1-\gamma})$, where $\gamma$ is the discount factor. We also propose a new Federated Q-learning algorithm, called Fed-DVR-Q, which is the first Federated Q-learning algorithm to simultaneously achieve order-optimal sample and communication complexities. Together, these results provide a complete characterization of the sample-communication complexity trade-off in Federated Q-learning.