Timezone: »
Poster
Decentralized Cooperative Stochastic Bandits
David MartÃnez-Rubio · Varun Kanade · Patrick Rebeschini
Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #19
We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on a network of N agents. In our model, the reward distribution of each arm is the same for each agent and rewards are drawn independently across agents and time steps. In each round, each agent chooses an arm to play and subsequently sends a message to her neighbors. The goal is to minimize the overall regret of the entire network. We design a fully decentralized algorithm that uses an accelerated consensus procedure to compute (delayed) estimates of the average of rewards obtained by all the agents for each arm, and then uses an upper confidence bound (UCB) algorithm that accounts for the delay and error of the estimates. We analyze the regret of our algorithm and also provide a lower bound. The regret is bounded by the optimal centralized regret plus a natural and simple term depending on the spectral gap of the communication matrix. Our algorithm is simpler to analyze than those proposed in prior work and it achieves better regret bounds, while requiring less information about the underlying network. It also performs better empirically.
Author Information
David MartÃnez-Rubio (University of Oxford)
Varun Kanade (University of Oxford)
Patrick Rebeschini (University of Oxford)
More from the Same Authors
-
2023 Poster: A Novel Framework for Policy Mirror Descent with General Parametrization and Linear Convergence »
Carlo Alfano · Rui Yuan · Patrick Rebeschini -
2023 Poster: Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes »
Emmeran Johnson · Ciara Pike-Burke · Patrick Rebeschini -
2021 Poster: Implicit Regularization in Matrix Sensing via Mirror Descent »
Fan Wu · Patrick Rebeschini -
2021 Poster: Distributed Machine Learning with Sparse Heterogeneous Data »
Dominic Richards · Sahand Negahban · Patrick Rebeschini -
2021 Poster: Time-independent Generalization Bounds for SGLD in Non-convex Settings »
Tyler Farghly · Patrick Rebeschini -
2021 Poster: On Optimal Interpolation in Linear Regression »
Eduard Oravkin · Patrick Rebeschini -
2020 Poster: A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval »
Fan Wu · Patrick Rebeschini -
2020 Spotlight: A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval »
Fan Wu · Patrick Rebeschini -
2020 Poster: The Statistical Complexity of Early-Stopped Mirror Descent »
Tomas Vaskevicius · Varun Kanade · Patrick Rebeschini -
2020 Spotlight: The Statistical Complexity of Early-Stopped Mirror Descent »
Tomas Vaskevicius · Varun Kanade · Patrick Rebeschini -
2020 Poster: Adaptive Reduced Rank Regression »
Qiong Wu · Felix MF Wong · Yanhua Li · Zhenming Liu · Varun Kanade -
2019 Poster: Implicit Regularization for Optimal Sparse Recovery »
Tomas Vaskevicius · Varun Kanade · Patrick Rebeschini -
2019 Poster: Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up »
Dominic Richards · Patrick Rebeschini -
2019 Poster: On the Hardness of Robust Classification »
Pascale Gourdeau · Varun Kanade · Marta Kwiatkowska · James Worrell -
2019 Spotlight: On the Hardness of Robust Classification »
Pascale Gourdeau · Varun Kanade · Marta Kwiatkowska · James Worrell -
2017 Poster: Accelerated consensus via Min-Sum Splitting »
Patrick Rebeschini · Sekhar C Tatikonda