Poster
Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback
Lin Yang · Yu-Zhen Janice Chen · Stephen Pasteris · Mohammad Hajiesmaili · John C. S. Lui · Don Towsley
This paper studies a cooperative multi-armed bandit problem in which $M$ agents cooperate to solve the same instance of a $K$-armed stochastic bandit problem, with the goal of maximizing the agents' cumulative reward. The agents are heterogeneous in (i) their limited access to a local subset of arms; and (ii) their decision-making rounds, i.e., agents are asynchronous with different decision-making gaps. The goal is to find the global optimal arm. Agents are able to pull any arm, but they observe the reward only when the selected arm is local. The challenge for agents is a tradeoff between pulling a local arm, with the possibility of observing the feedback, and relying on the observations of other agents, which might occur at different rates. Naive extensions of traditional algorithms lead to an arbitrarily poor regret as a function of the aggregate action frequency of any $\textit{suboptimal}$ arm located in slow agents. We resolve this issue by proposing a novel two-stage learning algorithm, called $\texttt{CO-LCB}$, whose regret is a function of the aggregate action frequency of agents containing the $\textit{optimal}$ arm. We also show that the regret of $\texttt{CO-LCB}$ matches the regret lower bound up to a small factor.
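The feedback model described in the abstract can be illustrated with a minimal Python sketch. It covers only the problem setting, not the $\texttt{CO-LCB}$ algorithm, and rests on assumptions that are not stated in the abstract: rewards are taken to be Bernoulli, asynchrony is modeled as each agent acting once every `gap` time steps, and all names (`Agent`, `acts_at`, `pull`, `count_decisions`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    # Arms whose rewards this agent can observe (its local subset).
    local_arms: frozenset
    # Assumption: the agent makes one decision every `gap` time steps.
    gap: int


def acts_at(agent, t):
    """Return True if the agent makes a decision at time step t (assumed periodic)."""
    return t % agent.gap == 0


def pull(agent, arm, means, rng):
    """Pull any arm; the reward is always earned, but observed only if the arm is local.

    Assumption: rewards are Bernoulli with mean means[arm].
    Returns (reward, observed), where observed is None for a non-local arm.
    """
    reward = 1.0 if rng.random() < means[arm] else 0.0
    observed = reward if arm in agent.local_arms else None
    return reward, observed


def count_decisions(agents, horizon):
    """Number of decisions each agent makes over time steps 1..horizon."""
    return [sum(1 for t in range(1, horizon + 1) if acts_at(a, t)) for a in agents]


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.9]                       # hypothetical arm means
    fast = Agent(local_arms=frozenset({0, 1}), gap=1)
    slow = Agent(local_arms=frozenset({2}), gap=5)
    print(count_decisions([fast, slow], horizon=10))
    print(pull(fast, 2, means, rng))              # arm 2 is not local to `fast`
```

In this toy instance the optimal arm is local only to the slow agent, so the fast agent can learn about it only through the slow agent's observations, which is the tradeoff the abstract describes.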
Author Information
Lin Yang (UMass)
Yu-Zhen Janice Chen (College of Information and Computer Science, University of Massachusetts, Amherst)
Stephen Pasteris (University College London)
Mohammad Hajiesmaili (UMass Amherst)
John C. S. Lui (The Chinese University of Hong Kong)
Don Towsley (UMass - Amherst)
More from the Same Authors
- 2022 Poster: Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
  Xutong Liu · Jinhang Zuo · Siwei Wang · Carlee Joe-Wong · John C.S. Lui · Wei Chen
- 2021 Poster: A Gang of Adversarial Bandits
  Mark Herbster · Stephen Pasteris · Fabio Vitale · Massimiliano Pontil
- 2021 Poster: Pareto-Optimal Learning-Augmented Algorithms for Online Conversion Problems
  Bo Sun · Russell Lee · Mohammad Hajiesmaili · Adam Wierman · Danny Tsang
- 2020 Poster: Adversarial Bandits with Corruptions
  Lin Yang · Mohammad Hajiesmaili · Mohammad Sadegh Talebi · John C. S. Lui · Wing Shing Wong
- 2020 Poster: Online Matrix Completion with Side Information
  Mark Herbster · Stephen Pasteris · Lisa Tse
- 2020 Poster: Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
  Siwei Wang · Longbo Huang · John C. S. Lui
- 2020 Poster: Online Multitask Learning with Long-Term Memory
  Mark Herbster · Stephen Pasteris · Lisa Tse
- 2018 Poster: Community Exploration: From Offline Optimization to Online Learning
  Xiaowei Chen · Weiran Huang · Wei Chen · John C. S. Lui
- 2016 Poster: Diffusion-Convolutional Neural Networks
  James Atwood · Don Towsley
- 2015 Poster: Online Prediction at the Limit of Zero Temperature
  Mark Herbster · Stephen Pasteris · Shaona Ghosh