Timezone: »

Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
Chengshuai Shi · Wei Xiong · Cong Shen · Jing Yang

Tue Dec 07 08:30 AM -- 10:00 AM (PST) @

Despite the significant interests and many progresses in decentralized multi-player multi-armed bandits (MP-MAB) problems in recent years, the regret gap to the natural centralized lower bound in the heterogeneous MP-MAB setting remains open. In this paper, we propose BEACON -- Batched Exploration with Adaptive COmmunicatioN -- that closes this gap. BEACON accomplishes this goal with novel contributions in implicit communication and efficient exploration. For the former, we propose a novel adaptive differential communication (ADC) design that significantly improves the implicit communication efficiency. For the latter, a carefully crafted batched exploration scheme is developed to enable incorporation of the combinatorial upper confidence bound (CUCB) principle. We then generalize the existing linear-reward MP-MAB problems, where the system reward is always the sum of individually collected rewards, to a new MP-MAB problem where the system reward is a general (nonlinear) function of individual rewards. We extend BEACON to solve this problem and prove a logarithmic regret. BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjointed areas in MAB, and the results in this paper suggest that this previously ignored connection is worth further investigation.

Author Information

Chengshuai Shi (University of Virginia)
Wei Xiong (Hong Kong University of Science and Technology)
Cong Shen (University of Virginia)
Jing Yang (Pennsylvania State University)

More from the Same Authors