Skip to yearly menu bar Skip to main content


Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits

Filippo Vannella · Alexandre Proutiere · Jaeseong Jeong

Great Hall & Hall B1+B2 (level 1) #1802
[ ]
[ Paper [ Poster [ OpenReview
Tue 12 Dec 3:15 p.m. PST — 5:15 p.m. PST


We study the problem of regret minimization in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. We derive an instance-specific regret lower bound and characterize the minimal expected number of times each global action should be explored. Unfortunately, this bound and the corresponding optimal exploration process are obtained by solving a combinatorial optimization problem with a set of variables and constraints exponentially growing with the number of agents. We approximate the regret lower bound problem via Mean Field techniques to reduce the number of variables and constraints. By tuning the latter, we explore the trade-off between achievable regret and complexity. We devise Efficient Sampling for MAMAB (ESM), an algorithm whose regret asymptotically matches the corresponding approximated lower bound. We assess the regret and computational complexity of ESM numerically, using both synthetic and real-world experiments in radio communications networks.

Chat is not available.