
Contributed Talk
Workshop: Your Model is Wrong: Robustness and misspecification in probabilistic modeling

Your Bandit Model is Not Perfect: Introducing Robustness to Restless Bandits Enabled by Deep Reinforcement Learning

Jackson Killian


Restless multi-armed bandits (RMABs) are receiving renewed attention for their potential to model real-world planning problems under resource constraints. However, few RMAB models have moved beyond theoretical interest, since they make the limiting assumption that model parameters are perfectly known. In the real world, model parameters must often be estimated from historical data or expert input, introducing uncertainty. In this light, we introduce a new paradigm, \emph{Robust RMABs}, a challenging generalization of RMABs that incorporates interval uncertainty over the parameters of each arm's dynamic model. This uncovers several new challenges for RMABs and inspires new algorithmic techniques of general interest. Our contributions are: (i)~We introduce the Robust Restless Bandit problem with interval uncertainty and solve a minimax regret objective; (ii)~We tackle the complexity of the robust objective via a double oracle (DO) approach and analyze its convergence; (iii)~To enable our DO approach, we introduce RMABPPO, a novel deep reinforcement learning (RL) algorithm for solving RMABs, of potential general interest; (iv)~We design the first adversary algorithm for RMABs, required to implement the notoriously difficult minimax regret adversary oracle and also of general interest, by formulating it as a multi-agent RL problem and solving it with a multi-agent extension of RMABPPO.
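To make the minimax regret objective concrete, here is a minimal sketch in plain Python. The policy names, environments, and values are hypothetical: the two "environments" stand in for the endpoints of an uncertain parameter interval, and the table of policy values stands in for what the agent and adversary oracles would compute. The actual approach in the talk uses a double oracle loop with deep RL oracles (RMABPPO and its multi-agent extension), not an enumerated table.

```python
# Hypothetical values of two candidate policies under two environments,
# where each environment fixes the uncertain parameter at one endpoint
# of its interval. In the real setting these values come from simulation.
POLICY_VALUES = {
    "aggressive":   {"param_low": 10.0, "param_high": 2.0},
    "conservative": {"param_low": 7.0,  "param_high": 6.0},
}

def minimax_regret_policy(values):
    """Return the policy minimizing its worst-case regret over environments.

    Regret in an environment = (best achievable value there) - (policy's value).
    """
    envs = next(iter(values.values())).keys()
    # Benchmark: the best achievable value in each environment.
    best = {e: max(v[e] for v in values.values()) for e in envs}
    # Each policy's worst-case (max over environments) regret.
    worst_regret = {
        p: max(best[e] - v[e] for e in envs) for p, v in values.items()
    }
    return min(worst_regret, key=worst_regret.get)

print(minimax_regret_policy(POLICY_VALUES))  # -> conservative
```

Here the "aggressive" policy is best when the parameter is low but suffers regret 4.0 when it is high, while "conservative" caps its regret at 3.0, so the minimax regret objective selects it even though it is optimal in neither environment.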