We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, its state is fully observed, thus “collapsing” any uncertainty, but when an arm is passive, no observation is made, allowing uncertainty to evolve. The goal is to keep as many arms in the “good” state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which health workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either “forward” or “reverse” threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed-form solution. (iii) We evaluate our algorithm on several data distributions, including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients’ adherence to tuberculosis medication. Our algorithm achieves a three-order-of-magnitude speedup over state-of-the-art RMAB techniques while achieving similar performance. The code is available at: https://github.com/AdityaMate/collapsing_bandits
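The abstract's core dynamic can be sketched in a few lines: a passive arm's belief over its binary state diffuses under the Markov chain, while playing the arm observes the state and collapses the belief to a point mass. This is only an illustrative sketch; the transition matrix `P` below is hypothetical, and the paper's actual model and Whittle-index computation live in the linked repository.

```python
import numpy as np

# Hypothetical 2-state transition matrix for a passive arm:
# P[s, s'] = probability of moving from state s to s' in one round.
# State 1 is the "good" (e.g., adhering) state.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def evolve_passive(belief):
    """Passive arm: no observation, so the belief diffuses under P."""
    return belief @ P

def collapse_on_play(observed_state):
    """Played arm: the state is fully observed, collapsing uncertainty."""
    b = np.zeros(2)
    b[observed_state] = 1.0
    return b

# Example: play the arm, observe the good state, then leave it passive.
b = collapse_on_play(1)      # belief = [0, 1], no uncertainty
for _ in range(3):
    b = evolve_passive(b)    # uncertainty grows each passive round
```

Under this sketch, planning amounts to choosing which arms' beliefs to collapse each round, subject to the per-round action budget.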
Author Information
Aditya Mate (Harvard University)
Jackson Killian (Harvard University)
Haifeng Xu (University of Virginia)
Andrew Perrault (Harvard University)
Milind Tambe (Harvard University/Google Research India)
More from the Same Authors
- 2021 Spotlight: Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning »
  Kai Wang · Sanket Shah · Haipeng Chen · Andrew Perrault · Finale Doshi-Velez · Milind Tambe
- 2021: Your Bandit Model is Not Perfect: Introducing Robustness to Restless Bandits Enabled by Deep Reinforcement Learning »
  Jackson Killian · Lily Xu · Arpita Biswas · Milind Tambe
- 2022: Case Study: Applying Decision Focused Learning in the Real World »
  Shresth Verma · Aditya Mate · Kai Wang · Aparna Taneja · Milind Tambe
- 2022: Invited Talk: Milind Tambe »
  Milind Tambe
- 2022 Poster: Decision-Focused Learning without Decision-Making: Learning Locally Optimized Decision Losses »
  Sanket Shah · Kai Wang · Bryan Wilder · Andrew Perrault · Milind Tambe
- 2021: Your Bandit Model is Not Perfect: Introducing Robustness to Restless Bandits Enabled by Deep Reinforcement Learning »
  Jackson Killian
- 2021: Restless Bandits in the Field: Real-World Study for Improving Maternal and Child Health Outcomes »
  Aditya Mate
- 2021: Invite Talk Q&A »
  Milind Tambe · Tejumade Afonja · Paula Rodriguez Diaz
- 2021: Invited Talk: AI for Social Impact: Results from Deployments for Public Health »
  Milind Tambe
- 2021 Poster: The Limits of Optimal Pricing in the Dark »
  Quinlan Dawkins · Minbiao Han · Haifeng Xu
- 2021 Poster: (Almost) Free Incentivized Exploration from Decentralized Learning Agents »
  Chengshuai Shi · Haifeng Xu · Wei Xiong · Cong Shen
- 2021 Poster: Least Square Calibration for Peer Reviews »
  Sijun Tan · Jibang Wu · Xiaohui Bei · Haifeng Xu
- 2021 Poster: Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning »
  Kai Wang · Sanket Shah · Haipeng Chen · Andrew Perrault · Finale Doshi-Velez · Milind Tambe
- 2020: Q/A and Panel Discussion for People-Earth with Dan Kammen and Milind Tambe »
  Daniel Kammen · Milind Tambe · Giulio De Leo · Mayur Mudigonda · Surya Karthik Mukkavilli
- 2020: Q/A and Discussion »
  Surya Karthik Mukkavilli · Mayur Mudigonda · Milind Tambe
- 2020: Milind Tambe »
  Milind Tambe
- 2020 Poster: Automatically Learning Compact Quality-aware Surrogates for Optimization Problems »
  Kai Wang · Bryan Wilder · Andrew Perrault · Milind Tambe
- 2020 Spotlight: Automatically Learning Compact Quality-aware Surrogates for Optimization Problems »
  Kai Wang · Bryan Wilder · Andrew Perrault · Milind Tambe