Incentivized exploration in multi-armed bandits (MAB), where a principal offers bonuses to agents so that they explore on her behalf, has attracted growing interest and seen considerable progress in recent years. However, almost all existing studies are confined to temporary, myopic agents. In this work, we break this barrier and study incentivized exploration with multiple long-term strategic agents, whose more complicated behaviors often arise in real-world applications. An important observation of this work is that strategic agents' intrinsic need to learn benefits (rather than harms) the principal's exploration by providing "free pulls". Moreover, it turns out that increasing the population of agents significantly lowers the principal's burden of incentivization. The key and somewhat surprising insight revealed by our results is that when sufficiently many learning agents are involved, the principal's exploration can be (almost) free. Our main results are built upon three novel components, which may be of independent interest: (1) a simple yet provably effective incentive-provision strategy; (2) a carefully crafted best-arm-identification algorithm for rewards aggregated under unequal confidences; (3) a high-probability, finite-time lower bound for UCB algorithms. Experimental results are provided to complement the theoretical analysis.
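To make the setting concrete, below is a minimal Python sketch, not the paper's algorithm: the agents run standard UCB1, and a toy principal pays the smallest bonus that redirects an agent's UCB choice to her target arm, paying nothing whenever the agent would pull that arm on its own (the "free pulls" observation above). The `UCBAgent`/`simulate` names, the round-robin target schedule, and the minimal-bonus rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class UCBAgent:
    """A long-term learning agent that pulls the arm with the highest UCB index."""

    def __init__(self, n_arms: int):
        self.counts = np.zeros(n_arms)   # pulls per arm
        self.means = np.zeros(n_arms)    # empirical mean reward per arm
        self.t = 0                       # total pulls so far

    def indices(self) -> np.ndarray:
        # Standard UCB1 index: empirical mean plus a confidence radius.
        return self.means + np.sqrt(2.0 * np.log(self.t + 1) / self.counts)

    def update(self, arm: int, reward: float) -> None:
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def simulate(n_arms: int = 5, n_agents: int = 20, horizon: int = 2000) -> float:
    """Toy principal/agents loop; returns the total bonus the principal pays."""
    true_means = rng.uniform(0.0, 1.0, n_arms)
    agents = [UCBAgent(n_arms) for _ in range(n_agents)]

    # Warm start: every agent pulls each arm once so all counts are positive.
    for agent in agents:
        for a in range(n_arms):
            agent.update(a, rng.binomial(1, true_means[a]))

    total_bonus = 0.0
    for t in range(horizon):
        target = t % n_arms              # arm the principal wants explored next
        for agent in agents:
            idx = agent.indices()
            bonuses = np.zeros(n_arms)
            if int(np.argmax(idx)) != target:
                # Hypothetical incentive rule: pay the smallest bonus that makes
                # the target arm's index the largest (plus a tiny margin).
                bonuses[target] = idx.max() - idx[target] + 1e-6
                total_bonus += bonuses[target]
            # Otherwise the agent explores the target on its own: a "free pull".
            arm = int(np.argmax(idx + bonuses))
            agent.update(arm, rng.binomial(1, true_means[arm]))
    return total_bonus

if __name__ == "__main__":
    print(f"total bonus paid over the horizon: {simulate():.2f}")
```

In this toy model, agents' own confidence-driven exploration frequently lands on the principal's target, so the paid bonus shrinks as the number of agents grows, mirroring the abstract's claim in spirit only.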
Author Information
Chengshuai Shi (University of Virginia)
Haifeng Xu (University of Virginia)
Wei Xiong (Hong Kong University of Science and Technology)
Cong Shen (University of Virginia)