We consider \emph{incentivized exploration}: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and this information asymmetry can incentivize the agents to explore. Prior work achieves optimal regret rates up to multiplicative factors that become arbitrarily large depending on the Bayesian priors and scale exponentially in the number of arms. Even the more basic problem of sampling each arm once runs into similar factors.

We focus on the \emph{price of incentives}: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility. We prove that Thompson Sampling, a standard bandit algorithm, is incentive-compatible if initialized with sufficiently many data points. The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected.

The problem is thus largely reduced to a question of sample complexity: how many initial rounds are needed? We address this question, providing matching upper and lower bounds and instantiating them in various corollaries. Typically, the optimal sample complexity is polynomial in the number of arms and exponential in the ``strength of beliefs''.
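To make the algorithmic setup concrete, here is a minimal sketch of Thompson Sampling for Bernoulli rewards with Beta(1, 1) priors, warm-started by pulling every arm a fixed number of times before the Bayesian phase begins. The Bernoulli/Beta model, the function name `thompson_sampling`, and the warm-start length of 200 pulls per arm are illustrative assumptions, not the paper's construction; the paper characterizes how many initial samples actually suffice for incentive-compatibility.

```python
import numpy as np

def thompson_sampling(true_means, horizon, warm_start_per_arm, rng=None):
    """Bernoulli Thompson Sampling with Beta(1, 1) priors, warm-started by
    pulling every arm `warm_start_per_arm` times before the Bayesian phase.
    The warm-start length is a free parameter here; this sketch does not
    compute the sample complexity studied in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(true_means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    rewards = []

    for t in range(horizon):
        if t < warm_start_per_arm * k:
            arm = t % k                       # round-robin warm start
        else:
            samples = rng.beta(successes + 1, failures + 1)
            arm = int(np.argmax(samples))     # recommend the posterior-sample maximizer
        reward = rng.binomial(1, true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        rewards.append(reward)
    return np.array(rewards)

# Example run: three arms, 200 warm-start pulls per arm (illustrative value only).
rewards = thompson_sampling([0.3, 0.5, 0.7], horizon=5000, warm_start_per_arm=200)
print("average reward:", rewards.mean())
```

In this sketch, the performance loss attributable to incentives corresponds to the warm-start rounds, during which arms are pulled regardless of their posterior standing.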
Author Information
Mark Sellke (Stanford University)
Aleksandrs Slivkins (Microsoft Research NYC)
More from the Same Authors
- 2021: Exploration and Incentives in Reinforcement Learning (Max Simchowitz · Aleksandrs Slivkins)
- 2021: The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity (Mark Sellke · Aleksandrs Slivkins)
- 2022 Poster: Incentivizing Combinatorial Bandit Exploration (Xinyan Hu · Dung Ngo · Aleksandrs Slivkins · Steven Wu)
- 2021 Spotlight: Exploration and Incentives in Reinforcement Learning (Max Simchowitz · Aleksandrs Slivkins)
- 2021 Poster: A Universal Law of Robustness via Isoperimetry (Sebastien Bubeck · Mark Sellke)
- 2021 Poster: Bandits with Knapsacks beyond the Worst Case (Karthik Abinav Sankararaman · Aleksandrs Slivkins)
- 2021 Oral: A Universal Law of Robustness via Isoperimetry (Sebastien Bubeck · Mark Sellke)
- 2020 Poster: Efficient Contextual Bandits with Continuous Actions (Maryam Majzoubi · Chicheng Zhang · Rajan Chari · Akshay Krishnamurthy · John Langford · Aleksandrs Slivkins)
- 2020 Poster: Constrained episodic reinforcement learning in concave-convex and knapsack settings (Kianté Brantley · Miro Dudik · Thodoris Lykouris · Sobhan Miryoosefi · Max Simchowitz · Aleksandrs Slivkins · Wen Sun)
- 2017: The Unfair Externalities of Exploration (Aleksandrs Slivkins · Jennifer Wortman Vaughan)
- 2011 Poster: Multi-armed bandits on implicit metric spaces (Aleksandrs Slivkins)
- 2009 Poster: Adapting to the Shifting Intent of Search Queries (Umar Syed · Aleksandrs Slivkins · Nina Mishra)