On many platforms, user arrivals exhibit a self-reinforcing behavior: future arrivals are likely to have preferences similar to those of users who were satisfied in the past. In other words, arrivals exhibit positive externalities. We study multi-armed bandit (MAB) problems with positive externalities. We show that these self-reinforcing preferences can cause standard benchmark algorithms such as UCB to incur linear regret. We develop a new algorithm, Balanced Exploration (BE), which explores arms carefully so that arrivals do not converge to a suboptimal arm before sufficient evidence is gathered. We also introduce an adaptive variant of BE that successively eliminates suboptimal arms. We analyze the asymptotic regret of both algorithms and establish their optimality by showing that no algorithm can achieve better asymptotic regret.
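The following is a minimal, illustrative simulation of the positive-externality phenomenon described above; it is not the paper's exact model or its Balanced Exploration algorithm. The arrival dynamics, reward model, the fixed exploration budget in the "balanced" policy, and all parameter values are assumptions chosen only to show how self-reinforcing arrivals can make a greedy policy lock in on a suboptimal arm, and how reserving exploration for each arm mitigates this.

```python
# Toy sketch (assumptions throughout, not the authors' model): two arms, and each
# arriving user prefers an arm with probability proportional to the number of past
# users satisfied by that arm, so early successes of a suboptimal arm can snowball.
import random

def simulate(policy, horizon=5000, p=(0.4, 0.6), seed=0):
    rng = random.Random(seed)
    satisfied = [1, 1]          # pseudo-counts of past satisfied users per arm
    pulls = [0, 0]
    rewards = [0, 0]
    total = 0
    for t in range(horizon):
        # Arrival's preferred arm: proportional to past satisfied users (positive externality).
        pref = 0 if rng.random() < satisfied[0] / sum(satisfied) else 1
        arm = policy(t, pulls, rewards, pref)
        # Assumed reward model: the user is satisfied only if shown the arm they
        # prefer and that arm pays off.
        r = 1 if (arm == pref and rng.random() < p[arm]) else 0
        pulls[arm] += 1
        rewards[arm] += r
        satisfied[arm] += r
        total += r
    return total

def greedy(t, pulls, rewards, pref):
    # Serve the arm with the best empirical mean (unpulled arms treated optimistically;
    # ties go to arm 0). Early luck on the worse arm can self-reinforce.
    means = [rewards[i] / pulls[i] if pulls[i] else 1.0 for i in range(2)]
    return 0 if means[0] >= means[1] else 1

def balanced(t, pulls, rewards, pref, explore=200):
    # Crude stand-in for the idea of exploring carefully before arrivals converge:
    # give each arm a fixed exploration budget, then commit to the better empirical mean.
    if pulls[0] < explore:
        return 0
    if pulls[1] < explore:
        return 1
    means = [rewards[i] / pulls[i] for i in range(2)]
    return 0 if means[0] >= means[1] else 1

if __name__ == "__main__":
    print("greedy total reward:  ", simulate(greedy))
    print("balanced total reward:", simulate(balanced))
```

Under this toy model, the greedy policy's early estimates are distorted by the drifting arrival preferences, which is a rough analogue of the linear-regret failure mode the abstract attributes to standard algorithms such as UCB.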
Author Information
Virag Shah (Stanford)
Jose Blanchet (Stanford University)
Ramesh Johari (Stanford University)
More from the Same Authors
- 2020 Poster: Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach · Peter W Glynn · Ramesh Johari · Mohammad Rasouli
- 2020 Poster: Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms · Mohsen Bayati · Nima Hamidi · Ramesh Johari · Khashayar Khosravi
- 2020 Spotlight: Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms · Mohsen Bayati · Nima Hamidi · Ramesh Johari · Khashayar Khosravi
- 2019 Poster: Semi-Parametric Dynamic Contextual Pricing · Virag Shah · Ramesh Johari · Jose Blanchet