Timezone: »
Starting with the Thomspon sampling algorithm, recent years have seen a resurgence of interest in Bayesian algorithms for the Multi-armed Bandit (MAB) problem. These algorithms seek to exploit prior information on arm biases and while several have been shown to be regret optimal, their design has not emerged from a principled approach. In contrast, if one cared about Bayesian regret discounted over an infinite horizon at a fixed, pre-specified rate, the celebrated Gittins index theorem offers an optimal algorithm. Unfortunately, the Gittins analysis does not appear to carry over to minimizing Bayesian regret over all sufficiently large horizons and computing a Gittins index is onerous relative to essentially any incumbent index scheme for the Bayesian MAB problem. The present paper proposes a sequence of 'optimistic' approximations to the Gittins index. We show that the use of these approximations in concert with the use of an increasing discount factor appears to offer a compelling alternative to a variety of index schemes proposed for the Bayesian MAB problem in recent years. In addition, we show that the simplest of these approximations yields regret that matches the Lai-Robbins lower bound, including achieving matching constants.
Author Information
Eli Gutin (Massachusetts Institute of Tec)
Vivek Farias (Massachusetts Institute of Technology)
More from the Same Authors
-
2021 Spotlight: Fair Exploration via Axiomatic Bargaining »
Jackie Baek · Vivek Farias -
2021 : Learning Treatment Effects in Panels with General Intervention Patterns »
Vivek Farias · Andrew Li · Tianyi Peng -
2021 : The Limits to Learning a Diffusion Model »
Jackie Baek · Vivek Farias · ANDREEA GEORGESCU · Retsef Levi · Tianyi Peng · Joshua Wilde · Andrew Zheng -
2021 : The Limits to Learning a Diffusion Model »
Jackie Baek · Vivek Farias · ANDREEA GEORGESCU · Retsef Levi · Tianyi Peng · Joshua Wilde · Andrew Zheng -
2022 Poster: Markovian Interference in Experiments »
Vivek Farias · Andrew Li · Tianyi Peng · Andrew Zheng -
2021 Oral: Learning Treatment Effects in Panels with General Intervention Patterns »
Vivek Farias · Andrew Li · Tianyi Peng -
2021 Poster: Fair Exploration via Axiomatic Bargaining »
Jackie Baek · Vivek Farias -
2021 Poster: Learning Treatment Effects in Panels with General Intervention Patterns »
Vivek Farias · Andrew Li · Tianyi Peng -
2012 Poster: Non-parametric Approximate Dynamic Programming via the Kernel Method »
Nikhil Bhat · Ciamac C Moallemi · Vivek Farias -
2009 Poster: A Data-Driven Approach to Modeling Choice »
Vivek Farias · Srikanth Jagabathula · Devavrat Shah -
2009 Spotlight: A Data-Driven Approach to Modeling Choice »
Vivek Farias · Srikanth Jagabathula · Devavrat Shah -
2009 Poster: A Smoothed Approximate Linear Program »
Vijay Desai · Vivek Farias · Ciamac C Moallemi -
2009 Spotlight: A Smoothed Approximate Linear Program »
Vijay Desai · Vivek Farias · Ciamac C Moallemi