Poster
Thompson Sampling for 1-Dimensional Exponential Family Bandits
Nathaniel Korda · Emilie Kaufmann · Remi Munos
Sat Dec 07 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor
Thompson Sampling has been demonstrated in many complex bandit models; however, the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for $1$-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for the Kullback-Leibler divergence and Fisher information (and thus the Jeffreys prior) available in an exponential family. This allows us to give a finite-time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover, our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families.
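As an illustration of the kind of algorithm the abstract describes, here is a minimal sketch of Thompson Sampling with the Jeffreys prior on one particular 1-dimensional exponential family, the Poisson distribution. It is not the authors' implementation. The choice of Poisson arms, the function names (`sample_poisson`, `thompson_poisson`, `kl_poisson`), the arm rates, and the horizon are all assumptions made for this example. For a Poisson rate the Fisher information is 1/rate, so the Jeffreys prior is proportional to rate^(-1/2), and after n pulls of an arm with total reward s the posterior is Gamma(shape = s + 1/2, rate = n).

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def kl_poisson(rate_a, rate_b):
    """Closed-form Kullback-Leibler divergence KL(Poisson(rate_a) || Poisson(rate_b))."""
    return rate_b - rate_a + rate_a * math.log(rate_a / rate_b)


def thompson_poisson(true_rates, horizon, seed=0):
    """Run Thompson Sampling on Poisson arms; return the number of pulls per arm.

    Hypothetical helper for illustration only (assumed setup, not from the paper).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    reward_sums = [0] * n_arms

    for t in range(horizon):
        if t < n_arms:
            # Pull each arm once first: the Jeffreys prior is improper here, so
            # the posterior is a proper Gamma only after at least one observation.
            arm = t
        else:
            # Posterior under the Jeffreys prior: Gamma(shape = sum + 1/2, rate = pulls).
            # random.gammavariate takes (shape, scale), hence scale = 1 / pulls.
            samples = [
                rng.gammavariate(reward_sums[a] + 0.5, 1.0 / pulls[a])
                for a in range(n_arms)
            ]
            # Play the arm whose sampled mean is largest.
            arm = max(range(n_arms), key=samples.__getitem__)
        reward = sample_poisson(true_rates[arm], rng)
        pulls[arm] += 1
        reward_sums[arm] += reward
    return pulls


if __name__ == "__main__":
    rates = [1.0, 3.0]  # assumed arm means, chosen for the example
    print("pulls per arm:", thompson_poisson(rates, horizon=2000))
    print("KL(suboptimal || optimal):", kl_poisson(rates[0], rates[1]))
```

Asymptotic optimality is usually understood relative to the Lai and Robbins lower bound, under which the number of pulls of a suboptimal arm grows like log(T) divided by the KL divergence between that arm and the best one. `kl_poisson` computes that divergence in closed form for this family.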
Author Information
Nathaniel Korda (INRIA)
Emilie Kaufmann (Telecom ParisTech)
Remi Munos (Google DeepMind)
More from the Same Authors
- 2016 Poster: On Explore-Then-Commit strategies
  Aurélien Garivier · Tor Lattimore · Emilie Kaufmann
- 2015 Poster: Black-box optimization of noisy functions with unknown smoothness
  Jean-Bastien Grill · Michal Valko · Remi Munos
- 2014 Poster: Active Regression by Stratification
  Sivan Sabato · Remi Munos
- 2014 Poster: Best-Arm Identification in Linear Bandits
  Marta Soare · Alessandro Lazaric · Remi Munos
- 2014 Poster: Bounded Regret for Finite-Armed Structured Bandits
  Tor Lattimore · Remi Munos
- 2014 Poster: Efficient learning by implicit exploration in bandit problems with side observations
  Tomáš Kocák · Gergely Neu · Michal Valko · Remi Munos
- 2014 Poster: Optimistic Planning in Markov Decision Processes Using a Generative Model
  Balázs Szörényi · Gunnar Kedenburg · Remi Munos
- 2013 Workshop: Bayesian Optimization in Theory and Practice
  Matthew Hoffman · Jasper Snoek · Nando de Freitas · Michael A Osborne · Ryan Adams · Sebastien Bubeck · Philipp Hennig · Remi Munos · Andreas Krause
- 2013 Poster: Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
  Gunnar Kedenburg · Raphael Fonteneau · Remi Munos
- 2012 Poster: Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button
  Joan Fruitet · Alexandra Carpentier · Remi Munos · Maureen Clerc
- 2012 Poster: Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions
  Alexandra Carpentier · Remi Munos
- 2012 Poster: Risk-Aversion in Multi-armed Bandits
  Amir Sani · Alessandro Lazaric · Remi Munos
- 2011 Poster: Finite Time Analysis of Stratified Sampling for Monte Carlo
  Alexandra Carpentier · Remi Munos
- 2011 Poster: Selecting the State-Representation in Reinforcement Learning
  Odalric-Ambrym Maillard · Remi Munos · Daniil Ryabko
- 2011 Poster: Sparse Recovery with Brownian Sensing
  Alexandra Carpentier · Odalric-Ambrym Maillard · Remi Munos
- 2011 Session: Spotlight Session 2
  Remi Munos
- 2011 Session: Oral Session 1
  Remi Munos
- 2011 Poster: Optimistic Optimization of Deterministic Functions
  Remi Munos
- 2011 Poster: Speedy Q-Learning
  Mohammad Gheshlaghi Azar · Remi Munos · Mohammad Ghavamzadeh · Hilbert J Kappen
- 2010 Spotlight: LSTD with Random Projections
  Mohammad Ghavamzadeh · Alessandro Lazaric · Odalric-Ambrym Maillard · Remi Munos
- 2010 Poster: LSTD with Random Projections
  Mohammad Ghavamzadeh · Alessandro Lazaric · Odalric-Ambrym Maillard · Remi Munos
- 2010 Poster: Scrambled Objects for Least-Squares Regression
  Odalric-Ambrym Maillard · Remi Munos
- 2010 Poster: Error Propagation for Approximate Policy and Value Iteration
  Amir-massoud Farahmand · Remi Munos · Csaba Szepesvari
- 2009 Poster: Sensitivity analysis in HMMs with application to likelihood maximization
  Pierre-Arnaud Coquelin · Romain Deguest · Remi Munos
- 2009 Poster: Compressed Least-Squares Regression
  Odalric-Ambrym Maillard · Remi Munos
- 2008 Poster: Online Optimization in X-Armed Bandits
  Sebastien Bubeck · Remi Munos · Gilles Stoltz · Csaba Szepesvari
- 2008 Poster: Algorithms for Infinitely Many-Armed Bandits
  Yizao Wang · Jean-Yves Audibert · Remi Munos
- 2008 Spotlight: Algorithms for Infinitely Many-Armed Bandits
  Yizao Wang · Jean-Yves Audibert · Remi Munos
- 2008 Poster: Particle Filter-based Policy Gradient in POMDPs
  Pierre-Arnaud Coquelin · Romain Deguest · Remi Munos
- 2007 Poster: Fitted Q-iteration in continuous action-space MDPs
  Remi Munos · András Antos · Csaba Szepesvari