Spotlight
Variational Bayesian Optimistic Sampling
Brendan O'Donoghue · Tor Lattimore
We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy. We provide a new analysis showing that any algorithm producing policies in the optimistic set enjoys $\tilde O(\sqrt{AT})$ Bayesian regret for a problem with $A$ actions after $T$ rounds. We extend the regret analysis for optimistic policies to bilinear saddle-point problems which include zero-sum matrix games and constrained bandits as special cases. In this case we show that Thompson sampling can produce policies outside of the optimistic set and suffer linear regret in some instances. Finding a policy inside the optimistic set amounts to solving a convex optimization problem and we call the resulting algorithm 'variational Bayesian optimistic sampling' (VBOS). The procedure works for any posteriors, i.e., it does not require the posterior to have any special properties, such as log-concavity, unimodality, or smoothness. The variational view of the problem has many useful properties, including the ability to tune the exploration-exploitation tradeoff, add regularization, incorporate constraints, and linearly parameterize the policy.
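For context, the abstract notes that in the stochastic multi-armed bandit case the Thompson sampling policy lies in the optimistic set. Below is a minimal, self-contained sketch of Thompson sampling for that setting, assuming Bernoulli rewards with Beta(1, 1) priors; the function names (`thompson_sampling_bernoulli`, `pull_arm`) and the synthetic bandit are illustrative assumptions and not the paper's VBOS algorithm, which instead selects a policy from the optimistic set by solving a convex optimization problem.

```python
import numpy as np


def thompson_sampling_bernoulli(pull_arm, num_arms, num_rounds, seed=0):
    """Thompson sampling for Bernoulli bandits with Beta(1, 1) priors.

    `pull_arm(a)` is assumed to return a reward in {0, 1} for arm `a`.
    """
    rng = np.random.default_rng(seed)
    successes = np.ones(num_arms)  # Beta posterior parameter alpha per arm
    failures = np.ones(num_arms)   # Beta posterior parameter beta per arm
    rewards = []
    for _ in range(num_rounds):
        # Sample a mean for each arm from its posterior, then act greedily
        # with respect to the sampled means.
        sampled_means = rng.beta(successes, failures)
        arm = int(np.argmax(sampled_means))
        reward = pull_arm(arm)
        # Update the posterior of the pulled arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        rewards.append(reward)
    return np.array(rewards)


# Example usage with a synthetic 5-armed Bernoulli bandit.
true_means = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
env_rng = np.random.default_rng(1)
rewards = thompson_sampling_bernoulli(
    lambda a: env_rng.binomial(1, true_means[a]), num_arms=5, num_rounds=2000)
print("average reward:", rewards.mean())
```

In the bilinear saddle-point problems discussed in the abstract (zero-sum matrix games, constrained bandits), Thompson sampling can fall outside the optimistic set, which is the setting where the variational, optimization-based selection of the policy matters.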
Author Information
Brendan O'Donoghue (DeepMind)
Tor Lattimore (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Variational Bayesian Optimistic Sampling
  Thu. Dec 9th 08:30 -- 10:00 AM
More from the Same Authors
- 2021 Spotlight: Information Directed Sampling for Sparse Linear Bandits
  Botao Hao · Tor Lattimore · Wei Deng
- 2021 Spotlight: Reward is enough for convex MDPs
  Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh
- 2021 Poster: Reward is enough for convex MDPs
  Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh
- 2021 Poster: Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient
  David Applegate · Mateo Diaz · Oliver Hinder · Haihao Lu · Miles Lubin · Brendan O'Donoghue · Warren Schudy
- 2021 Poster: Variational Bayesian Reinforcement Learning with Regret Bounds
  Brendan O'Donoghue
- 2021 Poster: Information Directed Sampling for Sparse Linear Bandits
  Botao Hao · Tor Lattimore · Wei Deng
- 2021 Poster: Bandit Phase Retrieval
  Tor Lattimore · Botao Hao
- 2020 Poster: High-Dimensional Sparse Linear Bandits
  Botao Hao · Tor Lattimore · Mengdi Wang
- 2020 Poster: Model Selection in Contextual Stochastic Bandit Problems
  Aldo Pacchiano · My Phan · Yasin Abbasi Yadkori · Anup Rao · Julian Zimmert · Tor Lattimore · Csaba Szepesvari
- 2020 Poster: Gaussian Gated Linear Networks
  David Budden · Adam Marblestone · Eren Sezener · Tor Lattimore · Gregory Wayne · Joel Veness
- 2019 Poster: Hamiltonian descent for composite objectives
  Brendan O'Donoghue · Chris Maddison
- 2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning
  Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle
- 2019 Poster: Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
  Julian Zimmert · Tor Lattimore