Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty. It finds the optimal policy in belief space, which explicitly accounts for the expected effect on future rewards of reductions in uncertainty. However, the Bayes-adaptive solution is typically intractable in domains with large or continuous state spaces. We present a tractable method for approximating the Bayes-adaptive solution by combining simulation-based search with a novel value function approximation technique that generalises over belief space. Our method outperforms prior approaches in both discrete bandit tasks and simple continuous navigation and control tasks.
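To make the idea of planning in belief space concrete, the following is a minimal sketch for a toy Bernoulli bandit with independent Beta beliefs over each arm. It is not the method presented in this paper, which uses simulation-based search with value function approximation; it is an exact, brute-force recursion that is only feasible for tiny horizons. The function name `bayes_adaptive_value`, the Beta-Bernoulli model, and the undiscounted finite horizon are all assumptions made for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bayes_adaptive_value(belief, horizon):
    """Exact Bayes-adaptive value of a Bernoulli bandit (illustrative only).

    belief  : tuple of (alpha, beta) pairs, one Beta posterior per arm
              (hypothetical representation chosen for this sketch).
    horizon : number of pulls remaining (undiscounted).
    """
    if horizon == 0:
        return 0.0
    best = 0.0  # rewards are non-negative, so 0 is a valid lower bound
    for arm, (alpha, beta) in enumerate(belief):
        p = alpha / (alpha + beta)  # posterior mean reward of this arm
        # Posterior beliefs after observing a success or a failure.
        win = belief[:arm] + ((alpha + 1, beta),) + belief[arm + 1:]
        lose = belief[:arm] + ((alpha, beta + 1),) + belief[arm + 1:]
        # The value of each action includes the value of the updated
        # belief, so information gained now is credited for later reward.
        q = (p * (1.0 + bayes_adaptive_value(win, horizon - 1))
             + (1.0 - p) * bayes_adaptive_value(lose, horizon - 1))
        best = max(best, q)
    return best


uniform_prior = ((1, 1), (1, 1))  # two arms, Beta(1, 1) prior on each
value_two_pulls = bayes_adaptive_value(uniform_prior, 2)
```

The number of reachable beliefs grows quickly with the horizon and the number of arms, and in continuous state spaces the belief cannot be enumerated at all, which is the intractability that motivates approximate methods.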
Author Information
Arthur Guez (DeepMind)
Nicolas Heess (Gatsby Unit)
David Silver (DeepMind)
Peter Dayan (Gatsby Unit, UCL)
I am Director of the Gatsby Computational Neuroscience Unit at University College London. I studied mathematics at the University of Cambridge and then did a PhD at the University of Edinburgh, specialising in associative memory and reinforcement learning. I did postdocs with Terry Sejnowski at the Salk Institute and Geoff Hinton at the University of Toronto, then became an Assistant Professor in Brain and Cognitive Science at the Massachusetts Institute of Technology before moving to UCL.
More from the Same Authors
-
2021 Spotlight: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Spotlight: Online and Offline Reinforcement Learning by Planning with a Learned Model »
Julian Schrittwieser · Thomas Hubert · Amol Mandhane · Mohammadamin Barekatain · Ioannis Antonoglou · David Silver -
2022 Poster: Large-Scale Retrieval for Reinforcement Learning »
Peter Humphreys · Arthur Guez · Olivier Tieleman · Laurent Sifre · Theophane Weber · Timothy Lillicrap -
2021 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · David Silver · Matthew Taylor · Martha White · Srijita Das · Yuqing Du · Andrew Patterson · Manan Tomar · Olivia Watkins -
2021 : Bootstrapped Meta-Learning »
Sebastian Flennerhag · Yannick Schroecker · Tom Zahavy · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Poster: Discovery of Options via Meta-Learned Subgoals »
Vivek Veeriah · Tom Zahavy · Matteo Hessel · Zhongwen Xu · Junhyuk Oh · Iurii Kemaev · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: Self-Consistent Models and Values »
Greg Farquhar · Kate Baumli · Zita Marinho · Angelos Filos · Matteo Hessel · Hado van Hasselt · David Silver -
2021 Poster: Online and Offline Reinforcement Learning by Planning with a Learned Model »
Julian Schrittwieser · Thomas Hubert · Amol Mandhane · Mohammadamin Barekatain · Ioannis Antonoglou · David Silver -
2020 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah -
2020 Poster: Discovering Reinforcement Learning Algorithms »
Junhyuk Oh · Matteo Hessel · Wojciech Czarnecki · Zhongwen Xu · Hado van Hasselt · Satinder Singh · David Silver -
2020 Poster: Value-driven Hindsight Modelling »
Arthur Guez · Fabio Viola · Theophane Weber · Lars Buesing · Steven Kapturowski · Doina Precup · David Silver · Nicolas Heess -
2020 Poster: Meta-Gradient Reinforcement Learning with an Objective Discovered Online »
Zhongwen Xu · Hado van Hasselt · Matteo Hessel · Junhyuk Oh · Satinder Singh · David Silver -
2020 Poster: A Self-Tuning Actor-Critic Algorithm »
Tom Zahavy · Zhongwen Xu · Vivek Veeriah · Matteo Hessel · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2020 Poster: The Value Equivalence Principle for Model-Based Reinforcement Learning »
Christopher Grimm · Andre Barreto · Satinder Singh · David Silver -
2019 : Late-Breaking Papers (Talks) »
David Silver · Simon Du · Matthias Plappert -
2019 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Joshua Achiam · Carlos Florensa · Christopher Grimm · Haoran Tang · Vivek Veeriah -
2019 Poster: Discovery of Useful Questions as Auxiliary Tasks »
Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2019 Poster: The Option Keyboard: Combining Skills in Reinforcement Learning »
Andre Barreto · Diana Borsa · Shaobo Hou · Gheorghe Comanici · Eser Aygün · Philippe Hamel · Daniel Toyama · Jonathan J Hunt · Shibl Mourad · David Silver · Doina Precup -
2018 : David Silver »
David Silver -
2018 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · David Silver · Satinder Singh · Joelle Pineau · Joshua Achiam · Rein Houthooft · Aravind Srinivas -
2018 Poster: Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models »
Amir Dezfouli · Richard Morris · Fabio Ramos · Peter Dayan · Bernard Balleine -
2018 Oral: Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models »
Amir Dezfouli · Richard Morris · Fabio Ramos · Peter Dayan · Bernard Balleine -
2018 Poster: Meta-Gradient Reinforcement Learning »
Zhongwen Xu · Hado van Hasselt · David Silver -
2017 : Panel Discussion »
Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox -
2017 : Deep Reinforcement Learning with Subgoals (David Silver) »
David Silver -
2017 Symposium: Deep Reinforcement Learning »
Pieter Abbeel · Yan Duan · David Silver · Satinder Singh · Junhyuk Oh · Rein Houthooft -
2017 Poster: Natural Value Approximators: Learning when to Trust Past Estimates »
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul -
2017 Poster: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Poster: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning »
Marc Lanctot · Vinicius Zambaldi · Audrunas Gruslys · Angeliki Lazaridou · Karl Tuyls · Julien Perolat · David Silver · Thore Graepel -
2017 Poster: Imagination-Augmented Agents for Deep Reinforcement Learning »
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra -
2017 Spotlight: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Spotlight: Natural Value Approximators: Learning when to Trust Past Estimates »
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul -
2017 Oral: Imagination-Augmented Agents for Deep Reinforcement Learning »
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra -
2016 Poster: Learning values across many orders of magnitude »
Hado van Hasselt · Arthur Guez · Matteo Hessel · Volodymyr Mnih · David Silver -
2015 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · John Schulman · Satinder Singh · David Silver -
2015 Poster: Learning Continuous Control Policies by Stochastic Value Gradients »
Nicolas Heess · Gregory Wayne · David Silver · Timothy Lillicrap · Tom Erez · Yuval Tassa -
2014 Workshop: Novel Trends and Applications in Reinforcement Learning »
Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez -
2013 Invited Talk: Neural Reinforcement Learning »
Peter Dayan -
2013 Poster: Correlations strike back (again): the case of associative memory retrieval »
Cristina Savin · Peter Dayan · Mate Lengyel -
2013 Poster: Learning to Pass Expectation Propagation Messages »
Nicolas Heess · Danny Tarlow · John Winn -
2013 Oral: Correlations strike back (again): the case of associative memory retrieval »
Cristina Savin · Peter Dayan · Mate Lengyel -
2012 Poster: Searching for objects driven by context »
Bogdan Alexe · Nicolas Heess · Yee Whye Teh · Vittorio Ferrari -
2012 Poster: Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search »
Arthur Guez · David Silver · Peter Dayan -
2012 Spotlight: Searching for objects driven by context »
Bogdan Alexe · Nicolas Heess · Yee Whye Teh · Vittorio Ferrari -
2011 Poster: Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories »
Cristina Savin · Peter Dayan · Mate Lengyel -
2010 Poster: Monte-Carlo Planning in Large POMDPs »
David Silver · Joel Veness -
2009 Poster: Bootstrapping from Game Tree Search »
Joel Veness · David Silver · William Uther · Alan Blair -
2009 Poster: Know Thy Neighbour: A Normative Theory of Synaptic Depression »
Jean-Pascal Pfister · Peter Dayan · Mate Lengyel -
2009 Oral: Bootstrapping from Game Tree Search »
Joel Veness · David Silver · William Uther · Alan Blair -
2009 Oral: Know Thy Neighbour: A Normative Theory of Synaptic Depression »
Jean-Pascal Pfister · Peter Dayan · Mate Lengyel -
2009 Poster: Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing »
Ruben Coen-Cagli · Peter Dayan · Odelia Schwartz -
2009 Spotlight: Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing »
Ruben Coen-Cagli · Peter Dayan · Odelia Schwartz -
2009 Poster: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation »
Hamid R Maei · Csaba Szepesvari · Shalabh Bhatnagar · Doina Precup · David Silver · Richard Sutton -
2009 Spotlight: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation »
Hamid R Maei · Csaba Szepesvari · Shalabh Bhatnagar · Doina Precup · David Silver · Richard Sutton -
2008 Oral: Load and Attentional Bayes »
Peter Dayan -
2008 Poster: Load and Attentional Bayes »
Peter Dayan -
2008 Poster: Depression: an RL formulation and a behavioural test »
Quentin J Huys · Joshua T Vogelstein · Peter Dayan -
2008 Poster: Bayesian Model of Behaviour in Economic Games »
Debajyoti Ray · Brooks King-Casas · P. Read Montague · Peter Dayan -
2007 Oral: Hippocampal Contributions to Control: The Third Way »
Mate Lengyel · Peter Dayan -
2007 Poster: Hippocampal Contributions to Control: The Third Way »
Mate Lengyel · Peter Dayan -
2006 Poster: Uncertainty, phase and oscillatory hippocampal recall »
Mate Lengyel · Peter Dayan -
2006 Talk: Uncertainty, phase and oscillatory hippocampal recall »
Mate Lengyel · Peter Dayan