Bayes-Adaptive Simulation-based Search with Value Function Approximation
Arthur Guez · Nicolas Heess · David Silver · Peter Dayan

Wed Dec 10 04:00 PM -- 08:59 PM (PST) @ Level 2, room 210D

Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty. It finds the optimal policy in belief space, which explicitly accounts for the expected effect on future rewards of reductions in uncertainty. However, the Bayes-adaptive solution is typically intractable in domains with large or continuous state spaces. We present a tractable method for approximating the Bayes-adaptive solution by combining simulation-based search with a novel value function approximation technique that generalises over belief space. Our method outperforms prior approaches in both discrete bandit tasks and simple continuous navigation and control tasks.
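As a rough illustration of what "optimal policy in belief space" means, the sketch below computes the Bayes-adaptive value exactly for a tiny Bernoulli bandit with independent Beta priors. This is plain finite-horizon dynamic programming over beliefs, not the simulation-based search with value function approximation presented in the paper. The function name `bayes_adaptive_value`, the Beta-prior encoding, and the undiscounted horizon are assumptions made for this example only.

```python
from functools import lru_cache


def bayes_adaptive_value(horizon, prior=((1, 1), (1, 1))):
    """Expected total reward of the Bayes-optimal policy on a Bernoulli bandit.

    `prior` holds one (alpha, beta) pair per arm (hypothetical encoding).
    The belief state is the tuple of Beta parameters, so planning happens
    in belief space rather than over the unknown arm probabilities.
    """
    start = tuple(tuple(ab) for ab in prior)

    @lru_cache(maxsize=None)
    def value(belief, steps_left):
        if steps_left == 0:
            return 0.0
        best = 0.0
        for arm, (a, b) in enumerate(belief):
            p = a / (a + b)  # posterior mean probability of reward for this arm
            # Belief after observing a reward (win) or no reward (lose).
            win = belief[:arm] + ((a + 1, b),) + belief[arm + 1:]
            lose = belief[:arm] + ((a, b + 1),) + belief[arm + 1:]
            q = p * (1.0 + value(win, steps_left - 1)) \
                + (1.0 - p) * value(lose, steps_left - 1)
            best = max(best, q)
        return best

    return value(start, horizon)


if __name__ == "__main__":
    # With two pulls and uniform priors, acting on the updated belief is worth
    # more than 2 * 0.5, the value of ignoring what the first pull reveals.
    print(bayes_adaptive_value(2))
```

The number of reachable beliefs grows quickly with the horizon and the number of arms, and the same enumeration is not available at all for continuous states, which is the intractability the abstract refers to.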

Author Information

Arthur Guez (DeepMind)
Nicolas Heess (Gatsby Unit)
David Silver (DeepMind)
Peter Dayan (Gatsby Unit, UCL)

I am Director of the Gatsby Computational Neuroscience Unit at University College London. I studied mathematics at the University of Cambridge and then did a PhD at the University of Edinburgh, specialising in associative memory and reinforcement learning. I did postdocs with Terry Sejnowski at the Salk Institute and Geoff Hinton at the University of Toronto, then became an Assistant Professor in Brain and Cognitive Science at the Massachusetts Institute of Technology before moving to UCL.
