Much of model-based reinforcement learning involves learning a model of an agent's world and training the agent to leverage this model to perform a task more efficiently. While these models are demonstrably useful for agents, every naturally occurring model of the world of which we are aware (e.g., a brain) arose as the byproduct of competing evolutionary pressures for survival, not the minimization of a supervised forward-predictive loss via gradient descent. That useful models can arise out of the messy and slow optimization process of evolution suggests that forward-predictive modeling can emerge as a side effect of optimization under the right circumstances. Crucially, this optimization process need not explicitly minimize a forward-predictive loss. In this work, we introduce a modification to traditional reinforcement learning that we call observational dropout, whereby we limit the agent's ability to observe the real environment at each timestep. In doing so, we can coerce the agent into learning a world model to fill in the observation gaps during reinforcement learning. We show that the emergent world model, while not explicitly trained to predict the future, can help the agent learn key skills required to perform well in its environment. Videos of our results are available at https://learningtopredict.github.io/
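As a rough sketch of how observational dropout might be wrapped around a standard environment loop (a minimal illustration, not the authors' implementation; the Gym-style reset/step interface, the world_model.predict method, and the peek probability are assumptions):

import numpy as np

class ObservationalDropoutEnv:
    """Hypothetical wrapper: with probability p_observe the agent sees the real
    observation; otherwise it sees its own world model's prediction, so the
    model must fill in the gaps between real observations."""

    def __init__(self, env, world_model, p_observe=0.1, seed=0):
        self.env = env                    # Gym-style environment (assumed interface)
        self.world_model = world_model    # maps (last_obs, action) -> predicted observation
        self.p_observe = p_observe        # chance of passing through the real observation
        self.rng = np.random.default_rng(seed)
        self.last_obs = None

    def reset(self):
        self.last_obs = self.env.reset()  # episodes start from a real observation
        return self.last_obs

    def step(self, action):
        real_obs, reward, done, info = self.env.step(action)
        if self.rng.random() < self.p_observe:
            obs = real_obs                                         # occasional peek at reality
        else:
            obs = self.world_model.predict(self.last_obs, action)  # model fills the gap
        self.last_obs = obs
        return obs, reward, done, info

In this setup the world model's parameters would be optimized jointly with the policy purely for task reward; no supervised forward-prediction loss is applied, which is the point of the claim that predictive models can emerge as a side effect of optimization.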
Author Information
Daniel Freeman (Google Brain)
David Ha (Google Brain)
Luke Metz (Google Brain)
More from the Same Authors
- 2020 : Training more effective learned optimizers »
  Luke Metz
- 2021 : Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation »
  Daniel Freeman · Erik Frey · Anton Raichuk · Sertan Girgin · Igor Mordatch · Olivier Bachem
- 2022 : Meta-Learning General-Purpose Learning Algorithms with Transformers »
  Louis Kirsch · Luke Metz · James Harrison · Jascha Sohl-Dickstein
- 2022 Poster: A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases »
  James Harrison · Luke Metz · Jascha Sohl-Dickstein
- 2022 Poster: Discovered Policy Optimisation »
  Chris Lu · Jakub Kuba · Alistair Letcher · Luke Metz · Christian Schroeder de Witt · Jakob Foerster
- 2021 : Luke Metz Q&A »
  Luke Metz
- 2021 : Luke Metz »
  Luke Metz
- 2021 Poster: Reverse engineering learned optimizers reveals known and novel mechanisms »
  Niru Maheswaranathan · David Sussillo · Luke Metz · Ruoxi Sun · Jascha Sohl-Dickstein
- 2020 : Reverse engineering learned optimizers reveals known and novel mechanisms »
  Niru Maheswaranathan · David Sussillo · Luke Metz · Ruoxi Sun · Jascha Sohl-Dickstein
- 2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
  Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster
- 2019 : Innate Bodies, Innate Brains, and Innate World Models »
  David Ha
- 2019 Poster: Weight Agnostic Neural Networks »
  Adam Gaier · David Ha
- 2019 Spotlight: Weight Agnostic Neural Networks »
  Adam Gaier · David Ha
- 2018 : Learned optimizers that outperform SGD on wall-clock and validation loss »
  Luke Metz
- 2018 : David Ha »
  David Ha
- 2018 Poster: Recurrent World Models Facilitate Policy Evolution »
  David Ha · Jürgen Schmidhuber
- 2018 Oral: Recurrent World Models Facilitate Policy Evolution »
  David Ha · Jürgen Schmidhuber
- 2017 Workshop: Machine Learning for Creativity and Design »
  Douglas Eck · David Ha · S. M. Ali Eslami · Sander Dieleman · Rebecca Fiebrink · Luba Elliott