Learning General World Models in a Handful of Reward-Free Deployments
Yingchen Xu · Jack Parker-Holder · Aldo Pacchiano · Philip Ball · Oleh Rybkin · S Roberts · Tim Rocktäschel · Edward Grefenstette

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #520

Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research. We then present CASCADE, a novel approach for self-supervised exploration in this new setting. CASCADE seeks to learn a world model by collecting data with a population of agents, using an information theoretic objective inspired by Bayesian Active Learning. CASCADE achieves this by specifically maximizing the diversity of trajectories sampled by the population through a novel cascading objective. We provide theoretical intuition for CASCADE which we show in a tabular setting improves upon naïve approaches that do not account for population diversity. We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite. Code and videos are available at https://ycxuyingchen.github.io/cascade/
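The paper's exact objective is not reproduced here, but the cascading idea — greedily building a population where each added agent is chosen for the novelty its trajectories contribute beyond those already collected — can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the helper names (`cascade_select`, `entropy`) are hypothetical, state visitations stand in for trajectories, and visitation entropy stands in for the paper's information-theoretic score.

```python
import math

def entropy(counts):
    """Shannon entropy (nats) of an empirical state-visitation distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c)

def cascade_select(candidate_rollouts, k):
    """Greedily pick k policies, each maximizing the marginal entropy gain of
    the pooled state-visitation counts — a toy stand-in for a cascading
    diversity objective.

    candidate_rollouts: dict mapping policy id -> list of visited states.
    """
    selected, pooled = [], {}
    for _ in range(k):
        best_id, best_gain = None, -float("inf")
        for pid, states in candidate_rollouts.items():
            if pid in selected:
                continue
            # Entropy of the pool if this policy's data were added.
            trial = dict(pooled)
            for s in states:
                trial[s] = trial.get(s, 0) + 1
            gain = entropy(trial) - (entropy(pooled) if pooled else 0.0)
            if gain > best_gain:
                best_id, best_gain = pid, gain
        selected.append(best_id)
        for s in candidate_rollouts[best_id]:
            pooled[s] = pooled.get(s, 0) + 1
    return selected

# Toy example: policies "a" and "b" revisit the same states; "c" explores elsewhere.
rollouts = {"a": [0, 0, 1], "b": [0, 0, 1], "c": [2, 3, 4]}
print(cascade_select(rollouts, 2))  # → ['c', 'a']
```

Note how the greedy step picks `"c"` first (highest standalone coverage), then `"a"` over `"b"` only by tie-breaking, since each adds the same marginal novelty — the cascading structure is what prevents the population from collapsing onto redundant explorers.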

Author Information

Yingchen Xu (University College London, University of London)
Jack Parker-Holder (DeepMind)
Aldo Pacchiano (Microsoft Research)
Philip Ball (University of Oxford)
Oleh Rybkin (University of Pennsylvania)

I am a Ph.D. student in the GRASP laboratory at the University of Pennsylvania, where I work on computer vision and deep learning with Kostas Daniilidis. Previously, I received my bachelor's degree from Czech Technical University in Prague, where I was advised by Tomas Pajdla. I have spent two summers at INRIA and TiTech, with Josef Sivic and Akihiko Torii respectively. I am working in artificial intelligence, computer vision, and robotics. More specifically, my main interest is machine understanding of intuitive physics for real-world robotic manipulation. My latest work has been on motion understanding via video prediction. During my bachelor's, I also worked on camera geometry for structure from motion.

S Roberts (University of Oxford)
Tim Rocktäschel (University College London, Facebook AI Research)

Tim is a Researcher at Facebook AI Research (FAIR) London, an Associate Professor at the Centre for Artificial Intelligence in the Department of Computer Science at University College London (UCL), and a Scholar of the European Laboratory for Learning and Intelligent Systems (ELLIS). Prior to that, he was a Postdoctoral Researcher in Reinforcement Learning at the University of Oxford, a Junior Research Fellow in Computer Science at Jesus College, and a Stipendiary Lecturer in Computer Science at Hertford College. Tim obtained his Ph.D. from UCL under the supervision of Sebastian Riedel, and he was awarded a Microsoft Research Ph.D. Scholarship in 2013 and a Google Ph.D. Fellowship in 2017. His work focuses on reinforcement learning in open-ended environments that require intrinsically motivated agents capable of transferring commonsense, world and domain knowledge in order to systematically generalize to novel situations.

Edward Grefenstette (Cohere & University College London)
