Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games, and computer use, publicly available data does not contain the labels required to train behavioral priors in the same way. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning wherein agents learn to act by watching online unlabeled videos. Specifically, we show that with a small amount of labeled data we can train an inverse dynamics model accurate enough to label a huge unlabeled source of online data -- here, online videos of people playing Minecraft -- from which we can then train a general behavioral prior. Despite using the native human interface (mouse and keyboard at 20Hz), we show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish.
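The three stages described in the abstract (fit an inverse dynamics model on a small labeled set, use it to label action-free video, then train a behavioral prior on the pseudo-labels) can be illustrated with a toy sketch. This is not the authors' implementation. The 1-D "position" observations, the lookup-table models, and the names `train_idm`, `pseudo_label`, and `behavioral_cloning` are all hypothetical stand-ins chosen for illustration. The toy inverse dynamics model also only looks at a single pair of consecutive observations, which is a simplifying assumption. The last line checks the abstract's arithmetic: 20 minutes at 20 Hz is 24,000 actions.

```python
from collections import Counter, defaultdict

ACTIONS_PER_SECOND = 20  # from the abstract: mouse and keyboard at 20 Hz


def train_idm(labeled):
    """Fit a toy inverse dynamics model from (obs, action, next_obs) triples.

    It maps the observed change (next_obs - obs) to the most frequent action.
    """
    votes = defaultdict(Counter)
    for obs, action, next_obs in labeled:
        votes[next_obs - obs][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}


def pseudo_label(idm, video):
    """Label an action-free observation sequence with the IDM's predictions."""
    pairs = []
    for obs, next_obs in zip(video, video[1:]):
        delta = next_obs - obs
        if delta in idm:  # skip transitions the IDM has never seen
            pairs.append((obs, idm[delta]))
    return pairs


def behavioral_cloning(pairs):
    """Fit a toy policy: the most frequent pseudo-labeled action per observation."""
    votes = defaultdict(Counter)
    for obs, action in pairs:
        votes[obs][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in votes.items()}


# Small labeled set (hypothetical toy data): position on a line, action = step.
labeled = [(0, "right", 1), (3, "right", 4), (5, "left", 4), (2, "stay", 2)]
# Larger unlabeled "videos": observation sequences only, no actions.
videos = [[0, 1, 2, 3, 3], [1, 2, 3], [4, 3, 3]]

idm = train_idm(labeled)
dataset = [pair for video in videos for pair in pseudo_label(idm, video)]
policy = behavioral_cloning(dataset)

# 20 minutes of gameplay at 20 actions per second.
actions_in_20_minutes = 20 * 60 * ACTIONS_PER_SECOND
```

In this sketch the policy is learned only from actions inferred by the inverse dynamics model; the small labeled set is never used to train the policy directly, which mirrors the division of labor the abstract describes.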
Author Information
Bowen Baker (OpenAI)
Ilge Akkaya (OpenAI)
Peter Zhokov
Joost Huizinga (OpenAI)
Jie Tang (UC Berkeley)
Adrien Ecoffet (OpenAI)
Brandon Houghton (OpenAI)
Raul Sampedro
Jeff Clune (University of British Columbia)
More from the Same Authors
- 2022: Fifteen-minute Competition Overview Video
  Byron Galbraith · Anssi Kanervisto · Steven Wang · Stephanie Milani · Sharada Mohanty · Rohin Shah · Karolis Ramanauskas · Brandon Houghton
- 2023 Poster: Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
  Shengran Hu · Jeff Clune
- 2023 Poster: BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks
  Stephanie Milani · Anssi Kanervisto · Karolis Ramanauskas · Sander Schulhoff · Brandon Houghton · Rohin Shah
- 2023 Oral: BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks
  Stephanie Milani · Anssi Kanervisto · Karolis Ramanauskas · Sander Schulhoff · Brandon Houghton · Rohin Shah
- 2022 Panel: Panel 1B-4: Video PreTraining (VPT):… & Energy-Based Contrastive Learning…
  Beomsu Kim · Bowen Baker
- 2022 Competition: The MineRL BASALT Competition on Fine-tuning from Human Feedback
  Anssi Kanervisto · Stephanie Milani · Karolis Ramanauskas · Byron Galbraith · Steven Wang · Brandon Houghton · Sharada Mohanty · Rohin Shah
- 2021: BASALT: A MineRL Competition on Solving Human-Judged Task + Q&A
  Rohin Shah · Cody Wild · Steven Wang · Neel Alex · Brandon Houghton · William Guss · Sharada Mohanty · Stephanie Milani · Nicholay Topin · Pieter Abbeel · Stuart Russell · Anca Dragan
- 2021: Diamond: A MineRL Competition on Training Sample-Efficient Agents + Q&A
  William Guss · Alara Dirik · Byron Galbraith · Brandon Houghton · Anssi Kanervisto · Noboru Kuno · Stephanie Milani · Sharada Mohanty · Karolis Ramanauskas · Ruslan Salakhutdinov · Rohin Shah · Nicholay Topin · Steven Wang · Cody Wild
- 2020: Contributed Talk: Asymmetric self-play for automatic goal discovery in robotic manipulation
  OpenAI Robotics · Matthias Plappert · Raul Sampedro · Tao Xu · Ilge Akkaya · Vineet Kosaraju · Peter Welinder · Ruben D'Sa · Arthur Petron · Henrique Ponde · Alex Paino · Hyeonwoo Noh · Lilian Weng · Qiming Yuan · Casey Chu · Wojciech Zaremba
- 2020 Workshop: Meta-Learning
  Jane Wang · Joaquin Vanschoren · Erin Grant · Jonathan Richard Schwarz · Francesco Visin · Jeff Clune · Roberto Calandra
- 2010 Spotlight: On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient
  Jie Tang · Pieter Abbeel