Timezone: »
Constructing AI models that respond to text instructions is challenging, especially for sequential decision-making tasks. This work introduces an instruction-tuned Video Pretraining (VPT) model for Minecraft called STEVE-1, demonstrating that the unCLIP approach, utilized in DALL•E 2, is also effective for creating instruction-following sequential decision-making agents. STEVE-1 is trained in two steps: adapting the pretrained VPT model to follow commands in MineCLIP's latent space, then training a prior to predict latent codes from text. This allows us to finetune VPT through self-supervised behavioral cloning and hindsight relabeling, bypassing the need for costly human text annotations. By leveraging pretrained models like VPT and MineCLIP and employing best practices from text-conditioned image generation, STEVE-1 costs just $60 to train and can follow short-horizon open-ended text and visual instructions in Minecraft. STEVE-1 sets a new bar for open-ended instruction following in Minecraft with low-level controls (mouse and keyboard) and raw pixel inputs, far outperforming previous baselines and robustly completing 12 of 13 tasks in our early-game evaluation suite. We provide experimental evidence highlighting key factors for downstream performance, including pretraining, classifier-free guidance, and data scaling. All resources, including our model weights, training scripts, and evaluation tools are made available for further research.
Author Information
Shalev Lifshitz (University of Toronto & Vector Institute)
Keiran Paster (University of Toronto)
Harris Chan (Google DeepMind, University of Toronto, Vector Institute)
Jimmy Ba (University of Toronto / Vector Institute)
Sheila McIlraith (University of Toronto and Vector Institute)
More from the Same Authors
-
2020 : Poster #6 »
Sheila McIlraith -
2021 : Avoiding Negative Side Effects by Considering Others »
Parand A. Alamdari · Toryn Klassen · Rodrigo Toro Icarte · Sheila McIlraith -
2021 : BLAST: Latent Dynamics Models from Bootstrapping »
Keiran Paster · Lev McKinney · Sheila McIlraith · Jimmy Ba -
2022 : Large Language Models Are Human-Level Prompt Engineers »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Silviu Pitis · Harris Chan · Keiran Paster · Jimmy Ba -
2022 : Return Augmentation gives Supervised RL Temporal Compositionality »
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba -
2022 : Skill Acquisition by Instruction Augmentation on Offline Datasets »
Ted Xiao · Harris Chan · Pierre Sermanet · Ayzaan Wahid · Anthony Brohan · Karol Hausman · Sergey Levine · Jonathan Tompson -
2022 : Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines »
Andrew Li · Zizhao Chen · Pashootan Vaezipoor · Toryn Klassen · Rodrigo Toro Icarte · Sheila McIlraith -
2022 : Temporary Goals for Exploration »
Haoyang Xu · Jimmy Ba · Silviu Pitis · Harris Chan -
2022 : Return Augmentation gives Supervised RL Temporal Compositionality »
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba -
2022 : Guiding Exploration Towards Impactful Actions »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2022 : Steering Large Language Models using APE »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Keiran Paster · Silviu Pitis · Harris Chan · Jimmy Ba -
2022 : Epistemic Side Effects & Avoiding Them (Sometimes) »
Toryn Klassen · Parand A. Alamdari · Sheila McIlraith -
2022 : Rational Multi-Objective Agents Must Admit Non-Markov Reward Representations »
Silviu Pitis · Duncan Bailey · Jimmy Ba -
2023 : OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text - Poster »
Keiran Paster · Marco Dos Santos · Zhangir Azerbayev · Jimmy Ba -
2023 : Llemma: An Open Language Model For Mathematics »
Zhangir Azerbayev · Hailey Schoelkopf · Keiran Paster · Marco Dos Santos · Stephen McAleer · Albert Q. Jiang · Jia Deng · Stella Biderman · Sean Welleck -
2023 : Remembering to Be Fair: On Non-Markovian Fairness in Sequential Decision Making »
Parand A. Alamdari · Toryn Klassen · Elliot Creager · Sheila McIlraith -
2023 : STEVE-1: A Generative Model for Text-to-Behavior in Minecraft »
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith -
2023 : Vision-Language Models as a Source of Rewards »
Harris Chan · Volodymyr Mnih · Feryal Behbahani · Michael Laskin · Luyu Wang · Fabio Pardo · Maxime Gazeau · Himanshu Sahni · Daniel Horgan · Kate Baumli · Yannick Schroecker · Stephen Spencer · Richie Steigerwald · John Quan · Gheorghe Comanici · Sebastian Flennerhag · Alexander Neitz · Lei Zhang · Tom Schaul · Satinder Singh · Clare Lyle · Tim Rocktäschel · Jack Parker-Holder · Kristian Holsheimer -
2023 : STEVE-1: A Generative Model for Text-to-Behavior in Minecraft »
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith -
2023 : STEVE-1: A Generative Model for Text-to-Behavior in Minecraft »
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith -
2023 : Identifying the Risks of LM Agents with an LM-Emulated Sandbox »
Yangjun Ruan · Honghua Dong · Andrew Wang · Silviu Pitis · Yongchao Zhou · Jimmy Ba · Yann Dubois · Chris Maddison · Tatsunori Hashimoto -
2023 : Identifying the Risks of LM Agents with an LM-Emulated Sandbox »
Yangjun Ruan · Honghua Dong · Andrew Wang · Silviu Pitis · Yongchao Zhou · Jimmy Ba · Yann Dubois · Chris Maddison · Tatsunori Hashimoto -
2023 : Using Large Language Models for Hyperparameter Optimization »
Michael Zhang · Nishkrit Desai · Juhan Bae · Jonathan Lorraine · Jimmy Ba -
2023 : Using Large Language Models for Hyperparameter Optimization »
Michael Zhang · Nishkrit Desai · Juhan Bae · Jonathan Lorraine · Jimmy Ba -
2023 : Remembering to Be Fair: On Non-Markovian Fairness in Sequential Decision Making »
Parand A. Alamdari · Toryn Klassen · Elliot Creager · Sheila McIlraith -
2023 Poster: Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu -
2023 Poster: AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback »
Yann Dubois · Chen Xuechen Li · Rohan Taori · Tianyi Zhang · Ishaan Gulrajani · Jimmy Ba · Carlos Guestrin · Percy Liang · Tatsunori Hashimoto -
2022 : Invited Talk by Jimmy Ba »
Jimmy Ba -
2022 Poster: Learning to Follow Instructions in Text-Based Games »
Mathieu Tuli · Andrew Li · Pashootan Vaezipoor · Toryn Klassen · Scott Sanner · Sheila McIlraith -
2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang -
2022 Poster: You Can’t Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2022 Poster: Dataset Distillation using Neural Feature Regression »
Yongchao Zhou · Ehsan Nezhadarya · Jimmy Ba -
2021 Poster: Clockwork Variational Autoencoders »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2021 Poster: Learning Domain Invariant Representations in Goal-conditioned Block MDPs »
Beining Han · Chongyi Zheng · Harris Chan · Keiran Paster · Michael Zhang · Jimmy Ba -
2021 Poster: How does a Neural Network's Architecture Impact its Robustness to Noisy Labels? »
Jingling Li · Mozhi Zhang · Keyulu Xu · John Dickerson · Jimmy Ba -
2020 : Contributed Talk #2: Evaluating Agents Without Rewards »
Brendon Matusch · Danijar Hafner · Jimmy Ba -
2020 : Contributed Talk: Planning from Pixels using Inverse Dynamics Models »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2020 Session: Orals & Spotlights Track 34: Deep Learning »
Tuo Zhao · Jimmy Ba -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Posters »
Colin Graber · Yuan-Ting Hu · Tiantian Fang · Jessica Hamrick · Giorgio Giannone · John Co-Reyes · Boyang Deng · Eric Crawford · Andrea Dittadi · Peter Karkus · Matthew Dirks · Rakshit Trivedi · Sunny Raj · Javier Felip Leon · Harris Chan · Jan Chorowski · Jeff Orchard · Aleksandar Stanić · Adam Kortylewski · Ben Zinberg · Chenghui Zhou · Wei Sun · Vikash Mansinghka · Chun-Liang Li · Marco Cusumano-Towner -
2019 : Poster Session »
Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma -
2019 Poster: Lookahead Optimizer: k steps forward, 1 step back »
Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E Hinton -
2019 Poster: Graph Normalizing Flows »
Jenny Liu · Aviral Kumar · Jimmy Ba · Jamie Kiros · Kevin Swersky -
2019 Poster: Learning Reward Machines for Partially Observable Reinforcement Learning »
Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Sheila McIlraith -
2019 Spotlight: Learning Reward Machines for Partially Observable Reinforcement Learning »
Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Sheila McIlraith -
2018 : Poster Session »
Carl Trimbach · Mennatullah Siam · Rodrigo Toro Icarte · Zhongtian Dai · Sheila McIlraith · Matthew Rahtz · Robert Sheline · Christopher MacLellan · Carolin Lawrence · Stefan Riezler · Dylan Hadfield-Menell · Fang-I Hsiao -
2018 : Teaching Multiple Tasks to an RL Agent using LTL »
Rodrigo Toro Icarte · Sheila McIlraith -
2018 Poster: Reversible Recurrent Neural Networks »
Matthew MacKay · Paul Vicol · Jimmy Ba · Roger Grosse