Spotlight Poster
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith

Wed Dec 13 03:00 PM -- 05:00 PM (PST) @ Great Hall & Hall B1+B2 #1417
Event URL: https://sites.google.com/view/steve-1

Constructing AI models that respond to text instructions is challenging, especially for sequential decision-making tasks. This work introduces an instruction-tuned Video Pretraining (VPT) model for Minecraft called STEVE-1, demonstrating that the unCLIP approach, utilized in DALL·E 2, is also effective for creating instruction-following sequential decision-making agents. STEVE-1 is trained in two steps: adapting the pretrained VPT model to follow commands in MineCLIP's latent space, then training a prior to predict latent codes from text. This allows us to finetune VPT through self-supervised behavioral cloning and hindsight relabeling, bypassing the need for costly human text annotations. By leveraging pretrained models like VPT and MineCLIP and employing best practices from text-conditioned image generation, STEVE-1 costs just $60 to train and can follow short-horizon open-ended text and visual instructions in Minecraft. STEVE-1 sets a new bar for open-ended instruction following in Minecraft with low-level controls (mouse and keyboard) and raw pixel inputs, far outperforming previous baselines and robustly completing 12 of 13 tasks in our early-game evaluation suite. We provide experimental evidence highlighting key factors for downstream performance, including pretraining, classifier-free guidance, and data scaling. All resources, including our model weights, training scripts, and evaluation tools, are made available for further research.
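Two of the techniques named in the abstract, hindsight relabeling and classifier-free guidance, can be illustrated with a small sketch. This is not the authors' code: the function names `hindsight_relabel` and `guided_logits` are hypothetical, plain lists of floats stand in for MineCLIP embeddings and policy logits, the segment lengths are arbitrary illustrative values, and the guidance rule shown is the commonly used form, assumed here rather than taken from the abstract.

```python
import random


def hindsight_relabel(embeddings, min_len=2, max_len=4, rng=None):
    """Label each timestep with a goal taken from a later frame of the same episode.

    embeddings[t] stands in for a visual embedding of the frames ending at t
    (an assumption for illustration). Returns one goal embedding per timestep.
    """
    rng = rng or random.Random(0)
    goals = [None] * len(embeddings)
    start = 0
    while start < len(embeddings):
        # Pick a random segment length, clipped to the end of the episode.
        seg_len = rng.randint(min_len, max_len)
        end = min(start + seg_len, len(embeddings)) - 1
        # Every step in the segment is labeled with the segment's final embedding,
        # so no human text annotation is needed to obtain a goal.
        for t in range(start, end + 1):
            goals[t] = embeddings[end]
        start = end + 1
    return goals


def guided_logits(cond_logits, uncond_logits, scale):
    """Classifier-free guidance (assumed form): amplify the effect of the goal.

    scale = 0 returns the goal-conditioned logits unchanged; larger values
    move the logits further away from the unconditional ones.
    """
    return [(1 + scale) * c - scale * u for c, u in zip(cond_logits, uncond_logits)]


# Toy episode: the "embedding" at step t is just [t].
episode = [[float(t)] for t in range(10)]
goals = hindsight_relabel(episode)
```

In this toy setup, each `(observation, goals[t], action)` triple would serve as a behavioral cloning example, and `guided_logits` would be applied at inference time to logits computed with and without the goal.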

Author Information

Shalev Lifshitz (University of Toronto, Vector Institute)
Keiran Paster (University of Toronto)
Harris Chan (Google DeepMind, University of Toronto, Vector Institute)
Jimmy Ba (University of Toronto, Vector Institute)
Sheila McIlraith (University of Toronto, Vector Institute)

More from the Same Authors