

Poster in Workshop: Agent Learning in Open-Endedness Workshop

JARVIS-1: Open-Ended Multi-task Agents with Memory-Augmented Multimodal Language Models

Zihao Wang · Shaofei Cai · Anji Liu · Xiaojian (Shawn) Ma · Yitao Liang

Keywords: [ Minecraft ] [ Multi-task Agent ] [ Open-ended Agent ]


Abstract:

We propose JARVIS-1, a multi-task agent designed for the complex environment of Minecraft, which marks a significant advance toward human-like planning in an open-world setting. By leveraging pre-trained Vision-Language Models, JARVIS-1 both interprets multimodal inputs and translates them into actions. Its multimodal memory, which draws on both prior knowledge and real-time game experience, further improves its decision-making. Empirically, JARVIS-1 performs strongly across a wide range of Minecraft tasks. Notably, on the long-horizon diamond pickaxe task it achieves a completion rate up to 5 times that of VPT, underscoring its potential and the progress made in this domain. These results lay the groundwork for more versatile and adaptable agents in complex virtual environments.
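The abstract says only that the memory draws on prior knowledge and in-game experience; it does not describe how experiences are stored or queried. As a rough illustration of what memory-augmented planning can look like, here is a minimal Python sketch built entirely on our own assumptions: each successful episode is stored as a (task, situation embedding, plan) record, retrieval keeps records for the same task and ranks them by cosine similarity to the current situation, and the retrieved plans are placed into a planning prompt as examples. `MemoryEntry`, `ExperienceMemory`, `build_prompt`, the task strings, and the toy 2-D embeddings are hypothetical names and data invented for this example; this is not the JARVIS-1 implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """Hypothetical record: a task, an embedding of the situation, and the plan that worked."""
    task: str
    embedding: list[float]
    plan: list[str]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ExperienceMemory:
    """Hypothetical memory that stores successful plans and retrieves the most similar ones."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, task: str, embedding: list[float], plan: list[str]) -> None:
        self.entries.append(MemoryEntry(task, embedding, plan))

    def retrieve(self, task: str, embedding: list[float], k: int = 1) -> list[MemoryEntry]:
        # Assumption: keep only records for the same task, then rank by situational similarity.
        candidates = [e for e in self.entries if e.task == task]
        candidates.sort(key=lambda e: cosine_similarity(e.embedding, embedding), reverse=True)
        return candidates[:k]


def build_prompt(task: str, examples: list[MemoryEntry]) -> str:
    """Assemble a planning prompt that shows retrieved plans as in-context examples."""
    lines = [f"Task: {task}"]
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example plan {i}: " + " -> ".join(ex.plan))
    lines.append("Plan:")
    return "\n".join(lines)


# Usage with toy data: two stored episodes for the same task in different situations.
memory = ExperienceMemory()
memory.add("craft wooden_pickaxe", [1.0, 0.0],
           ["mine log", "craft planks", "craft sticks", "craft crafting_table", "craft wooden_pickaxe"])
memory.add("craft wooden_pickaxe", [0.0, 1.0],
           ["craft sticks", "craft wooden_pickaxe"])

# The current situation is closest to the second stored episode.
best = memory.retrieve("craft wooden_pickaxe", [0.1, 0.9], k=1)
print(best[0].plan)  # ['craft sticks', 'craft wooden_pickaxe']
print(build_prompt("craft wooden_pickaxe", best))
```

Filtering by task before ranking keeps the sketch simple; a real system could instead use a language model to decide which stored experiences are relevant, or embed the task text as well as the observation.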
