

Poster in Workshop: Foundation Models for Decision Making

Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception

Yunhao Yang · Cyrus Neary · Ufuk Topcu

presentation: Foundation Models for Decision Making
Fri 15 Dec, 6:15 a.m. – 3:30 p.m. PST

Abstract:

Recently developed multimodal pretrained models can encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be directly integrated into algorithms that solve sequential decision-making tasks. We develop an algorithm that utilizes the knowledge from the pretrained models to construct and verify controllers for sequential decision-making tasks and to ground these controllers to task environments through visual observations. In particular, the algorithm constructs an automaton-based controller that encodes the task-relevant knowledge extracted from the pretrained model. It then verifies whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. If this verification step discovers any inconsistency, the algorithm automatically refines the controller to resolve the inconsistency. Next, the algorithm leverages the vision and language capabilities of pretrained models to ground the controller to the task environment. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks.
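To make the pipeline concrete, below is a minimal Python sketch of the construct-verify-refine-ground loop the abstract describes. It is an illustrative assumption, not the authors' implementation: the Controller class, the (state, action, next_state) triple format, the forbidden-pair form of the specification, and the vlm_score interface are all hypothetical stand-ins for the pretrained-model queries and verification machinery the paper uses.

```python
# Hypothetical sketch of the construct-verify-refine-ground loop.
# All names and interfaces here are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Automaton-based controller: maps (state, action) to the next state."""
    transitions: dict = field(default_factory=dict)
    initial: str = "q0"

    def add(self, src, action, dst):
        self.transitions[(src, action)] = dst


def construct_controller(triples):
    """Build the automaton from (state, action, next_state) triples, e.g.
    parsed from a pretrained model's response to a task prompt."""
    ctrl = Controller()
    for src, action, dst in triples:
        ctrl.add(src, action, dst)
    return ctrl


def verify(ctrl, forbidden):
    """Check consistency against independently available knowledge, here
    expressed as user-provided forbidden (state, action) pairs."""
    return {key for key in ctrl.transitions if key in forbidden}


def refine(ctrl, violations):
    """Resolve each inconsistency by removing the offending transition (a
    real system might instead re-query the pretrained model)."""
    for key in violations:
        del ctrl.transitions[key]
    return ctrl


def ground(observation, state_descriptions, vlm_score):
    """Ground a visual observation to the automaton state whose textual
    description a vision-language scoring function rates highest."""
    return max(state_descriptions,
               key=lambda s: vlm_score(observation, state_descriptions[s]))


if __name__ == "__main__":
    triples = [("q0", "open_door", "q1"), ("q1", "enter_room", "q2"),
               ("q0", "break_window", "q2")]
    ctrl = construct_controller(triples)
    violations = verify(ctrl, forbidden={("q0", "break_window")})
    if violations:
        ctrl = refine(ctrl, violations)
    assert ("q0", "break_window") not in ctrl.transitions
    # Grounding with a stand-in scorer that ignores the (absent) image.
    state = ground(observation=None,
                   state_descriptions={"q0": "agent at a closed door",
                                       "q1": "door is open"},
                   vlm_score=lambda obs, text: len(text))
    print("grounded state:", state)
```

In this sketch, verification reduces to checking transitions against forbidden pairs; the paper's verification against abstract environment knowledge and specifications is richer, but the refine-until-consistent structure is the same.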
