The rise of generalist large-scale models in natural language and vision has raised the expectation that a massive data-driven approach could achieve broader generalization in other domains such as continuous control. In this work, we explore a method for learning a single policy that controls agents of various forms to solve various tasks by distilling a large amount of proficient behavioral data. To align the input-output (IO) interface across multiple tasks and diverse agent morphologies while preserving essential 3D geometric relations, we introduce the control graph, which treats observations, actions, and goals/tasks in a unified graph representation. We also develop MxT-Bench for fast, large-scale behavior generation, which supports procedural generation of diverse morphology-task combinations from a minimal blueprint on a hardware-accelerated simulator. Through efficient representation and architecture selection on MxT-Bench, we find that a control graph representation coupled with a Transformer architecture improves multi-task performance over other baselines, including recent discrete tokenization, and provides better priors for zero-shot transfer and sample efficiency in downstream multi-task imitation learning. Our work suggests that large, diverse offline datasets, a unified IO representation, and policy representation and architecture selection through supervised learning form a promising approach for studying and advancing morphology-task generalization.
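To make the control-graph idea concrete, the sketch below illustrates one plausible reading of "observations, actions, and goals in a unified graph representation coupled with a Transformer": each node of the morphology graph becomes a token holding its local observation and goal features, a Transformer encoder mixes the node tokens, and a shared head predicts one action per node, trained by behavior cloning on offline expert data. This is a minimal illustrative sketch, not the authors' released code; the class name ControlGraphPolicy, the feature dimensions, and the mean-squared-error objective are assumptions made for illustration.

```python
# Minimal sketch (assumed, not the paper's implementation): per-node tokenization
# of a morphology graph, a Transformer encoder over node tokens, and a shared
# per-node action head, trained with a behavior-cloning loss.
import torch
import torch.nn as nn


class ControlGraphPolicy(nn.Module):
    """Hypothetical control-graph-style policy: one token per morphology node."""

    def __init__(self, node_obs_dim: int, goal_dim: int, act_dim: int,
                 d_model: int = 128, nhead: int = 4, num_layers: int = 3):
        super().__init__()
        # Encode each node's (local observation, goal) pair into one token.
        self.tokenize = nn.Linear(node_obs_dim + goal_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Decode each contextualized node token back to that node's action.
        self.act_head = nn.Linear(d_model, act_dim)

    def forward(self, node_obs, node_goal, pad_mask=None):
        # node_obs:  (batch, num_nodes, node_obs_dim)
        # node_goal: (batch, num_nodes, goal_dim); zeros for nodes with no goal
        # pad_mask:  (batch, num_nodes) True where a node is padding, so agents
        #            with different morphologies can share one batch.
        tokens = self.tokenize(torch.cat([node_obs, node_goal], dim=-1))
        hidden = self.encoder(tokens, src_key_padding_mask=pad_mask)
        return self.act_head(hidden)  # (batch, num_nodes, act_dim)


# Behavior cloning on offline expert trajectories (dimensions are illustrative).
policy = ControlGraphPolicy(node_obs_dim=13, goal_dim=3, act_dim=1)
obs = torch.randn(8, 9, 13)         # e.g. 9 body nodes for one morphology
goal = torch.randn(8, 9, 3)
expert_act = torch.randn(8, 9, 1)
loss = nn.functional.mse_loss(policy(obs, goal), expert_act)
loss.backward()
```

Because every input is expressed as a set of node tokens rather than a fixed-size vector, the same network can, in principle, be reused across morphologies with different numbers of joints and across tasks specified by different goal nodes, which is the kind of IO unification the abstract describes.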
Author Information
Hiroki Furuta (The University of Tokyo)
Yusuke Iwasawa (The University of Tokyo)
Yutaka Matsuo (The University of Tokyo)
Shixiang (Shane) Gu (Google Brain)
More from the Same Authors
- 2021 Spotlight: Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization
  Yusuke Iwasawa · Yutaka Matsuo
- 2021: Distributional Decision Transformer for Offline Hindsight Information Matching
  Hiroki Furuta · Yutaka Matsuo · Shixiang (Shane) Gu
- 2022: What Makes Certain Pre-Trained Visual Representations Better for Robotic Learning?
  Kyle Hsu · Tyler Lum · Ruohan Gao · Shixiang (Shane) Gu · Jiajun Wu · Chelsea Finn
- 2022: Control Graph as Unified IO for Morphology-Task Generalization
  Hiroki Furuta · Yusuke Iwasawa · Yutaka Matsuo · Shixiang (Shane) Gu
- 2022 Workshop: Foundation Models for Decision Making
  Mengjiao (Sherry) Yang · Yilun Du · Jack Parker-Holder · Siddharth Karamcheti · Igor Mordatch · Shixiang (Shane) Gu · Ofir Nachum
- 2022 Poster: Large Language Models are Zero-Shot Reasoners
  Takeshi Kojima · Shixiang (Shane) Gu · Machel Reid · Yutaka Matsuo · Yusuke Iwasawa
- 2022 Poster: Langevin Autoencoders for Learning Deep Latent Variable Models
  Shohei Taniguchi · Yusuke Iwasawa · Wataru Kumagai · Yutaka Matsuo
- 2022 Poster: Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters
  Kamyar Ghasemipour · Shixiang (Shane) Gu · Ofir Nachum
- 2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning?
  Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine
- 2021 Poster: Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
  Hiroki Furuta · Tadashi Kozuno · Tatsuya Matsushima · Yutaka Matsuo · Shixiang (Shane) Gu
- 2021 Poster: Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization
  Yusuke Iwasawa · Yutaka Matsuo