

Poster in Workshop: Foundation Models for Decision Making

What Makes Certain Pre-Trained Visual Representations Better for Robotic Learning?

Kyle Hsu · Tyler Lum · Ruohan Gao · Shixiang (Shane) Gu · Jiajun Wu · Chelsea Finn


Abstract:

Deep learning for robotics is data-intensive, but collecting high-quality robotics data at scale is prohibitively expensive. One way to mitigate this is to leverage visual representations pre-trained on relatively abundant non-robotic datasets. So far, existing work has focused on proposing pre-training strategies and assessing them via ablation studies, yielding high-level insight into how pre-training design choices affect downstream performance. However, the significant gap in data and objective between the two stages motivates a more detailed understanding of which properties of better pre-trained visual representations enable their comparative advantage. In this work, we empirically analyze the representations of robotic manipulation data from several standard benchmarks under a variety of pre-trained models, correlating key metrics of the representations with closed-loop task performance after behavior cloning. We find evidence suggesting that our proposed metrics have substantive predictive power for downstream robotic learning.
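
The abstract describes correlating metrics computed on pre-trained visual representations with closed-loop behavior-cloning performance across models. Below is a minimal sketch of that style of analysis; the encoders, the representation metric (a participation-ratio proxy for effective dimensionality), and the success rates are all illustrative assumptions, since the abstract does not name the paper's actual metrics or models.

```python
# Sketch only: placeholder encoders, metric, and success rates (not the paper's).
import numpy as np
from scipy.stats import spearmanr

def participation_ratio(features: np.ndarray) -> float:
    """Toy representation metric: effective dimensionality of the feature
    covariance spectrum, (sum lambda)^2 / sum(lambda^2). Assumed for illustration."""
    centered = features - features.mean(axis=0, keepdims=True)
    eigvals = np.clip(np.linalg.eigvalsh(np.cov(centered, rowvar=False)), 0.0, None)
    return float(eigvals.sum() ** 2 / (np.square(eigvals).sum() + 1e-12))

# Hypothetical inputs: features of manipulation frames under each pre-trained
# encoder, and that encoder's closed-loop success rate after behavior cloning.
rng = np.random.default_rng(0)
features_by_model = {
    "encoder_a": rng.normal(size=(512, 128)),  # (num_frames, feature_dim), placeholder
    "encoder_b": rng.normal(size=(512, 128)),
    "encoder_c": rng.normal(size=(512, 128)),
}
bc_success_rate = {"encoder_a": 0.42, "encoder_b": 0.55, "encoder_c": 0.31}  # placeholder

metric_values = [participation_ratio(f) for f in features_by_model.values()]
success_values = [bc_success_rate[name] for name in features_by_model]

# Rank correlation between the representation metric and downstream performance.
rho, pvalue = spearmanr(metric_values, success_values)
print(f"Spearman rho = {rho:.3f} (p = {pvalue:.3f})")
```

In the actual study one would replace the placeholder features with encoder outputs on benchmark manipulation frames and the placeholder success rates with measured closed-loop results, then compare candidate metrics by how strongly they rank-correlate with performance.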
