What Makes Certain Pre-Trained Visual Representations Better for Robotic Learning?
Kyle Hsu · Tyler Lum · Ruohan Gao · Shixiang (Shane) Gu · Jiajun Wu · Chelsea Finn
Event URL: https://openreview.net/forum?id=X2ZWPdouHRm

Deep learning for robotics is data-intensive, but collecting high-quality robotics data at scale is prohibitively expensive. One approach to mitigate this is to leverage visual representations pre-trained on relatively abundant non-robotic datasets. So far, existing works have focused on proposing pre-training strategies and assessing them via ablation studies, giving high-level knowledge of how pre-training design choices affect downstream performance. However, the significant gap in data and objective between the two stages motivates a more detailed understanding of what properties of better pre-trained visual representations enable their comparative advantage. In this work, we empirically analyze the representations of robotic manipulation data from several standard benchmarks under a variety of pre-trained models, correlating key metrics of the representations with closed-loop task performance after behavior cloning. We find evidence that suggests our proposed metrics have substantive predictive power for downstream robotic learning.

Author Information

Kyle Hsu (Stanford University)
Tyler Lum (Computer Science Department, Stanford University)
Ruohan Gao (Stanford University)
Shixiang (Shane) Gu (Google Brain)
Jiajun Wu (Stanford University)
Chelsea Finn (Stanford)