
Workshop: Generalization in Planning (GenPlan '23)

Understanding Representations Pretrained with Auxiliary Losses for Embodied Agent Planning

Yuxuan (Effie) Li · Luca Weihs

Keywords: [ Representation Learning ] [ Planning ] [ Embodied AI ]


Pretrained representations from large-scale vision models have boosted the performance of downstream embodied policy learning. We investigate whether additional pretraining with auxiliary losses common in embodied AI can build on these general-purpose visual representations to better support planning in embodied tasks. We use a CLIP visual backbone and pretrain a visual compression module and the agent's state belief representations with four unsupervised auxiliary losses, two hindsight-based losses, and a standard imitation learning loss, on a fixed dataset of exploration trajectories. The learned representations are then frozen for downstream multi-step evaluation on two goal-directed tasks in realistic environments. Surprisingly, we find that imitation learning on these exploration trajectories outperforms all other auxiliary losses, even though the exploration trajectories are dissimilar from the downstream tasks. This suggests that imitation of exploration may be "all you need" for building powerful planning representations. Additionally, we find that simple variants of popular auxiliary losses can improve their support for downstream planning ability.
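
As a rough illustration of what a "standard imitation learning loss" typically looks like, the sketch below computes the mean negative log-likelihood of the demonstrated actions under a policy's action logits. The abstract does not specify the exact form of the loss, so the discrete action space, the cross-entropy formulation, and the names `softmax` and `imitation_loss` are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_loss(logits_per_step, expert_actions):
    """Mean negative log-likelihood of the demonstrated actions.

    logits_per_step: one list of action scores per trajectory step
                     (hypothetical policy-head outputs).
    expert_actions:  index of the action taken in the exploration
                     trajectory at each step (assumed discrete).
    """
    if not expert_actions or len(logits_per_step) != len(expert_actions):
        raise ValueError("need one expert action per step, and at least one step")
    nll = 0.0
    for logits, action in zip(logits_per_step, expert_actions):
        probs = softmax(logits)
        nll -= math.log(probs[action])
    return nll / len(expert_actions)


# Toy three-step trajectory with four possible actions per step.
steps = [[2.0, 0.1, 0.1, 0.1], [0.0, 0.0, 3.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
demonstrated = [0, 2, 1]
loss = imitation_loss(steps, demonstrated)
```

In a setup like the one described, gradients of such a loss would update the compression module and belief representation during pretraining; those components would then be frozen before downstream evaluation.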
