Spotlight in Workshop: 6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models

Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models

Ivan Kapelyukh · Yifei Ren · Ignacio Alzugaray · Edward Johns

Keywords: [ Object Rearrangement ] [ Vision-Language Models ] [ NeRF ]


Abstract:

We introduce Dream2Real, a robotics framework that integrates 2D vision-language models into a 3D object rearrangement method. The robot autonomously constructs a 3D NeRF-based representation of the scene, in which objects can be rendered in novel arrangements. These renders are evaluated by a VLM, so that the arrangement which best satisfies the user's instruction is selected and recreated in the real world via pick-and-place. Real-world results show that this framework enables zero-shot rearrangement, avoiding the need to collect a dataset of example arrangements.
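The core loop the abstract describes (render candidate arrangements from the NeRF, score each render with a VLM against the instruction, pick the best) can be sketched as follows. This is a hedged illustration, not the authors' implementation: `render_arrangement` and `vlm_score` are hypothetical stubs standing in for the NeRF renderer and the VLM query.

```python
# Sketch of the evaluation loop from the abstract. All function bodies are
# stand-in stubs, not the Dream2Real implementation.

def render_arrangement(scene, pose):
    """Stub for NeRF rendering of the scene with the object moved to `pose`.

    A real system would return an RGB image; here we just pass the pose
    through so the scoring stub below has something to look at.
    """
    return {"scene": scene, "pose": pose}

def vlm_score(render, instruction):
    """Stub for the VLM's score of a render given the user instruction.

    A real system would query a vision-language model (e.g. image-text
    similarity). This fake metric prefers poses near the scene centre (0.5).
    """
    return -abs(render["pose"] - 0.5)

def best_arrangement(scene, candidate_poses, instruction):
    """Render every candidate pose, score it, and return the best one.

    The chosen pose is what the robot would then recreate via pick-and-place.
    """
    scored = [
        (vlm_score(render_arrangement(scene, p), instruction), p)
        for p in candidate_poses
    ]
    return max(scored)[1]

pose = best_arrangement("mug-scene", [0.0, 0.4, 0.9], "put the mug in the middle")
print(pose)  # 0.4 is closest to the centre under the stub metric
```

The key design point from the abstract is that the VLM only ever sees rendered images, so no dataset of example arrangements is needed: the instruction is evaluated zero-shot against imagined scenes.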
