Understanding spatial relations (e.g., laptop on table) in visual input is important for both humans and robots. Existing datasets are insufficient as they lack large-scale, high-quality 3D ground truth information, which is critical for learning spatial relations. In this paper, we fill this gap by constructing Rel3D: the first large-scale, human-annotated dataset for grounding spatial relations in 3D. Rel3D enables quantifying the effectiveness of 3D information in predicting spatial relations on large-scale human data. Moreover, we propose minimally contrastive data collection---a novel crowdsourcing method for reducing dataset bias. The 3D scenes in our dataset come in minimally contrastive pairs: two scenes in a pair are almost identical, but a spatial relation holds in one and fails in the other. We empirically validate that minimally contrastive examples can diagnose issues with current relation detection models as well as lead to sample-efficient training. Code and data are available at https://github.com/princeton-vl/Rel3D.
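To make the minimally contrastive structure concrete, the following is a minimal sketch in Python of how such a pair of scenes might be represented. All class names, field names, and poses here are our own illustrative assumptions, not the released Rel3D data format; see the GitHub repository for the actual schema.

```python
from dataclasses import dataclass
from typing import Tuple

# NOTE: everything below is an illustrative sketch, not the actual Rel3D format.

@dataclass
class Scene3D:
    """One synthetic 3D scene paired with a spatial-relation query."""
    subject: str        # e.g., "laptop"
    relation: str       # e.g., "on"
    obj: str            # e.g., "table"
    subject_pose: Tuple[float, float, float]  # hypothetical 3D position
    obj_pose: Tuple[float, float, float]
    holds: bool         # human judgment: does the relation hold in this scene?

@dataclass
class MinimallyContrastivePair:
    """Two almost-identical scenes in which the relation holds in exactly one."""
    positive: Scene3D
    negative: Scene3D

    def __post_init__(self) -> None:
        # The pair shares the same (subject, relation, object) triple ...
        assert (self.positive.subject, self.positive.relation, self.positive.obj) == \
               (self.negative.subject, self.negative.relation, self.negative.obj)
        # ... but the human label flips between the two scenes.
        assert self.positive.holds and not self.negative.holds

# Usage: "laptop on table" vs. a near-duplicate where the laptop floats above it.
pair = MinimallyContrastivePair(
    positive=Scene3D("laptop", "on", "table", (0.0, 0.0, 0.80), (0.0, 0.0, 0.75), True),
    negative=Scene3D("laptop", "on", "table", (0.0, 0.0, 1.20), (0.0, 0.0, 0.75), False),
)
```

Because the two scenes differ only minimally, a model that relies on dataset bias (e.g., guessing the label from the object categories alone) is forced to fail on one member of each pair, which is what makes such pairs useful for both diagnosis and sample-efficient training.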
Author Information
Ankit Goyal (Princeton University)
Kaiyu Yang (Princeton University)
I am a Ph.D. candidate in the Department of Computer Science at Princeton University, where I work with Prof. Jia Deng in the Princeton Vision & Learning Lab. I also collaborate closely with Prof. Olga Russakovsky. My research focuses on bridging deep learning and symbolic reasoning, with applications in automated theorem proving and mathematical reasoning in natural language. Previously, I worked in computer vision on topics such as human pose estimation, visual relationships, and fairness. I received my master's degree from the University of Michigan and my bachelor's degree from Tsinghua University.
Dawei Yang (University of Michigan)
Jia Deng (Princeton University)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Spotlight: Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D
  Wed. Dec 9th, 03:20 -- 03:30 AM, Room: Orals & Spotlights: Vision Applications
More from the Same Authors
- 2022: ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
  Ishika Singh · Valts Blukis · Arsalan Mousavian · Ankit Goyal · Danfei Xu · Jonathan Tremblay · Dieter Fox · Jesse Thomason · Animesh Garg
- 2021: Fairness and privacy aspects of ImageNet
  Olga Russakovsky · Kaiyu Yang
- 2021 Oral: DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras
  Zachary Teed · Jia Deng
- 2021 Poster: DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras
  Zachary Teed · Jia Deng
- 2020 Poster: Learning to Prove Theorems by Learning to Generate Theorems
  Mingzhe Wang · Jia Deng
- 2020 Poster: Strongly Incremental Constituency Parsing with Graph Neural Networks
  Kaiyu Yang · Jia Deng
- 2016 Poster: Single-Image Depth Perception in the Wild
  Weifeng Chen · Zhao Fu · Dawei Yang · Jia Deng
- 2011 Poster: Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
  Jia Deng · Sanjeev Satheesh · Alexander C Berg · Li Fei-Fei