Skip to yearly menu bar Skip to main content


Poster

UniDSeg: Unified Cross-Domain 3D Semantic Segmentation via Visual Foundation Models Prior

Yao Wu · Mingwei Xing · Yachao Zhang · Xiaotong Luo · Yuan Xie · Yanyun Qu

East Exhibit Hall A-C #1606
[ ] [ Project Page ]
Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract: 3D semantic segmentation using an adapting model trained from a source domain with or without accessing unlabeled target-domain data is the fundamental task in computer vision, containing domain adaptation and domain generalization.The essence of simultaneously solving cross-domain tasks is to enhance the generalizability of the encoder.In light of this, we propose a groundbreaking universal method with the help of off-the-shelf Visual Foundation Models (VFMs) to boost the adaptability and generalizability of cross-domain 3D semantic segmentation, dubbed $\textbf{UniDSeg}$.Our method explores the VFMs prior and how to harness them, aiming to inherit the recognition ability of VFMs.Specifically, this method introduces layer-wise learnable blocks to the VFMs, which hinges on alternately learning two representations during training: (i) Learning visual prompt. The 3D-to-2D transitional prior and task-shared knowledge is captured from the prompt space, and then (ii) Learning deep query. Spatial Tunability is constructed to the representation of distinct instances driven by prompts in the query space.Integrating these representations into a cross-modal learning framework, UniDSeg efficiently mitigates the domain gap between 2D and 3D modalities, achieving unified cross-domain 3D semantic segmentation.Extensive experiments demonstrate the effectiveness of our method across widely recognized tasks and datasets, all achieving superior performance over state-of-the-art methods. Remarkably, UniDSeg achieves 57.5\%/54.4\% mIoU on ``A2D2/sKITTI'' for domain adaptive/generalized tasks. Code is available at https://github.com/Barcaaaa/UniDSeg.

Live content is unavailable. Log in and register to view live content