

Poster

UniDSeg: Unified Cross-Domain 3D Semantic Segmentation via Visual Foundation Models Prior

Yao Wu · Mingwei Xing · Yachao Zhang · Xiaotong Luo · Yuan Xie · Yanyun Qu


Abstract:

Cross-domain 3D semantic segmentation, where a model trained on a source domain is applied with or without access to unlabeled target-domain data, is a fundamental task in computer vision that encompasses both domain adaptation and domain generalization. The essence of solving these cross-domain tasks simultaneously is to enhance the generalizability of the encoder. In light of this, we propose UniDSeg, a universal method that leverages off-the-shelf Visual Foundation Models (VFMs) to boost the adaptability and generalizability of cross-domain 3D semantic segmentation. Our method explores the priors of VFMs and how to harness them, aiming to inherit their recognition ability. Specifically, we introduce layer-wise learnable blocks into the VFMs, which alternately learn two representations during training: (i) visual prompts, which capture the 3D-to-2D transitional prior and task-shared knowledge from the prompt space; and (ii) deep queries, which construct spatial tunability for the representations of distinct instances, driven by the prompts, in the query space. Integrating these representations into a cross-modal learning framework, UniDSeg efficiently mitigates the domain gap between the 2D and 3D modalities, achieving unified cross-domain 3D semantic segmentation. Extensive experiments on widely recognized tasks and datasets demonstrate the effectiveness of our method, which consistently outperforms state-of-the-art approaches. Remarkably, UniDSeg achieves 57.5%/54.4% mIoU on "A2D2/sKITTI" for the domain adaptation/generalization tasks. Code is available at https://anonymous.4open.science/r/UniDSeg-4BC1/.
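To make the architectural idea concrete, the sketch below shows a generic frozen transformer encoder (standing in for a VFM) augmented with layer-wise learnable prompt tokens and a learnable deep query that gates each layer's patch features. All class and parameter names (e.g. PromptedVFMEncoder, n_prompts) and the gating scheme are illustrative assumptions, not the UniDSeg implementation, and the alternating training schedule described in the abstract is omitted.

```python
import torch
import torch.nn as nn


class PromptedVFMEncoder(nn.Module):
    """Hypothetical sketch: a frozen transformer (proxy for a Visual Foundation
    Model) with layer-wise learnable visual prompts and a prompt-driven deep
    query that rescales patch tokens. Illustrative only, not the paper's code."""

    def __init__(self, dim=256, depth=4, n_heads=8, n_prompts=8):
        super().__init__()
        # Frozen "foundation model" blocks.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, n_heads, dim * 4, batch_first=True)
            for _ in range(depth)
        )
        for p in self.blocks.parameters():
            p.requires_grad = False
        # Layer-wise learnable visual prompt tokens (one set per block).
        self.prompts = nn.ParameterList(
            nn.Parameter(torch.zeros(n_prompts, dim)) for _ in range(depth)
        )
        # Layer-wise learnable deep queries producing a per-channel gate.
        self.queries = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid()) for _ in range(depth)
        )

    def forward(self, tokens):
        # tokens: (batch, n_tokens, dim) image patch embeddings.
        b = tokens.size(0)
        for blk, prompt, query in zip(self.blocks, self.prompts, self.queries):
            # Prepend this layer's prompts, run the frozen block, then split
            # the prompt tokens back off the patch tokens.
            p = prompt.unsqueeze(0).expand(b, -1, -1)
            x = blk(torch.cat([p, tokens], dim=1))
            tokens = x[:, prompt.size(0):]
            # Deep query: gate patch tokens with a prompt-derived scale.
            gate = query(x[:, : prompt.size(0)].mean(dim=1, keepdim=True))
            tokens = tokens * gate
        return tokens


if __name__ == "__main__":
    enc = PromptedVFMEncoder()
    feats = enc(torch.randn(2, 196, 256))  # e.g. a 14x14 patch grid
    print(feats.shape)  # torch.Size([2, 196, 256])
```

Only the prompts and query heads are trainable here, which mirrors the general prompt-tuning strategy of adapting a frozen VFM with a small number of parameters; how UniDSeg fuses these 2D features with the 3D branch is not shown.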
