In this work, we present UVTR, a unified framework for multi-modality 3D object detection. The method unifies multi-modality representations in the voxel space for accurate and robust single- or cross-modality 3D detection. To this end, a modality-specific space is first designed to represent each input in the voxel feature space. Unlike previous work, our approach preserves the voxel space without height compression, which alleviates semantic ambiguity and enables spatial connections. To make full use of the inputs from different sensors, cross-modality interaction is then proposed, comprising knowledge transfer and modality fusion. In this way, geometry-aware expressions in point clouds and context-rich features in images are both exploited for better performance and robustness. A transformer decoder efficiently samples features from the unified space with learnable positions, which facilitates object-level interactions. Overall, UVTR is an early attempt to represent different modalities in a unified framework. It surpasses previous work in both single- and multi-modality entries and achieves leading performance on the nuScenes test set for both object detection and the subsequent object tracking task. Code is publicly available at https://github.com/dvlab-research/UVTR.
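The ideas in the abstract (a shared voxel space kept at full height, fusion of two modalities, and sampling at query positions) can be illustrated with a toy sketch. This is not the UVTR implementation: the function names `fuse_voxels`, `compress_height`, and `sample_trilinear` are hypothetical, fusion by element-wise addition is an assumption made for brevity, each voxel holds a single scalar feature, and the "learnable positions" are fixed constants here rather than trained parameters.

```python
from typing import List, Sequence

# One scalar feature per voxel for brevity; indexed as grid[z][y][x].
Grid = List[List[List[float]]]


def fuse_voxels(image_grid: Grid, lidar_grid: Grid) -> Grid:
    """Element-wise sum of two same-shaped voxel grids (assumed fusion rule)."""
    return [
        [[a + b for a, b in zip(row_i, row_l)]
         for row_i, row_l in zip(plane_i, plane_l)]
        for plane_i, plane_l in zip(image_grid, lidar_grid)
    ]


def compress_height(grid: Grid) -> List[List[float]]:
    """Sum over z to get a bird's-eye-view map; shown only for contrast."""
    ny, nx = len(grid[0]), len(grid[0][0])
    return [[sum(plane[y][x] for plane in grid) for x in range(nx)]
            for y in range(ny)]


def _clamp(value: float, upper: float) -> float:
    """Clamp a coordinate to the range [0, upper]."""
    return min(max(value, 0.0), upper)


def sample_trilinear(grid: Grid, pos: Sequence[float]) -> float:
    """Trilinearly interpolate grid at a continuous (x, y, z) voxel-index position."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    x = _clamp(pos[0], nx - 1.0)
    y = _clamp(pos[1], ny - 1.0)
    z = _clamp(pos[2], nz - 1.0)
    x0, y0, z0 = int(x), int(y), int(z)
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0
    value = 0.0
    # Weighted sum over the eight voxels surrounding the query position.
    for zi, wz in ((z0, 1.0 - fz), (z1, fz)):
        for yi, wy in ((y0, 1.0 - fy), (y1, fy)):
            for xi, wx in ((x0, 1.0 - fx), (x1, fx)):
                value += wz * wy * wx * grid[zi][yi][xi]
    return value


# Toy 2x2x2 grids: the image branch responds at z=0, the LiDAR branch at z=1.
image_grid = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
lidar_grid = [[[0.0, 0.0], [0.0, 0.0]], [[3.0, 0.0], [0.0, 0.0]]]

fused = fuse_voxels(image_grid, lidar_grid)

# Stand-ins for learnable query positions, given as (x, y, z).
reference_points = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 1.0)]
sampled = [sample_trilinear(fused, p) for p in reference_points]  # [1.0, 2.0, 3.0]

bev = compress_height(fused)  # bev[0][0] == 4.0: the two heights are merged
```

In the full voxel grid the three query positions return distinct values along the height axis, whereas the height-compressed map keeps only their sum at that (x, y) cell, which is one way to picture the ambiguity that height compression can introduce.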
Author Information
Yanwei Li (The Chinese University of Hong Kong)
Yilun Chen (The Chinese University of Hong Kong)
Xiaojuan Qi (The University of Hong Kong)
Zeming Li (Megvii (Face++) Inc)
Jian Sun (Megvii, Face++)
Jiaya Jia (CUHK)
More from the Same Authors
- 2021 Spotlight: Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay
  Ruosi Wan · Zhanxing Zhu · Xiangyu Zhang · Jian Sun
- 2022 Poster: Towards Efficient 3D Object Detection with Knowledge Distillation
  Jihan Yang · Shaoshuai Shi · Runyu Ding · Zhe Wang · Xiaojuan Qi
- 2023 Poster: Data Pruning via Moving-one-Sample-out
  Haoru Tan · Sitong Wu · Fei Du · Yukang Chen · Zhibin Wang · Fan Wang · Xiaojuan Qi
- 2023 Poster: CL-NeRF: Continual Learning of Neural Radiance Fields for Evolving Scene Representation
  Xiuzhe Wu · Peng Dai · Weipeng DENG · Handi Chen · Yang Wu · Yan-Pei Cao · Ying Shan · Xiaojuan Qi
- 2023 Poster: Real-World Image Variation by Aligning Diffusion Inversion Chain
  Yuechen Zhang · Jinbo Xing · Eric Lo · Jiaya Jia
- 2023 Poster: DiffComplete: Diffusion-based Generative 3D Shape Completion
  Ruihang Chu · Enze Xie · Shentong Mo · Zhenguo Li · Matthias Niessner · Chi-Wing Fu · Jiaya Jia
- 2023 Poster: GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
  Rui Yang · Lin Song · Yanwei Li · Sijie Zhao · Yixiao Ge · Xiu Li · Ying Shan
- 2023 Poster: CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
  Chuofan Ma · Yi Jiang · Xin Wen · Zehuan Yuan · Xiaojuan Qi
- 2022 Poster: Spatial Pruned Sparse Convolution for Efficient 3D Object Detection
  Jianhui Liu · Yukang Chen · Xiaoqing Ye · Zhuotao Tian · Xiao Tan · Xiaojuan Qi
- 2022 Poster: Prototypical VoteNet for Few-Shot 3D Point Cloud Object Detection
  Shizhen Zhao · Xiaojuan Qi
- 2022 Poster: Self-Supervised Visual Representation Learning with Semantic Grouping
  Xin Wen · Bingchen Zhao · Anlin Zheng · Xiangyu Zhang · Xiaojuan Qi
- 2022 Poster: Rethinking Resolution in the Context of Efficient Video Recognition
  Chuofan Ma · Qiushan Guo · Yi Jiang · Ping Luo · Zehuan Yuan · Xiaojuan Qi
- 2021 Poster: Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay
  Ruosi Wan · Zhanxing Zhu · Xiangyu Zhang · Jian Sun
- 2021 Poster: Blending Anti-Aliasing into Vision Transformer
  Shengju Qian · Hao Shao · Yi Zhu · Mu Li · Jiaya Jia
- 2021 Poster: Dynamic Grained Encoder for Vision Transformers
  Lin Song · Songyang Zhang · Songtao Liu · Zeming Li · Xuming He · Hongbin Sun · Jian Sun · Nanning Zheng
- 2021 Poster: Instance-Conditional Knowledge Distillation for Object Detection
  Zijian Kang · Peizhen Zhang · Xiangyu Zhang · Jian Sun · Nanning Zheng
- 2020 Poster: Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
  Bowen Li · Xiaojuan Qi · Philip Torr · Thomas Lukasiewicz
- 2020 Poster: Rethinking Learnable Tree Filter for Generic Feature Transform
  Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Xiangyu Zhang · Hongbin Sun · Jian Sun · Nanning Zheng
- 2020 Poster: Fine-Grained Dynamic Head for Object Detection
  Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Hongbin Sun · Jian Sun · Nanning Zheng
- 2020 Poster: LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond
  Wenbo Li · Kun Zhou · Lu Qi · Nianjuan Jiang · Jiangbo Lu · Jiaya Jia
- 2019 Poster: Learnable Tree Filter for Structure-preserving Feature Transform
  Lin Song · Yanwei Li · Zeming Li · Gang Yu · Hongbin Sun · Jian Sun · Nanning Zheng
- 2019 Poster: DetNAS: Backbone Search for Object Detection
  Yukang Chen · Tong Yang · Xiangyu Zhang · GAOFENG MENG · Xinyu Xiao · Jian Sun
- 2018 Poster: MetaAnchor: Learning to Detect Objects with Customized Anchors
  Tong Yang · Xiangyu Zhang · Zeming Li · Wenqiang Zhang · Jian Sun
- 2018 Poster: Image Inpainting via Generative Multi-column Convolutional Neural Networks
  Yi Wang · Xin Tao · Xiaojuan Qi · Xiaoyong Shen · Jiaya Jia
- 2018 Poster: Sequential Context Encoding for Duplicate Removal
  Lu Qi · Shu Liu · Jianping Shi · Jiaya Jia
- 2016 Poster: Visual Question Answering with Question Representation Update (QRU)
  Ruiyu Li · Jiaya Jia
- 2014 Poster: Deep Convolutional Neural Network for Image Deconvolution
  Li Xu · Jimmy S. Ren · Ce Liu · Jiaya Jia