In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric data. Existing works have demonstrated the potential of utilizing the underlying complex structure within scene-centric data; still, they commonly rely on hand-crafted objectness priors or specialized pretext tasks to build a learning framework, which may harm generalizability. Instead, we propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning. The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots. Based on the learned data-dependent slots, a contrastive objective is employed for representation learning, which enhances the discriminability of features, and conversely facilitates grouping semantically coherent pixels together. Compared with previous efforts, by simultaneously optimizing the two coupled objectives of semantic grouping and contrastive learning, our approach bypasses the disadvantages of hand-crafted priors and is able to learn object/group-level representations from scene-centric images. Experiments show our approach effectively decomposes complex scenes into semantic groups for feature learning and significantly benefits downstream tasks, including object detection, instance segmentation, and semantic segmentation. Code is available at: https://github.com/CVMI-Lab/SlotCon.
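The grouping and contrastive steps described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation (see the linked repository for that). The function names, the temperature values, the cosine-similarity softmax assignment, and the InfoNCE-style loss that pairs slot k of one view with slot k of the other are all assumptions made for the example.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (a zero vector is left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def assign_pixels(features, prototypes, temperature=0.1):
    # Soft-assign each pixel feature to the prototypes.
    # Assumption: cosine similarity followed by a softmax over prototypes.
    protos = [l2_normalize(p) for p in prototypes]
    return [
        softmax([dot(l2_normalize(f), p) / temperature for p in protos])
        for f in features
    ]

def pool_slots(features, assignments):
    # Attentive pooling: slot k is the assignment-weighted mean of pixel features.
    num_slots = len(assignments[0])
    dim = len(features[0])
    slots = []
    for k in range(num_slots):
        weights = [a[k] for a in assignments]
        total = sum(weights) + 1e-8
        slots.append([
            sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dim)
        ])
    return slots

def slot_contrastive_loss(slots_a, slots_b, temperature=0.2):
    # InfoNCE-style loss (assumed form): slot k of view A is the positive
    # for slot k of view B; the other slots of view B act as negatives.
    a = [l2_normalize(s) for s in slots_a]
    b = [l2_normalize(s) for s in slots_b]
    loss = 0.0
    for k, s in enumerate(a):
        probs = softmax([dot(s, t) / temperature for t in b])
        loss -= math.log(probs[k])
    return loss / len(a)

# Toy input: four "pixels" with 2-D features and two prototypes (made-up values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]

assignments = assign_pixels(features, prototypes)
slots = pool_slots(features, assignments)
loss_matched = slot_contrastive_loss(slots, slots)
loss_swapped = slot_contrastive_loss(slots, slots[::-1])
```

In this toy setting the first two pixels fall mostly into the first slot and the last two into the second, and the loss is lower when slots with the same index are paired than when they are swapped. In a real training setup the prototypes and the feature extractor would be learned jointly, which this sketch does not attempt.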
Author Information
Xin Wen (The University of Hong Kong)
Bingchen Zhao (University of Edinburgh)
Anlin Zheng (Megvii Technology Inc.)
Xiangyu Zhang (MEGVII Technology)
Xiaojuan Qi (The University of Hong Kong)
More from the Same Authors
- 2021 Spotlight: Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay
  Ruosi Wan · Zhanxing Zhu · Xiangyu Zhang · Jian Sun
- 2022 Poster: Unifying Voxel-based Representation with Transformer for 3D Object Detection
  Yanwei Li · Yilun Chen · Xiaojuan Qi · Zeming Li · Jian Sun · Jiaya Jia
- 2022 Poster: Towards Efficient 3D Object Detection with Knowledge Distillation
  Jihan Yang · Shaoshuai Shi · Runyu Ding · Zhe Wang · Xiaojuan Qi
- 2023 Poster: Data Pruning via Moving-one-Sample-out
  Haoru Tan · Sitong Wu · Fei Du · Yukang Chen · Zhibin Wang · Fan Wang · Xiaojuan Qi
- 2023 Poster: CL-NeRF: Continual Learning of Neural Radiance Fields for Evolving Scene Representation
  Xiuzhe Wu · Peng Dai · Weipeng DENG · Handi Chen · Yang Wu · Yan-Pei Cao · Ying Shan · Xiaojuan Qi
- 2023 Poster: Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration
  Longlin Yu · Tianyu Xie · Yu Zhu · Tong Yang · Xiangyu Zhang · Cheng Zhang
- 2023 Poster: Slot-guided Volumetric Object Radiance Fields
  DI QI · Tong Yang · Xiangyu Zhang
- 2023 Poster: RevColV2: Exploring Disentangled Representations in Masked Image Modeling
  Qi Han · Yuxuan Cai · Xiangyu Zhang
- 2023 Poster: CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
  Chuofan Ma · Yi Jiang · Xin Wen · Zehuan Yuan · Xiaojuan Qi
- 2022 Poster: Spatial Pruned Sparse Convolution for Efficient 3D Object Detection
  Jianhui Liu · Yukang Chen · Xiaoqing Ye · Zhuotao Tian · Xiao Tan · Xiaojuan Qi
- 2022 Poster: Prototypical VoteNet for Few-Shot 3D Point Cloud Object Detection
  Shizhen Zhao · Xiaojuan Qi
- 2022 Poster: Rethinking Resolution in the Context of Efficient Video Recognition
  Chuofan Ma · Qiushan Guo · Yi Jiang · Ping Luo · Zehuan Yuan · Xiaojuan Qi
- 2021 Poster: Novel Visual Category Discovery with Dual Ranking Statistics and Mutual Knowledge Distillation
  Bingchen Zhao · Kai Han
- 2021 Poster: Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay
  Ruosi Wan · Zhanxing Zhu · Xiangyu Zhang · Jian Sun
- 2021 Poster: Instance-Conditional Knowledge Distillation for Object Detection
  Zijian Kang · Peizhen Zhang · Xiangyu Zhang · Jian Sun · Nanning Zheng
- 2021 Poster: SOLQ: Segmenting Objects by Learning Queries
  Bin Dong · Fangao Zeng · Tiancai Wang · Xiangyu Zhang · Yichen Wei
- 2020 Poster: Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
  Bowen Li · Xiaojuan Qi · Philip Torr · Thomas Lukasiewicz
- 2020 Poster: Rethinking Learnable Tree Filter for Generic Feature Transform
  Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Xiangyu Zhang · Hongbin Sun · Jian Sun · Nanning Zheng
- 2019 Poster: DetNAS: Backbone Search for Object Detection
  Yukang Chen · Tong Yang · Xiangyu Zhang · GAOFENG MENG · Xinyu Xiao · Jian Sun
- 2018 Poster: MetaAnchor: Learning to Detect Objects with Customized Anchors
  Tong Yang · Xiangyu Zhang · Zeming Li · Wenqiang Zhang · Jian Sun