Acquiring the human-like ability to abstract visual concepts from concrete pixels has long been a fundamental goal in machine learning research areas such as disentangled representation learning and scene decomposition. Towards this goal, we propose an unsupervised transformer-based Visual Concepts Tokenization framework, dubbed VCT, which represents an image as a set of disentangled visual concept tokens, each responding to one type of independent visual concept. To obtain these concept tokens, we use only cross-attention to extract visual information from the image tokens layer by layer, with no self-attention between concept tokens, which prevents information leakage across concept tokens. We further propose a Concept Disentangling Loss to encourage different concept tokens to represent independent visual concepts. The cross-attention and the disentangling loss play the roles of induction and mutual exclusion for the concept tokens, respectively. Extensive experiments on several popular datasets verify the effectiveness of VCT on disentangled representation learning and scene decomposition. VCT achieves state-of-the-art results by a large margin.
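The cross-attention-only design described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the residual update, the single attention head, the random weights, and the tensor sizes are all assumptions chosen for illustration, and the Concept Disentangling Loss is omitted. The sketch only shows that when concept tokens query image tokens and never attend to each other, each concept token is computed independently of the others.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_layer(concepts, image_tokens, w_q, w_k, w_v):
    # Hypothetical single-head layer: queries come from concept tokens,
    # keys and values come from image tokens only.
    q = concepts @ w_q                      # (num_concepts, d)
    k = image_tokens @ w_k                  # (num_patches, d)
    v = image_tokens @ w_v                  # (num_patches, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Residual update (an assumption); no self-attention between concepts.
    return concepts + attn @ v

def tokenize_concepts(image_tokens, concept_prototypes, layers):
    # Apply the cross-attention layers one after another.
    concepts = concept_prototypes
    for w_q, w_k, w_v in layers:
        concepts = cross_attention_layer(concepts, image_tokens, w_q, w_k, w_v)
    return concepts

# Illustrative sizes and random weights (assumptions, not from the paper).
rng = np.random.default_rng(0)
d, num_patches, num_concepts, num_layers = 16, 64, 4, 3
image_tokens = rng.normal(size=(num_patches, d))
prototypes = rng.normal(size=(num_concepts, d))
layers = [
    tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    for _ in range(num_layers)
]
concept_tokens = tokenize_concepts(image_tokens, prototypes, layers)
print(concept_tokens.shape)
```

In this sketch, perturbing one concept prototype changes only the corresponding output token, which is one way to read the abstract's claim that omitting self-attention prevents information leakage across concept tokens.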
Author Information
Tao Yang (Xi'an Jiaotong University)
Yuwang Wang (Microsoft)
Yan Lu (Microsoft Research Asia)
Nanning Zheng (Xi'an Jiaotong University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: Visual Concepts Tokenization
  Wed. Nov 30th 05:00 -- 07:00 PM Room Hall J #211
More from the Same Authors
- 2022 Poster: Could Giant Pre-trained Image Models Extract Universal Representations?
  Yutong Lin · Ze Liu · Zheng Zhang · Han Hu · Nanning Zheng · Stephen Lin · Yue Cao
- 2022 Spotlight: Lightning Talks 5B-4
  Yuezhi Yang · Zeyu Yang · Yong Lin · Yishi Xu · Linan Yue · Tao Yang · Weixin Chen · Qi Liu · Jiaqi Chen · Dongsheng Wang · Baoyuan Wu · Yuwang Wang · Hao Pan · Shengyu Zhu · Zhenwei Miao · Yan Lu · Lu Tan · Bo Chen · Yichao Du · Haoqian Wang · Wei Li · Yanqing An · Ruiying Lu · Peng Cui · Nanning Zheng · Li Wang · Zhibin Duan · Xiatian Zhu · Mingyuan Zhou · Enhong Chen · Li Zhang
- 2022 Spotlight: Lightning Talks 2A-3
  David Buterez · Chengan He · Xuan Kan · Yutong Lin · Konstantin Schürholt · Yu Yang · Louis Annabi · Wei Dai · Xiaotian Cheng · Alexandre Pitti · Ze Liu · Jon Paul Janet · Jun Saito · Boris Knyazev · Mathias Quoy · Zheng Zhang · James Zachary · Steven J Kiddle · Xavier Giro-i-Nieto · Chang Liu · Hejie Cui · Zilong Zhang · Hakan Bilen · Damian Borth · Dino Oglic · Holly Rushmeier · Han Hu · Xiangyang Ji · Yi Zhou · Nanning Zheng · Ying Guo · Pietro Liò · Stephen Lin · Carl Yang · Yue Cao
- 2022 Spotlight: Could Giant Pre-trained Image Models Extract Universal Representations?
  Yutong Lin · Ze Liu · Zheng Zhang · Han Hu · Nanning Zheng · Stephen Lin · Yue Cao
- 2022 Poster: Alignment-guided Temporal Attention for Video Action Recognition
  Yizhou Zhao · Zhenyang Li · Xun Guo · Yan Lu
- 2022 Poster: Mask-based Latent Reconstruction for Reinforcement Learning
  Tao Yu · Zhizheng Zhang · Cuiling Lan · Yan Lu · Zhibo Chen
- 2021 Poster: Co-evolution Transformer for Protein Contact Prediction
  He Zhang · Fusong Ju · Jianwei Zhu · Liang He · Bin Shao · Nanning Zheng · Tie-Yan Liu
- 2021 Poster: Deep Contextual Video Compression
  Jiahao Li · Bin Li · Yan Lu
- 2021 Poster: Dynamic Grained Encoder for Vision Transformers
  Lin Song · Songyang Zhang · Songtao Liu · Zeming Li · Xuming He · Hongbin Sun · Jian Sun · Nanning Zheng
- 2021 Poster: Instance-Conditional Knowledge Distillation for Object Detection
  Zijian Kang · Peizhen Zhang · Xiangyu Zhang · Jian Sun · Nanning Zheng
- 2020 Poster: Compositional Generalization by Learning Analytical Expressions
  Qian Liu · Shengnan An · Jian-Guang Lou · Bei Chen · Zeqi Lin · Yan Gao · Bin Zhou · Nanning Zheng · Dongmei Zhang
- 2020 Spotlight: Compositional Generalization by Learning Analytical Expressions
  Qian Liu · Shengnan An · Jian-Guang Lou · Bei Chen · Zeqi Lin · Yan Gao · Bin Zhou · Nanning Zheng · Dongmei Zhang
- 2020 Poster: Rethinking Learnable Tree Filter for Generic Feature Transform
  Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Xiangyu Zhang · Hongbin Sun · Jian Sun · Nanning Zheng
- 2020 Poster: Fine-Grained Dynamic Head for Object Detection
  Lin Song · Yanwei Li · Zhengkai Jiang · Zeming Li · Hongbin Sun · Jian Sun · Nanning Zheng
- 2019 Poster: Learnable Tree Filter for Structure-preserving Feature Transform
  Lin Song · Yanwei Li · Zeming Li · Gang Yu · Hongbin Sun · Jian Sun · Nanning Zheng