This paper investigates how to achieve better and more efficient embedding learning for semi-supervised video object segmentation in challenging multi-object scenarios. State-of-the-art methods learn to decode features with a single positive object, so in multi-object scenarios they have to match and segment each target separately, multiplying the computational cost. To solve this problem, we propose an Associating Objects with Transformers (AOT) approach that matches and decodes multiple objects uniformly. In detail, AOT employs an identification mechanism to associate multiple targets in the same high-dimensional embedding space. We can thus perform matching and segmentation decoding for multiple objects simultaneously, as efficiently as for a single object. To model multi-object association adequately, we design a Long Short-Term Transformer that constructs hierarchical matching and propagation. We conduct extensive experiments on both multi-object and single-object benchmarks to examine AOT variant networks of different complexities. In particular, our R50-AOT-L outperforms all state-of-the-art competitors on three popular benchmarks, i.e., YouTube-VOS (84.1% J&F), DAVIS 2017 (84.9%), and DAVIS 2016 (91.1%), while running more than 3× faster in multi-object scenarios. Meanwhile, our AOT-T maintains real-time multi-object speed on the above benchmarks. Based on AOT, we ranked 1st in the 3rd Large-scale VOS Challenge.
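The idea of placing several targets in one shared embedding space can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the orthonormal identity bank, and the dot-product softmax decoding are all assumptions made for illustration. It only shows why a single embedding map can carry every object's mask at once, so later stages need one pass instead of one pass per object.

```python
import numpy as np


def embed_object_masks(masks, id_bank):
    """Fuse N one-hot object masks (N, H, W) into one (H, W, C) embedding map.

    Hypothetical helper: each pixel receives the identity vector of the
    object that covers it, so all objects share a single embedding space.
    """
    n_objects = masks.shape[0]
    if n_objects > id_bank.shape[0]:
        raise ValueError("more objects than identity vectors in the bank")
    return np.einsum("nhw,nc->hwc", masks.astype(id_bank.dtype), id_bank[:n_objects])


def decode_object_probs(id_embedding, id_bank, n_objects):
    """Recover per-object probabilities (N, H, W) from the shared embedding.

    Assumed decoding rule for this sketch: dot product of every pixel with
    every identity vector, followed by a softmax over objects.
    """
    logits = np.einsum("hwc,nc->nhw", id_embedding, id_bank[:n_objects])
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)
    return probs


rng = np.random.default_rng(0)

# Assumed identity bank: 10 orthonormal vectors of dimension 64 (rows of id_bank).
q, _ = np.linalg.qr(rng.standard_normal((64, 10)))
id_bank = q.T

# Toy frame: a 4x4 label map with 3 objects, expanded into one-hot masks.
labels = rng.integers(0, 3, size=(4, 4))
masks = labels[None, :, :] == np.arange(3)[:, None, None]

embedding = embed_object_masks(masks, id_bank)   # one map for all objects
probs = decode_object_probs(embedding, id_bank, n_objects=3)
recovered = probs.argmax(axis=0)                 # per-pixel object index
```

Because the identity vectors in this sketch are orthonormal, the per-pixel argmax recovers the original label map exactly. In a learned model the identity vectors and the decoder would be trained rather than fixed, which this toy example does not attempt to show.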
Author Information
Zongxin Yang (Zhejiang University)
Yunchao Wei (UTS)
Yi Yang (UTS)
More from the Same Authors
- 2022 Spotlight: Decoupling Features in Hierarchical Propagation for Video Object Segmentation
  Zongxin Yang · Yi Yang
- 2022 Spotlight: Lightning Talks 6A-1
  Ziyi Wang · Nian Liu · Yaming Yang · Qilong Wang · Yuanxin Liu · Zongxin Yang · Yizhao Gao · Yanchen Deng · Dongze Lian · Nanyi Fei · Ziyu Guan · Xiao Wang · Shufeng Kong · Xumin Yu · Daquan Zhou · Yi Yang · Fandong Meng · Mingze Gao · Caihua Liu · Yongming Rao · Zheng Lin · Haoyu Lu · Zhe Wang · Jiashi Feng · Zhaolin Zhang · Deyu Bo · Xinchao Wang · Chuan Shi · Jiangnan Li · Jiangtao Xie · Jie Zhou · Zhiwu Lu · Wei Zhao · Bo An · Jiwen Lu · Peihua Li · Jian Pei · Hao Jiang · Cai Xu · Peng Fu · Qinghua Hu · Yijie Li · Weigang Lu · Yanan Cao · Jianbin Huang · Weiping Wang · Zhao Cao · Jie Zhou
- 2022 Spotlight: Mask Matching Transformer for Few-Shot Segmentation
  siyu jiao · Gengwei Zhang · Shant Navasardyan · Ling Chen · Yao Zhao · Yunchao Wei · Humphrey Shi
- 2022 Poster: Mask Matching Transformer for Few-Shot Segmentation
  siyu jiao · Gengwei Zhang · Shant Navasardyan · Ling Chen · Yao Zhao · Yunchao Wei · Humphrey Shi
- 2021 Poster: Few-Shot Segmentation via Cycle-Consistent Transformer
  Gengwei Zhang · Guoliang Kang · Yi Yang · Yunchao Wei
- 2020 Poster: Consistent Structural Relation Learning for Zero-Shot Segmentation
  Peike Li · Yunchao Wei · Yi Yang
- 2020 Spotlight: Consistent Structural Relation Learning for Zero-Shot Segmentation
  Peike Li · Yunchao Wei · Yi Yang
- 2020 Poster: Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation
  Yawei Luo · Ping Liu · Tao Guan · Junqing Yu · Yi Yang
- 2020 Poster: Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation
  Guoliang Kang · Yunchao Wei · Yi Yang · Yueting Zhuang · Alexander Hauptmann
- 2020 Oral: Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation
  Guoliang Kang · Yunchao Wei · Yi Yang · Yueting Zhuang · Alexander Hauptmann
- 2019 Poster: Connective Cognition Network for Directional Visual Commonsense Reasoning
  Aming Wu · Linchao Zhu · Yahong Han · Yi Yang
- 2019 Poster: Network Pruning via Transformable Architecture Search
  Xuanyi Dong · Yi Yang
- 2018 Poster: Self-Erasing Network for Integral Object Attention
  Qibin Hou · PengTao Jiang · Yunchao Wei · Ming-Ming Cheng