Recently, Transformers have been widely explored in tracking and have shown state-of-the-art (SOTA) performance. However, existing efforts mainly focus on fusing and enhancing features generated by convolutional neural networks (CNNs), so the potential of Transformers in representation learning remains under-explored. In this paper, we aim to further unleash the power of the Transformer by proposing a simple yet efficient fully-attentional tracker, dubbed SwinTrack, within the classic Siamese framework. In particular, both representation learning and feature fusion in SwinTrack use the Transformer architecture, which enables better feature interactions for tracking than pure CNN or hybrid CNN-Transformer frameworks. In addition, to further enhance robustness, we present a novel motion token that embeds the historical target trajectory and improves tracking by providing temporal context. The motion token is lightweight, adding negligible computation, yet it brings clear gains. In thorough experiments, SwinTrack outperforms existing approaches on multiple benchmarks. In particular, on the challenging LaSOT benchmark, SwinTrack sets a new record with a SUC score of 0.713. It also achieves SOTA results on other benchmarks. We expect SwinTrack to serve as a solid baseline for Transformer tracking and to facilitate future research. Our code and results are released at https://github.com/LitingLin/SwinTrack.
Author Information
Liting Lin (South China University of Technology)
Heng Fan (University of North Texas)
Zhipeng Zhang (Didi Research)
Yong Xu (South China University of Technology)
Haibin Ling (State University of New York, Stony Brook)
More from the Same Authors
2022 Poster: Divert More Attention to Vision-Language Tracking
Mingzhe Guo · Zhipeng Zhang · Heng Fan · Liping Jing
2023 Poster: Generative Pre-Training of Spatio-Temporal Graph Neural Networks
Zhonghang Li · Lianghao Xia · Yong Xu · Chao Huang
2021 Poster: Encoding Spatial Distribution of Convolutional Features for Texture Representation
Yong Xu · Feng Li · Zhile Chen · Jinxiu Liang · Yuhui Quan
2021 Poster: Searching the Search Space of Vision Transformer
Minghao Chen · Kan Wu · Bolin Ni · Houwen Peng · Bei Liu · Jianlong Fu · Hongyang Chao · Haibin Ling
2019 Poster: Adaptive GNN for Image Analysis and Editing
Lingyu Liang · LianWen Jin · Yong Xu