TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification
Shengcai Liao · Ling Shao

Thu Dec 09 12:30 AM -- 02:00 AM (PST)

Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense predictions, and the generalizability of Transformers is unknown. In this work, we further investigate the possibility of applying Transformers for image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention. Thus, we further design two naive solutions, i.e. query-gallery concatenation in ViT, and query-gallery cross-attention in the vanilla Transformer. The latter improves the performance, but it is still limited. This implies that the attention mechanism in Transformers is primarily designed for global feature aggregation, which is not naturally suitable for image matching. Accordingly, we propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity computation. Additionally, global max pooling and a multilayer perceptron (MLP) head are applied to decode the matching result. This way, the simplified decoder is computationally more efficient, while at the same time more effective for image matching. The proposed method, called TransMatcher, achieves state-of-the-art performance in generalizable person re-identification, with up to 6.1% and 5.7% performance gains in Rank-1 and mAP, respectively, on several popular datasets. Code is available at https://github.com/ShengcaiLiao/QAConv.
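The simplified decoder described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It only shows the three steps the abstract names: query-key similarity without softmax weighting, global max pooling, and an MLP head. The function name, the feature shapes, the choice to pool over gallery locations, and the single hidden layer with ReLU are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def simplified_decoder_score(query_feat, gallery_feat, w1, b1, w2, b2):
    """Hypothetical sketch of a softmax-free matching decoder.

    query_feat, gallery_feat: lists of per-location feature vectors.
    w1: one weight vector per hidden unit, each over query locations (assumed layout).
    b1: one bias per hidden unit; w2, b2: output layer weights and bias.
    """
    # Query-key similarity only: no softmax weighting, no value aggregation.
    sim = [[dot(q, g) for g in gallery_feat] for q in query_feat]
    # Global max pooling: keep the best-matching gallery location per query location.
    pooled = [max(row) for row in sim]
    # MLP head (one ReLU hidden layer here) decodes the pooled similarities into a score.
    hidden = [max(0.0, dot(pooled, w) + b) for w, b in zip(w1, b1)]
    return dot(hidden, w2) + b2


# Toy inputs: two spatial locations, two feature channels.
query = [[1, 0], [0, 1]]
gallery = [[1, 0], [0, 2]]
score = simplified_decoder_score(query, gallery, w1=[[1, 1]], b1=[0.0], w2=[0.5], b2=0.0)
print(score)
```

Because no softmax or value aggregation is computed, the per-pair cost is dominated by a single similarity product, which is consistent with the abstract's claim that the simplified decoder is computationally more efficient than full attention.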

Author Information

Shengcai Liao (Inception Institute of Artificial Intelligence (IIAI))

Shengcai Liao is a Lead Scientist at the Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, UAE. He is a Senior Member of IEEE. Previously, he was an Associate Professor at the Institute of Automation, Chinese Academy of Sciences (CASIA). He received the B.S. degree in mathematics from Sun Yat-sen University in 2005 and the Ph.D. degree from CASIA in 2010. He was a postdoc at Michigan State University during 2010-2012. His research interests include object detection, recognition, and tracking, especially face and person related tasks. He has published over 100 papers, with **over 14,900 citations and h-index 43** according to Google Scholar. He **ranks 905 among 215,114 scientists (Top 0.42%)** for the single year 2019 in the field of AI, according to a study by Stanford University of the Top 2% of scientists worldwide. His representative work LOMO+XQDA, known for effective feature design and metric learning for person re-identification, has been **cited over 1,900 times and ranks top 10 among 602 papers in CVPR 2015**. He was awarded the Best Student Paper at ICB 2006, ICB 2015, and CCBR 2016, and the Best Paper at ICB 2007. He was also awarded the IJCB 2014 Best Reviewer and CVPR 2019/2021 Outstanding Reviewer. He was an Assistant Editor for the book “Encyclopedia of Biometrics (2nd Ed.)”. He will serve as Program Chair for IJCB 2022, and Area Chair for CVPR 2022 and ECCV 2022. He served as Area Chair for ICPR 2016, ICB 2016 and 2018, as SPC for IJCAI 2021, and as a reviewer for ICCV, CVPR, ECCV, NeurIPS, ICLR, AAAI, TPAMI, IJCV, TNNLS, etc. He was the Winner of the CVPR 2017 Detection in Crowded Scenes Challenge and the ICCV 2019 NightOwls Pedestrian Detection Challenge.

Ling Shao (Inception Institute of Artificial Intelligence)

More from the Same Authors