We propose a sparse end-to-end multi-person pose regression framework, termed QueryPose, which directly predicts multi-person keypoint sequences from the input image. Existing end-to-end methods rely on dense representations to preserve the spatial detail and structure needed for precise keypoint localization. However, the dense paradigm introduces complex and redundant post-processing during inference. In our framework, each human instance is encoded by several learnable spatial-aware part-level queries associated with an instance-level query. First, we propose the Spatial Part Embedding Generation Module (SPEGM), which uses a local spatial attention mechanism to generate several spatial-sensitive part embeddings; these embeddings contain spatial details and structural information that enhance the part-level queries. Second, we introduce the Selective Iteration Module (SIM), which adaptively updates the sparse part-level queries stage by stage using the generated spatial-sensitive part embeddings. With these two modules, the part-level queries can fully encode the spatial details and structural information required for precise keypoint regression. Thanks to bipartite matching, QueryPose avoids hand-designed post-processing. Without bells and whistles, QueryPose surpasses the existing dense end-to-end methods, achieving 73.6 AP on the MS COCO mini-val set and 72.7 AP on the CrowdPose test set. Code is available at https://github.com/buptxyb666/QueryPose.
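The abstract credits bipartite matching with removing hand-designed post-processing. Below is a minimal sketch of that general set-prediction idea, in the style of DETR-like detectors. It is not QueryPose's actual implementation. During training, each ground-truth person is matched one-to-one to a predicted query, and unmatched queries are treated as background. At inference, confident queries are simply kept, with no NMS or keypoint grouping. The matching cost, its weights, and the helper names (`keypoint_l1_cost`, `match_queries`, `select_instances`) are hypothetical. The cost is the mean L1 distance over visible joints minus a weighted confidence score. The exhaustive search stands in for the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) and is only practical for toy sizes.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# One person's keypoint sequence: an (x, y) pair per joint (e.g. 17 joints on COCO).
Keypoints = Sequence[Tuple[float, float]]


def keypoint_l1_cost(pred: Keypoints, gt: Keypoints, vis: Sequence[int]) -> float:
    """Hypothetical cost: mean L1 distance over visible ground-truth joints."""
    total, count = 0.0, 0
    for (px, py), (gx, gy), v in zip(pred, gt, vis):
        if v > 0:  # skip unlabeled joints (visibility flag 0)
            total += abs(px - gx) + abs(py - gy)
            count += 1
    return total / max(count, 1)


def match_queries(preds: List[Keypoints], scores: List[float],
                  gts: List[Keypoints], vis: List[Sequence[int]],
                  w_cls: float = 1.0, w_kpt: float = 1.0) -> List[Tuple[int, int]]:
    """Return (query_index, gt_index) pairs with minimum total cost.

    Exhaustively tries every injective query-to-person assignment, which is a
    correct minimum-cost bipartite matching but exponential in size; assumed
    weights w_cls and w_kpt trade off confidence against keypoint error.
    """
    n, m = len(preds), len(gts)
    if m > n:
        raise ValueError("need at least as many queries as ground-truth people")
    cost = [[w_kpt * keypoint_l1_cost(preds[i], gts[j], vis[j]) - w_cls * scores[i]
             for j in range(m)] for i in range(n)]
    best, best_cost = (), float("inf")
    for assignment in permutations(range(n), m):  # assignment[j] = query for person j
        c = sum(cost[q][j] for j, q in enumerate(assignment))
        if c < best_cost:
            best, best_cost = assignment, c
    return [(q, j) for j, q in enumerate(best)]


def select_instances(preds: List[Keypoints], scores: List[float],
                     threshold: float = 0.5) -> List[Tuple[int, Keypoints]]:
    """Inference without NMS or grouping: keep queries above a score threshold."""
    return [(i, preds[i]) for i, s in enumerate(scores) if s >= threshold]


# Toy example with made-up numbers: 3 queries, 2 people, 2 joints each.
preds = [[(10, 10), (12, 20)], [(50, 50), (52, 60)], [(100, 100), (100, 100)]]
scores = [0.9, 0.8, 0.1]
gts = [[(51, 50), (52, 61)], [(10, 11), (12, 20)]]
vis = [[2, 2], [2, 2]]

pairs = match_queries(preds, scores, gts, vis)
print(pairs)  # [(1, 0), (0, 1)]; query 2 stays unmatched (background)
print([i for i, _ in select_instances(preds, scores)])  # [0, 1]
```

Because every person is assigned exactly one query during training, duplicate predictions are penalized by the loss itself. This is why such set-prediction models can skip the NMS and grouping steps that dense methods need at inference.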
Author Information
Yabo Xiao (Beijing University of Posts and Telecommunications)
Kai Su (Southeast University)
Xiaojuan Wang (Beijing University of Posts and Telecommunications)
Dongdong Yu (Institute of Automation, Chinese Academy of Sciences)
Lei Jin (Beijing University of Posts and Telecommunications)
Mingshu He (Beijing University of Posts and Telecommunications)
Zehuan Yuan (Nanjing University)
More from the Same Authors
2022 Poster: Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
Yang Jin · yongzhi li · Zehuan Yuan · Yadong Mu
2022 Spotlight: Lightning Talks 3A-4
Jinzhi Zhang · Hao Jiang · Hongrui Cai · Qi Yi · Yang Jin · Zhi Tian · Rui Zhang · Wanquan Feng · Xiangxiang Chu · Ruofan Tang · yongzhi li · Yadong Mu · Zehuan Yuan · shaohui peng · Zheng Cao · Xiaoming Wang · Xuetao Feng · Xiaolin Wei · Jiaming Guo · Yadong Mu · Yan Wang · Jing Xiao · Xing Hu · Chunhua Shen · Ruqi Huang · Juyong Zhang · Zidong Du · LU FANG · xishan zhang · Qi Guo · Yunji Chen
2022 Spotlight: Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
Yang Jin · yongzhi li · Zehuan Yuan · Yadong Mu
2022 Poster: Rethinking Resolution in the Context of Efficient Video Recognition
Chuofan Ma · Qiushan Guo · Yi Jiang · Ping Luo · Zehuan Yuan · Xiaojuan Qi
2021 Poster: Disentangled Contrastive Learning on Graphs
Haoyang Li · Xin Wang · Ziwei Zhang · Zehuan Yuan · Hang Li · Wenwu Zhu