Poster
RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection
Hangjie Yuan · Jianwen Jiang · Samuel Albanie · Tao Feng · Ziyuan Huang · Dong Ni · Mingqian Tang
The task of Human-Object Interaction (HOI) detection targets fine-grained visual parsing of humans interacting with their environment, enabling a broad range of applications. Prior work has demonstrated the benefits of effective architecture design and integration of relevant cues for more accurate HOI detection. However, the design of an appropriate pre-training strategy for this task remains underexplored by existing approaches. To address this gap, we propose $\textit{Relational Language-Image Pre-training}$ (RLIP), a strategy for contrastive pre-training that leverages both entity and relation descriptions. To make effective use of such pre-training, we make three technical contributions: (1) a new $\textbf{Par}$allel entity detection and $\textbf{Se}$quential relation inference (ParSe) architecture that enables the use of both entity and relation descriptions during holistically optimized pre-training; (2) a synthetic data generation framework, Label Sequence Extension, that expands the scale of language data available within each minibatch; (3) ambiguity-suppression mechanisms, Relation Quality Labels and Relation Pseudo-Labels, to mitigate the influence of ambiguous/noisy samples in the pre-training data. Through extensive experiments, we demonstrate the benefits of these contributions, collectively termed RLIP-ParSe, for improved zero-shot, few-shot and fine-tuning HOI detection performance as well as increased robustness to learning from noisy annotations. Code will be available at https://github.com/JacobYuan7/RLIP.
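The abstract describes contrastive pre-training that aligns visual representations with entity and relation descriptions. As a rough illustration of the general idea (not the paper's actual method), the sketch below shows a CLIP-style symmetric InfoNCE objective that pulls each region feature toward its matched text embedding; all names, shapes, and the temperature value are hypothetical.

```python
import numpy as np

def info_nce_loss(region_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE over matched region/text pairs (CLIP-style sketch).

    region_feats, text_feats: (N, D) arrays where row i of each is a
    matched pair; all other rows serve as in-batch negatives.
    """
    # L2-normalize both modalities so dot products are cosine similarities.
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = r @ t.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(logits))         # i-th region matches i-th text

    def xent(l):
        # Numerically stable cross-entropy on the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the region-to-text and text-to-region directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
# Perfectly aligned pairs yield a near-zero loss; unrelated pairs do not.
aligned = info_nce_loss(feats, feats)
random_pairs = info_nce_loss(feats, rng.standard_normal((4, 8)))
```

In the paper's setting the text side would cover both entity labels and relation descriptions rather than a single caption per image, but the contrastive alignment principle is the same.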
Author Information
Hangjie Yuan (Zhejiang University)
Please refer to https://jacobyuan7.github.io/
Jianwen Jiang (Alibaba DAMO Academy)
Samuel Albanie (Oxford University)
Tao Feng (Alibaba Group)
Ziyuan Huang (National University of Singapore)
Dong Ni (Zhejiang University)
Mingqian Tang (Alibaba Group)
More from the Same Authors
- 2022 Poster: Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning »
  Yixuan Pei · Zhiwu Qing · Jun CEN · Xiang Wang · Shiwei Zhang · Yaxiong Wang · Mingqian Tang · Nong Sang · Xueming Qian
- 2022 Spotlight: RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection »
  Hangjie Yuan · Jianwen Jiang · Samuel Albanie · Tao Feng · Ziyuan Huang · Dong Ni · Mingqian Tang
- 2022 Poster: ReCo: Retrieve and Co-segment for Zero-shot Transfer »
  Gyungin Shin · Weidi Xie · Samuel Albanie
- 2022 Poster: Grow and Merge: A Unified Framework for Continuous Categories Discovery »
  Xinwei Zhang · Jianwen Jiang · Yutong Feng · Zhi-Fan Wu · Xibin Zhao · Hai Wan · Mingqian Tang · Rong Jin · Yue Gao
- 2021 Workshop: The pre-registration workshop: an alternative publication model for machine learning research »
  Samuel Albanie · João Henriques · Luca Bertinetto · Alex Hernandez-Garcia · Hazel Doughty · Gul Varol
- 2020 Workshop: The pre-registration experiment: an alternative publication model for machine learning research »
  Luca Bertinetto · João Henriques · Samuel Albanie · Michela Paganini · Gul Varol
- 2018 Poster: Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks »
  Jie Hu · Li Shen · Samuel Albanie · Gang Sun · Andrea Vedaldi