

Poster

An End-To-End Graph Attention Network Hash for Cross-Modal Retrieval

Huilong Jin · Yingxue Zhang · Lei Shi · Shuang Zhang · Feifei Kou · Jiapeng Yang · Chuangying Zhu · Jia Luo


Abstract:

Due to its low storage cost and fast search speed, hashing-based cross-modal retrieval has attracted widespread attention and is widely used in real-world social media search applications. However, most existing hashing methods are limited by incomplete feature representations and semantic associations, which greatly restricts their performance and applicability in practice. To address this challenge, in this paper we propose an end-to-end graph attention network hash (EGATH) for cross-modal retrieval, which can not only capture direct semantic associations between images and texts but also match semantic content across different modalities. We combine CLIP with a Transformer to improve understanding and generalization of semantic consistency across different data modalities. A classifier based on a graph attention network is applied to obtain predicted labels that enhance the cross-modal feature representations. We construct hash codes with an optimization strategy and loss function that preserve the semantic information and compactness of the codes. Comprehensive experiments on the NUS-WIDE, MIRFlickr25K, and MS-COCO benchmark datasets show that EGATH performs favorably against several state-of-the-art methods.
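The abstract describes the pipeline only at a high level: modality-specific features (from CLIP/Transformer encoders), a graph attention network classifier that predicts labels, and a hash head whose outputs are binarized. The PyTorch sketch below illustrates that overall structure under stated assumptions; the class names (`EGATHSketch`, `GraphAttentionLayer`), the feature dimensions, and the simplified single-head attention layer are illustrative placeholders rather than the authors' implementation, and random tensors stand in for pre-extracted CLIP features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Simplified single-head graph attention over a batch graph of samples."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features, adj: (N, N) binary adjacency
        h = self.W(x)                                   # (N, out_dim)
        N = h.size(0)
        hi = h.unsqueeze(1).expand(N, N, -1)            # h_i broadcast over pairs
        hj = h.unsqueeze(0).expand(N, N, -1)            # h_j broadcast over pairs
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        e = e.masked_fill(adj == 0, float('-inf'))      # attend only along edges
        alpha = torch.softmax(e, dim=-1)                # attention coefficients
        return F.elu(alpha @ h)                         # attention-weighted aggregation

class EGATHSketch(nn.Module):
    """Hypothetical EGATH-style model: encoders -> GAT classifier -> hash codes."""
    def __init__(self, img_dim=512, txt_dim=512, hidden=256, n_classes=24, code_len=64):
        super().__init__()
        # projections standing in for CLIP/Transformer features extracted upstream
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.gat = GraphAttentionLayer(hidden, hidden)
        self.classifier = nn.Linear(hidden, n_classes)  # predicted labels
        self.hash_head = nn.Linear(hidden, code_len)    # continuous hash outputs

    def forward(self, img_feat, txt_feat, adj):
        z = torch.cat([self.img_proj(img_feat), self.txt_proj(txt_feat)], dim=0)
        z = self.gat(z, adj)                            # semantic refinement across samples
        logits = self.classifier(z)                     # label prediction branch
        codes = torch.tanh(self.hash_head(z))           # tanh relaxation of sign()
        return logits, codes

# Toy usage: 4 image and 4 text samples on a fully connected batch graph.
img = torch.randn(4, 512)
txt = torch.randn(4, 512)
adj = torch.ones(8, 8)
logits, codes = EGATHSketch()(img, txt, adj)
binary = torch.sign(codes)   # discrete hash codes at retrieval time
```

As is common in deep hashing, the tanh relaxation keeps the codes differentiable during training, while sign() is applied only at retrieval time to obtain the discrete binary codes.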
