Poster

Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation

Li Jiahao · Yanyun Qu · Yuan Xie · Yang Lu

East Exhibit Hall A-C #1005
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Open-vocabulary semantic segmentation (OVSS) aims to segment unseen classes without pixel-level labels. Existing vision-language model (VLM)-based methods use the VLM's rich knowledge to enhance additional segmentation-specific networks. They yield competitive results, but at the cost of a large number of parameters. To reduce this cost, we attempt to enable the VLM to produce segmentation results directly, without segmentation-specific networks. Prompt learning offers a direct and parameter-efficient approach, yet it falls short in guiding the VLM toward pixel-level visual localization. We therefore propose the relationship prompt module (RPM), which generates a relationship prompt that directs the VLM to extract pixel-level semantic embeddings suitable for OVSS. Moreover, RPM integrates with the VLM to form the relationship prompt network (RPN), which achieves OVSS without segmentation-specific networks. RPN attains state-of-the-art performance with only about 3M trainable parameters (2% of total parameters).
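
As a rough reading of the parameter figures in the abstract, the short sketch below works out what "about 3M trainable parameters" at "2% of total parameters" would imply for the overall model size. The function name is hypothetical and the inputs are the rounded values quoted in the abstract, so the outputs are approximate and are not figures reported by the authors.

```python
def implied_total_params(trainable: float, fraction: float) -> float:
    """Total parameter count implied by a trainable count and its share of the total."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return trainable / fraction


# Rounded values quoted in the abstract (assumed exact here for illustration).
trainable = 3e6   # "about 3M" trainable parameters
fraction = 0.02   # "2% of total parameters"

total = implied_total_params(trainable, fraction)
frozen = total - trainable  # parameters that would not be trained

print(f"implied total: {total / 1e6:.0f}M, frozen: {frozen / 1e6:.0f}M")
```

Under these rounded inputs, the implied total is on the order of 150M parameters, with roughly 147M of them not trained.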