Looking Beyond Single Images for Contrastive Semantic Segmentation Learning

FEIHU ZHANG · Philip Torr · Rene Ranftl · Stephan Richter

Keywords: [ Machine Learning ] [ Vision ] [ Contrastive Learning ] [ Representation Learning ]


We present an approach to contrastive representation learning for semantic segmentation. Our approach leverages the representational power of existing feature extractors to find corresponding regions across images. These cross-image correspondences are used as auxiliary labels to guide the pixel-level selection of positive and negative samples for more effective contrastive learning in semantic segmentation. We show that auxiliary labels can be generated from a variety of feature extractors, ranging from image classification networks that have been trained using unsupervised contrastive learning to segmentation models that have been trained on a small amount of labeled data. We additionally introduce a novel metric for rapidly judging the quality of a given auxiliary-labeling strategy, and empirically analyze various factors that influence the performance of contrastive learning for semantic segmentation. We demonstrate the effectiveness of our method both in the low-data as well as the high-data regime on various datasets. Our experiments show that contrastive learning with our auxiliary-labeling approach consistently boosts semantic segmentation accuracy when compared to standard ImageNet pretraining and outperforms existing approaches of contrastive and semi-supervised semantic segmentation.

Chat is not available.