SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc van Gool, Federico Tombari

2023-11-27 | Semi-Supervised Semantic Segmentation, Segmentation, Semantic Segmentation

Abstract

In semi-supervised semantic segmentation, a model is trained with a limited number of labeled images along with a large corpus of unlabeled images to reduce the high annotation effort. While previous methods are able to learn good segmentation boundaries, they are prone to confusing classes with similar visual appearance due to the limited supervision. On the other hand, vision-language models (VLMs) are able to learn diverse semantic knowledge from image-caption datasets but produce noisy segmentation due to the image-level training. In SemiVL, we propose to integrate rich priors from VLM pre-training into semi-supervised semantic segmentation to learn better semantic decision boundaries. To adapt the VLM from global to local reasoning, we introduce a spatial fine-tuning strategy for label-efficient learning. Further, we design a language-guided decoder to jointly reason over vision and language. Finally, we propose to handle inherent ambiguities in class labels by providing the model with language guidance in the form of class definitions. We evaluate SemiVL on 4 semantic segmentation datasets, where it significantly outperforms previous semi-supervised methods. For instance, SemiVL improves the state of the art by +13.5 mIoU on COCO with 232 annotated images and by +6.1 mIoU on Pascal VOC with 92 labels. Project page: https://github.com/google-research/semivl

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	COCO 1/512 labeled	Validation mIoU	50.1	SemiVL
Semantic Segmentation	COCO 1/256 labeled	Validation mIoU	52.8	SemiVL
Semantic Segmentation	ADE20K 1/16 labeled	Validation mIoU	37.2	SemiVL
Semantic Segmentation	PASCAL VOC 2012 92 labeled	Validation mIoU	84	SemiVL (ViT-B/16)
Semantic Segmentation	PASCAL VOC 2012 92 labeled	Validation mIoU	77.9	UniMatch (ViT-B/16)
Semantic Segmentation	ADE20K 1/32 labeled	Validation mIoU	35.1	SemiVL
Semantic Segmentation	PASCAL VOC 2012 732 labeled	Validation mIoU	86.7	SemiVL (ViT-B/16)
Semantic Segmentation	PASCAL VOC 2012 732 labeled	Validation mIoU	83.3	UniMatch (ViT-B/16)
Semantic Segmentation	PASCAL VOC 2012 1464 labels	Validation mIoU	87.3	SemiVL (ViT-B/16)
Semantic Segmentation	PASCAL VOC 2012 1464 labels	Validation mIoU	84	UniMatch (ViT-B/16)
Semantic Segmentation	COCO 1/128 labeled	Validation mIoU	53.6	SemiVL
Semantic Segmentation	COCO 1/64 labeled	Validation mIoU	55.4	SemiVL
Semantic Segmentation	Cityscapes 100 samples labeled	Validation mIoU	76.2	SemiVL (ViT-B/16)
Semantic Segmentation	PASCAL VOC 2012 366 labeled	Validation mIoU	86	SemiVL (ViT-B/16)
Semantic Segmentation	PASCAL VOC 2012 366 labeled	Validation mIoU	82	UniMatch (ViT-B/16)
Semantic Segmentation	Cityscapes 6.25% labeled	Validation mIoU	77.9	SemiVL (ViT-B/16)
Semantic Segmentation	COCO 1/32 labeled	Validation mIoU	56.5	SemiVL
Semantic Segmentation	PASCAL VOC 2012 183 labeled	Validation mIoU	85.6	SemiVL (ViT-B/16)
Semantic Segmentation	PASCAL VOC 2012 183 labeled	Validation mIoU	80.1	UniMatch (ViT-B/16)
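
The PASCAL VOC 2012 rows above pair each SemiVL result with a UniMatch result on the same labeled split, so the per-split gain can be computed directly. The following is a minimal sketch of that arithmetic; the dictionary layout and the helper name `miou_gains` are illustrative assumptions, and the numbers are copied from the table.

```python
# Validation mIoU on PASCAL VOC 2012, copied from the results table above.
# Keys are the number of labeled images; both models use a ViT-B/16 backbone.
semivl = {92: 84.0, 183: 85.6, 366: 86.0, 732: 86.7, 1464: 87.3}
unimatch = {92: 77.9, 183: 80.1, 366: 82.0, 732: 83.3, 1464: 84.0}


def miou_gains(ours, baseline):
    """Return the per-split mIoU difference (ours - baseline), rounded to 0.1."""
    return {n: round(ours[n] - baseline[n], 1) for n in sorted(ours)}


gains = miou_gains(semivl, unimatch)
for n, gain in gains.items():
    print(f"{n:>4} labels: +{gain:.1f} mIoU")
```

The gain on the 92-label split comes out to +6.1 mIoU, which agrees with the figure quoted in the abstract, and the gap narrows to +3.3 mIoU on the 1464-label split.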
