Gyungin Shin, Weidi Xie, Samuel Albanie
Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs required for deployment. Segmentation methods that forgo supervision can side-step these costs, but they inconveniently require labelled examples from the target distribution in order to assign concept names to predictions. An alternative line of work in language-image pre-training has recently demonstrated the potential to produce models that can both assign names across large vocabularies of concepts and enable zero-shot transfer for classification, but these models do not demonstrate commensurate segmentation abilities. In this work, we strive for a synthesis of these two approaches that combines their strengths. We leverage the retrieval abilities of one such language-image pre-trained model, CLIP, to dynamically curate training sets from unlabelled images for arbitrary collections of concept names, and exploit the robust correspondences offered by modern image representations to co-segment entities among the resulting collections. The synthetic segment collections are then employed to construct a segmentation model (without requiring pixel labels) whose knowledge of concepts is inherited from the scalable pre-training process of CLIP. We demonstrate that our approach, termed Retrieve and Co-segment (ReCo), performs favourably in comparison to unsupervised segmentation approaches while inheriting the convenience of nameable predictions and zero-shot transfer. We also demonstrate ReCo's ability to generate specialist segmenters for extremely rare objects.
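The curation step described above (ranking unlabelled images against a concept-name text embedding) can be sketched as a simple cosine-similarity retrieval. This is a minimal illustration, not the paper's implementation: it assumes CLIP image and text embeddings have already been computed, and stands them in with random vectors; the function name `curate_training_set` is hypothetical.

```python
import numpy as np

def curate_training_set(image_embs: np.ndarray, text_emb: np.ndarray, k: int) -> np.ndarray:
    """Rank unlabelled images by cosine similarity to a concept-name text
    embedding and return the indices of the top-k matches (the 'curated' set)."""
    # L2-normalise so the dot product equals cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb)
    sims = image_embs @ text_emb          # one similarity score per image
    return np.argsort(-sims)[:k]          # indices of the k highest-scoring images

# Toy example: random vectors standing in for precomputed CLIP features
# (512-dim, as in CLIP ViT-B variants).
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 512))  # 100 unlabelled images
text_emb = rng.normal(size=512)           # embedding of one concept name
top10 = curate_training_set(image_embs, text_emb, k=10)
```

In the full method, the retrieved collection for each concept name is then co-segmented to produce the synthetic training segments; this sketch covers only the retrieval stage.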
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | COCO-Stuff-171 | mIoU | 14.8 | ReCo |
| Semantic Segmentation | COCO-Object | mIoU | 15.7 | ReCo |
| Semantic Segmentation | ADE20K val | mIoU | 11.2 | ReCo |
| Semantic Segmentation | Cityscapes val | mIoU | 24.2 | ReCo+ |
| Semantic Segmentation | Cityscapes val | pixel accuracy | 83.7 | ReCo+ |
| Semantic Segmentation | Cityscapes val | mIoU | 19.3 | ReCo |
| Semantic Segmentation | Cityscapes val | pixel accuracy | 74.6 | ReCo |
| Semantic Segmentation | PASCAL Context-59 | mIoU | 22.3 | ReCo |
| Semantic Segmentation | PASCAL VOC-20 | mIoU | 57.7 | ReCo |
| Semantic Segmentation | KITTI-STEP | mIoU | 31.9 | ReCo+ |
| Semantic Segmentation | KITTI-STEP | pixel accuracy | 75.3 | ReCo+ |
| Semantic Segmentation | KITTI-STEP | mIoU | 29.8 | ReCo |
| Semantic Segmentation | KITTI-STEP | pixel accuracy | 70.6 | ReCo |
| Semantic Segmentation | COCO-Stuff-27 | mIoU | 32.6 | ReCo+ |
| Semantic Segmentation | COCO-Stuff-27 | pixel accuracy | 54.1 | ReCo+ |
| Semantic Segmentation | COCO-Stuff-27 | mIoU | 26.3 | ReCo |
| Semantic Segmentation | COCO-Stuff-27 | pixel accuracy | 46.1 | ReCo |