Causal Unsupervised Semantic Segmentation

Junho Kim, Byung-Kwan Lee, Yong Man Ro

2023-10-11

Tasks: Self-Supervised Learning, Unsupervised Semantic Segmentation, Segmentation, Semantic Segmentation, Prediction, Causal Inference


Abstract

Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. With the advent of self-supervised pre-training, various frameworks use the pre-trained features to train prediction heads for unsupervised dense prediction. However, a significant challenge in this unsupervised setup is determining the appropriate level of clustering required for segmenting concepts. To address this, we propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference. Specifically, we bring in an intervention-oriented approach (i.e., frontdoor adjustment) to define two suitable steps for unsupervised prediction. The first step constructs a concept clusterbook as a mediator, which represents possible concept prototypes at different levels of granularity in discretized form. The mediator then establishes an explicit link to the second step, concept-wise self-supervised learning for pixel-level grouping. Through extensive experiments and analyses on various datasets, we corroborate the effectiveness of CAUSE, which achieves state-of-the-art performance in unsupervised semantic segmentation.
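For reference, the frontdoor adjustment named in the abstract is the standard causal-inference identity below. With a treatment $X$, a mediator $M$ that satisfies the frontdoor criterion, an outcome $Y$, and a possibly unobserved confounder between $X$ and $Y$, the interventional distribution is identified from observational quantities. Reading $X$ as the pre-trained features, $M$ as the concept clusterbook, and $Y$ as the pixel-level grouping is our interpretation of the abstract, not a formula quoted from the paper.

```latex
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(M = m \mid X = x)
    \sum_{x'} P(Y \mid X = x', M = m)\, P(X = x')
```

The outer sum runs over discrete mediator states, which is why a discretized clusterbook fits naturally as $M$; the inner sum marginalizes over $X$, blocking the backdoor path through the confounder.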
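The abstract describes the mediator as a concept clusterbook: a discrete set of concept prototypes built from pre-trained features, onto which each patch is mapped. Below is a minimal sketch of that idea, assuming spherical k-means (cosine similarity) with farthest-point initialization as a stand-in for the paper's own clustering objective, which may differ. The helper names `build_clusterbook` and `assign_concepts` are hypothetical, and the synthetic features replace real backbone embeddings.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def build_clusterbook(features: np.ndarray, num_concepts: int,
                      iters: int = 20, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: spherical k-means over patch features.

    features: (N, D) patch embeddings from a frozen self-supervised backbone.
    Returns a (num_concepts, D) array of unit-norm concept prototypes.
    """
    rng = np.random.default_rng(seed)
    feats = l2_normalize(features)
    # Farthest-point init: start from one random patch, then repeatedly add
    # the patch least similar to every prototype chosen so far.
    idx = [int(rng.integers(len(feats)))]
    for _ in range(num_concepts - 1):
        sims = (feats @ feats[idx].T).max(axis=1)
        idx.append(int(np.argmin(sims)))
    protos = feats[idx].copy()
    for _ in range(iters):
        # Assign each patch to its most similar prototype (cosine similarity).
        assign = np.argmax(feats @ protos.T, axis=1)
        for k in range(num_concepts):
            members = feats[assign == k]
            if len(members) > 0:  # keep the old prototype if a cluster empties
                protos[k] = members.mean(axis=0)
        protos = l2_normalize(protos)
    return protos

def assign_concepts(features: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Discretize each patch into the index of its nearest concept prototype."""
    return np.argmax(l2_normalize(features) @ protos.T, axis=1)

# Usage: two well-separated synthetic "concepts" in a 4-D feature space.
rng = np.random.default_rng(0)
a = rng.normal([5, 0, 0, 0], 0.1, size=(50, 4))
b = rng.normal([0, 5, 0, 0], 0.1, size=(50, 4))
feats = np.vstack([a, b])
protos = build_clusterbook(feats, num_concepts=2)
labels = assign_concepts(feats, protos)
```

Changing `num_concepts` is one crude way to obtain coarser or finer prototypes; how the paper actually covers multiple levels of granularity is not reproduced here.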
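The second step, concept-wise self-supervised learning, uses the clusterbook to decide which pixels should be grouped together. The sketch below is a supervised-contrastive (SupCon-style) loss in which the positives of a patch are the other patches that share its clusterbook concept. This positive-selection rule, the temperature, and the name `concept_contrastive_loss` are assumptions for illustration, not the paper's exact objective; the code only evaluates the loss value with NumPy and does no training.

```python
import numpy as np

def concept_contrastive_loss(z: np.ndarray, concepts: np.ndarray,
                             temperature: float = 0.1) -> float:
    """Hypothetical SupCon-style loss: positives of each patch are the other
    patches assigned to the same clusterbook concept.

    z: (N, D) patch embeddings from a trainable segmentation head.
    concepts: (N,) integer concept ids from the frozen clusterbook.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature
    self_mask = np.eye(len(z), dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)  # a patch is not its own pair
    # Row-wise log-softmax over all other patches (numerically stable).
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    pos_mask = (concepts[:, None] == concepts[None, :]) & ~self_mask
    has_pos = pos_mask.any(axis=1)  # skip patches whose concept has no other member
    # Mean log-probability of the positives per anchor, then over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = pos_log_prob[has_pos] / pos_mask[has_pos].sum(axis=1)
    return float(-per_anchor.mean())

# Usage: embeddings that group by concept vs. the same points rearranged
# so that each concept mixes both clusters.
rng = np.random.default_rng(0)
concepts = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
aligned = centers[concepts] + rng.normal(0, 0.05, size=(6, 2))
shuffled = aligned[[0, 3, 1, 4, 2, 5]]
loss_aligned = concept_contrastive_loss(aligned, concepts)
loss_shuffled = concept_contrastive_loss(shuffled, concepts)
```

The aligned embeddings score the lower loss, which is the behavior such an objective is meant to reward: pixels mapped to the same concept are pulled together and pushed away from the rest.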

Results

The following results are listed identically under two tasks, Semantic Segmentation and Unsupervised Semantic Segmentation.

Dataset	Metric	Value	Model
COCO-Stuff-81	Pixel Accuracy	75.2	CAUSE-TR (ViT-S/8)
COCO-Stuff-81	mIoU	21.2	CAUSE-TR (ViT-S/8)
COCO-Stuff-81	Pixel Accuracy	78.8	CAUSE-MLP (ViT-S/8)
COCO-Stuff-81	mIoU	19.1	CAUSE-MLP (ViT-S/8)
PASCAL VOC 2012 val	Clustering [mIoU]	53.4	CAUSE (iBOT, ViT-B/16)
PASCAL VOC 2012 val	Clustering [mIoU]	53.3	CAUSE (ViT-B/8)
PASCAL VOC 2012 val	Clustering [mIoU]	53.2	CAUSE (DINOv2, ViT-B/14)
COCO-Stuff-171	Pixel Accuracy	46.6	CAUSE-TR (ViT-S/8)
COCO-Stuff-171	mIoU	15.2	CAUSE-TR (ViT-S/8)
COCO-Stuff-27	Clustering [Accuracy]	78	CAUSE (DINOv2, ViT-B/14)
COCO-Stuff-27	Clustering [mIoU]	45.3	CAUSE (DINOv2, ViT-B/14)
COCO-Stuff-27	Clustering [Accuracy]	74.9	CAUSE (ViT-B/8)
COCO-Stuff-27	Clustering [mIoU]	41.9	CAUSE (ViT-B/8)
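The Clustering [Accuracy] and Clustering [mIoU] entries come from unsupervised predictions whose cluster ids carry no class names, so the ids must first be matched to ground-truth classes. In this literature that matching is typically a one-to-one (Hungarian) assignment that maximizes pixel overlap; the sketch below assumes that protocol and is not the paper's evaluation code. The helper names `confusion` and `matched_scores` are hypothetical, and the brute-force search over permutations only works for a handful of classes (real evaluations use the Hungarian algorithm, e.g. SciPy's `linear_sum_assignment`).

```python
import itertools
import numpy as np

def confusion(pred: np.ndarray, gt: np.ndarray, k: int) -> np.ndarray:
    """k x k matrix; entry [c, g] counts pixels with cluster c and class g."""
    return np.bincount(pred.ravel() * k + gt.ravel(), minlength=k * k).reshape(k, k)

def matched_scores(pred: np.ndarray, gt: np.ndarray, k: int) -> tuple[float, float]:
    """Return (pixel accuracy, mIoU) after the best one-to-one cluster->class map.

    Brute force over all k! permutations, so only usable for tiny k.
    """
    conf = confusion(pred, gt, k)
    best = max(itertools.permutations(range(k)),
               key=lambda perm: sum(conf[c, perm[c]] for c in range(k)))
    # Reorder rows so row g holds the cluster mapped to class g.
    mapped = np.zeros_like(conf)
    for c, g in enumerate(best):
        mapped[g] = conf[c]
    tp = np.diag(mapped)
    acc = tp.sum() / mapped.sum()
    # IoU per class = TP / (predicted + ground truth - TP).
    iou = tp / (mapped.sum(axis=0) + mapped.sum(axis=1) - tp)
    return float(acc), float(iou.mean())

# Usage: cluster ids 0 and 2 are swapped relative to the labels,
# and one pixel (row 1, column 3) is wrong.
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1],
               [2, 2, 2, 2]])
pred = np.array([[2, 2, 1, 1],
                 [2, 2, 1, 0],
                 [0, 0, 0, 0]])
acc, miou = matched_scores(pred, gt, k=3)  # acc == 11/12, miou == 0.85
```

Without the matching step, the swapped ids alone would make the raw accuracy near zero even though the grouping is almost perfect.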
