Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Causal Unsupervised Semantic Segmentation

Junho Kim, Byung-Kwan Lee, Yong Man Ro

2023-10-11
Self-Supervised Learning, Unsupervised Semantic Segmentation, Segmentation, Semantic Segmentation, Prediction, Causal Inference

Abstract

Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. With the advent of self-supervised pre-training, various frameworks use pre-trained features to train prediction heads for unsupervised dense prediction. However, a significant challenge in this unsupervised setup is determining the appropriate level of clustering required for segmenting concepts. To address this, we propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference. Specifically, we adopt an intervention-oriented approach (i.e., frontdoor adjustment) to define two suitable tasks, carried out in sequence, for unsupervised prediction. The first step constructs a concept clusterbook as a mediator, which represents possible concept prototypes at different levels of granularity in a discretized form. The mediator then establishes an explicit link to the second step, concept-wise self-supervised learning for pixel-level grouping. Through extensive experiments and analyses on various datasets, we corroborate the effectiveness of CAUSE and achieve state-of-the-art performance in unsupervised semantic segmentation.
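
For readers unfamiliar with the term, the frontdoor adjustment named in the abstract has a standard textbook form, shown below. The symbols are generic: X is the treatment, M the mediator, and Y the outcome. Reading X as the pre-trained features, M as the concept clusterbook, and Y as the pixel-level grouping is an assumption based on the abstract's wording, not notation confirmed by this page.

```latex
P\bigl(Y \mid do(X = x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P\bigl(Y \mid x', m\bigr)\, P(x')
```

The outer sum corresponds to the first step (how X determines the mediator), and the inner sum to the second step (how the mediator determines Y, averaged over the treatment distribution), which is one way to see why the approach splits naturally into two tasks.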

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | COCO-Stuff-81 | Pixel Accuracy | 75.2 | CAUSE-TR (ViT-S/8) |
| Semantic Segmentation | COCO-Stuff-81 | mIoU | 21.2 | CAUSE-TR (ViT-S/8) |
| Semantic Segmentation | COCO-Stuff-81 | Pixel Accuracy | 78.8 | CAUSE-MLP (ViT-S/8) |
| Semantic Segmentation | COCO-Stuff-81 | mIoU | 19.1 | CAUSE-MLP (ViT-S/8) |
| Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.4 | CAUSE (iBOT, ViT-B/16) |
| Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.3 | CAUSE (ViT-B/8) |
| Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.2 | CAUSE (DINOv2, ViT-B/14) |
| Semantic Segmentation | COCO-Stuff-171 | Pixel Accuracy | 46.6 | CAUSE-TR (ViT-S/8) |
| Semantic Segmentation | COCO-Stuff-171 | mIoU | 15.2 | CAUSE-TR (ViT-S/8) |
| Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 78 | CAUSE (DINOv2, ViT-B/14) |
| Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 45.3 | CAUSE (DINOv2, ViT-B/14) |
| Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 74.9 | CAUSE (ViT-B/8) |
| Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 41.9 | CAUSE (ViT-B/8) |
| Unsupervised Semantic Segmentation | COCO-Stuff-81 | Pixel Accuracy | 75.2 | CAUSE-TR (ViT-S/8) |
| Unsupervised Semantic Segmentation | COCO-Stuff-81 | mIoU | 21.2 | CAUSE-TR (ViT-S/8) |
| Unsupervised Semantic Segmentation | COCO-Stuff-81 | Pixel Accuracy | 78.8 | CAUSE-MLP (ViT-S/8) |
| Unsupervised Semantic Segmentation | COCO-Stuff-81 | mIoU | 19.1 | CAUSE-MLP (ViT-S/8) |
| Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.4 | CAUSE (iBOT, ViT-B/16) |
| Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.3 | CAUSE (ViT-B/8) |
| Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.2 | CAUSE (DINOv2, ViT-B/14) |
| Unsupervised Semantic Segmentation | COCO-Stuff-171 | Pixel Accuracy | 46.6 | CAUSE-TR (ViT-S/8) |
| Unsupervised Semantic Segmentation | COCO-Stuff-171 | mIoU | 15.2 | CAUSE-TR (ViT-S/8) |
| Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 78 | CAUSE (DINOv2, ViT-B/14) |
| Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 45.3 | CAUSE (DINOv2, ViT-B/14) |
| Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [Accuracy] | 74.9 | CAUSE (ViT-B/8) |
| Unsupervised Semantic Segmentation | COCO-Stuff-27 | Clustering [mIoU] | 41.9 | CAUSE (ViT-B/8) |
| 10-shot image generation | COCO-Stuff-81 | Pixel Accuracy | 75.2 | CAUSE-TR (ViT-S/8) |
| 10-shot image generation | COCO-Stuff-81 | mIoU | 21.2 | CAUSE-TR (ViT-S/8) |
| 10-shot image generation | COCO-Stuff-81 | Pixel Accuracy | 78.8 | CAUSE-MLP (ViT-S/8) |
| 10-shot image generation | COCO-Stuff-81 | mIoU | 19.1 | CAUSE-MLP (ViT-S/8) |
| 10-shot image generation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.4 | CAUSE (iBOT, ViT-B/16) |
| 10-shot image generation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.3 | CAUSE (ViT-B/8) |
| 10-shot image generation | PASCAL VOC 2012 val | Clustering [mIoU] | 53.2 | CAUSE (DINOv2, ViT-B/14) |
| 10-shot image generation | COCO-Stuff-171 | Pixel Accuracy | 46.6 | CAUSE-TR (ViT-S/8) |
| 10-shot image generation | COCO-Stuff-171 | mIoU | 15.2 | CAUSE-TR (ViT-S/8) |
| 10-shot image generation | COCO-Stuff-27 | Clustering [Accuracy] | 78 | CAUSE (DINOv2, ViT-B/14) |
| 10-shot image generation | COCO-Stuff-27 | Clustering [mIoU] | 45.3 | CAUSE (DINOv2, ViT-B/14) |
| 10-shot image generation | COCO-Stuff-27 | Clustering [Accuracy] | 74.9 | CAUSE (ViT-B/8) |
| 10-shot image generation | COCO-Stuff-27 | Clustering [mIoU] | 41.9 | CAUSE (ViT-B/8) |
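
The results above are reported as pixel accuracy and mIoU, in some cases under a clustering protocol. As a rough illustration of how such numbers are commonly computed in unsupervised segmentation, the sketch below matches predicted cluster IDs to ground-truth classes one-to-one and then scores the matched confusion matrix. This is not the authors' evaluation code: the function names are hypothetical, the toy labels are made up, and the brute-force matching stands in for the Hungarian algorithm that is normally used at realistic class counts.

```python
from itertools import permutations


def confusion_matrix(pred, target, num_classes):
    """conf[t][p] counts pixels with ground-truth class t and predicted cluster p."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1
    return conf


def best_matching(conf):
    """Brute-force the one-to-one class-to-cluster assignment with the most
    agreeing pixels. Returns perm, where perm[t] is the cluster matched to class t.
    Only feasible for a handful of classes (assumption made for brevity)."""
    n = len(conf)
    return max(permutations(range(n)),
               key=lambda perm: sum(conf[t][perm[t]] for t in range(n)))


def remap(conf, perm):
    """Reorder columns so the cluster matched to class t sits in column t."""
    return [[row[perm[t]] for t in range(len(conf))] for row in conf]


def pixel_accuracy(conf):
    """Fraction of pixels on the diagonal of the matched confusion matrix."""
    total = sum(sum(row) for row in conf)
    return sum(conf[i][i] for i in range(len(conf))) / total


def mean_iou(conf):
    """Mean over classes of TP / (ground truth + predicted - TP)."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        gt = sum(conf[c])
        predicted = sum(conf[t][c] for t in range(n))
        union = gt + predicted - tp
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy example: cluster IDs 0 and 1 are swapped relative to the class IDs,
# and one pixel of class 2 is assigned to the wrong cluster.
target = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 0]

conf = confusion_matrix(pred, target, num_classes=3)
matched = remap(conf, best_matching(conf))
acc = pixel_accuracy(matched)
miou = mean_iou(matched)
print(f"pixel accuracy = {acc:.3f}, mIoU = {miou:.3f}")
# prints: pixel accuracy = 0.833, mIoU = 0.722
```

The scores here are fractions in [0, 1]; the values listed in the results appear to be the same quantities expressed as percentages. For 27 or more classes, enumerating permutations is impractical, and a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual replacement for `best_matching`.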

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)