Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco

2023-08-23 · Zero Shot Segmentation · Segmentation · Semantic Segmentation

Abstract

Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot segmentation on virtually any image style and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations is still challenging. In this paper, we propose to utilize the self-attention layers in stable diffusion models to achieve this goal because the pre-trained stable diffusion model has learned inherent concepts of objects within its attention layers. Specifically, we introduce a simple yet effective iterative merging process based on measuring KL divergence among attention maps to merge them into valid segmentation masks. The proposed method does not require any training or language dependency to extract quality segmentation masks for any image. On COCO-Stuff-27, our method surpasses the prior unsupervised zero-shot SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU. The project page is at https://sites.google.com/view/diffseg/home.
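
The merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that attention maps are merged iteratively based on KL divergence, so the symmetric form of the divergence, the greedy grouping scheme, the `threshold` and `max_iters` values, and all function names below are assumptions made for illustration. The inputs are assumed to be 2D non-negative attention maps of equal shape, already extracted from the model.

```python
import numpy as np

def sym_kl(p, q, eps=1e-8):
    """Symmetric KL divergence between two maps treated as distributions."""
    p = np.asarray(p, dtype=np.float64).ravel() + eps
    q = np.asarray(q, dtype=np.float64).ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def merge_attention_maps(maps, threshold=0.5, max_iters=10):
    """Greedily merge maps that are close in symmetric KL (assumed scheme)."""
    groups = [np.asarray(m, dtype=np.float64) for m in maps]
    for _ in range(max_iters):
        merged, used = [], [False] * len(groups)
        for i, anchor in enumerate(groups):
            if used[i]:
                continue
            members, used[i] = [anchor], True
            for j in range(i + 1, len(groups)):
                if not used[j] and sym_kl(anchor, groups[j]) < threshold:
                    members.append(groups[j])
                    used[j] = True
            # Each group is represented by the mean of its members.
            merged.append(np.mean(members, axis=0))
        if len(merged) == len(groups):  # nothing merged: converged
            break
        groups = merged
    return groups

def maps_to_mask(groups):
    """Label each pixel with the index of its strongest merged map."""
    return np.argmax(np.stack(groups, axis=0), axis=0)

# Toy input: two maps focused on the left half, two on the right half.
left = np.zeros((4, 4))
left[:, :2] = 1.0
right = 1.0 - left
toy_maps = [left + 0.01, left + 0.02, right + 0.01, right + 0.02]
toy_mask = maps_to_mask(merge_attention_maps(toy_maps))
```

On this toy input the four maps collapse into two groups, and the resulting mask separates the left and right halves. The threshold controls how aggressively maps are merged and would need tuning on real attention maps.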

Results

Task	Dataset	Metric	Value (%)	Model
Semantic Segmentation	Cityscapes	Pixel Accuracy	76	DiffSeg (512)
Semantic Segmentation	Cityscapes	mIoU	21.2	DiffSeg (512)
Semantic Segmentation	COCO-Stuff-27	Pixel Accuracy	72.5	DiffSeg (512)
Semantic Segmentation	COCO-Stuff-27	mIoU	43.6	DiffSeg (512)
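
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how pixel accuracy and mean IoU are commonly computed from a confusion matrix. It assumes predicted and ground-truth label maps that already share the same class indices; how predicted segments are matched to ground-truth classes in the unsupervised setting is outside this sketch, and the function names are illustrative.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean over classes of intersection / union; absent classes are skipped."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return np.mean(tp[valid] / union[valid])

# Tiny example: one of four pixels is mislabeled.
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
cm = confusion_matrix(pred, target, num_classes=2)
acc = pixel_accuracy(cm)   # 0.75
miou = mean_iou(cm)        # (1/2 + 2/3) / 2
```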
