Unsupervised Semantic Segmentation on COCO-Stuff-171

Metric: mIoU (higher is better)
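
As a reminder of what the metric measures, the sketch below computes mean intersection-over-union from flat per-pixel label sequences. It is an illustration only, not the evaluation code behind any entry on this leaderboard. The function name `mean_iou`, its parameters, and the choice to skip classes absent from both prediction and ground truth are assumptions made here. It also assumes that predicted labels are already aligned with ground-truth class IDs; unsupervised methods usually need a separate matching step first, and that protocol is not specified on this page.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: equal-length sequences of integer class IDs (one per pixel).
    ignore_index: optional ground-truth label to exclude (hypothetical parameter).
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels marked as unlabeled
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # classes absent from both pred and target are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

The function returns a fraction in [0, 1]. If the scores in the table are percentages, which the page does not state explicitly, multiply by 100 before comparing.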

Results

#	Model	mIoU	Extra Data	Paper	Date	Code
1	CorrCLIP	34	No	CorrCLIP: Reconstructing Correlations in CLIP wi...	2024-11-15	Code
2	TextRegion	31.2	No	TextRegion: Text-Aligned Region Tokens from Froz...	2025-05-29	Code
3	Trident	28.6	No	Harnessing Vision Foundation Models for High-Per...	2024-11-14	Code
4	ProxyCLIP	26.8	No	ProxyCLIP: Proxy Attention Improves CLIP for Ope...	2024-08-09	Code
5	TagAlign	25.3	No	TagAlign: Improving Vision-Language Alignment wi...	2023-12-21	Code
6	TTD (TCL)	23.7	No	TTD: Text-Tag Self-Distillation Enhancing Image-...	2024-03-30	Code
7	COSMOS ViT-B/16	23.2	No	COSMOS: Cross-Modality Self-Distillation for Vis...	2024-12-02	Code
8	TCL	22.4	No	Learning to Generate Text-grounded Mask for Open...	2022-12-01	Code
9	TTD (MaskCLIP)	19.4	No	TTD: Text-Tag Self-Distillation Enhancing Image-...	2024-03-30	Code
10	MaskCLIP	16.4	No	Extract Free Dense Labels from CLIP	2021-12-02	Code
11	CAUSE-TR (ViT-S/8)	15.2	No	Causal Unsupervised Semantic Segmentation	2023-10-11	Code
12	ReCo	14.8	No	ReCo: Retrieve and Co-segment for Zero-shot Tran...	2022-06-14	Code
13	TransFGU (ViT-S/8)	11.93	Yes	TransFGU: A Top-down Approach to Fine-Grained Un...	2021-12-02	Code
14	GroupViT	11.1	No	GroupViT: Semantic Segmentation Emerges from Tex...	2022-02-22	Code
15	PiCIE (ResNet-50)	5.6	No	PiCIE: Unsupervised Semantic Segmentation using ...	2021-03-30	Code
16	IIC (ResNet-50)	2.2	No	Invariant Information Clustering for Unsupervise...	2018-07-17	Code

#1 CorrCLIP (SOTA)
34 mIoU · 2024-11-15
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation [Code]

#2 TextRegion
31.2 mIoU · 2025-05-29
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models [Code]

#3 Trident (SOTA)
28.6 mIoU · 2024-11-14
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation [Code]

#4 ProxyCLIP (SOTA)
26.8 mIoU · 2024-08-09
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation [Code]

#5 TagAlign (SOTA)
25.3 mIoU · 2023-12-21
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification [Code]

#6 TTD (TCL)
23.7 mIoU · 2024-03-30
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias [Code]

#7 COSMOS ViT-B/16
23.2 mIoU · 2024-12-02
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training [Code]

#8 TCL (SOTA)
22.4 mIoU · 2022-12-01
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs [Code]

#9 TTD (MaskCLIP)
19.4 mIoU · 2024-03-30
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias [Code]

#10 MaskCLIP (SOTA)
16.4 mIoU · 2021-12-02
Extract Free Dense Labels from CLIP [Code]

#11 CAUSE-TR (ViT-S/8)
15.2 mIoU · 2023-10-11
Causal Unsupervised Semantic Segmentation [Code]

#12 ReCo
14.8 mIoU · 2022-06-14
ReCo: Retrieve and Co-segment for Zero-shot Transfer [Code]

#13 TransFGU (ViT-S/8)
11.93 mIoU · Extra Data · 2021-12-02
TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation [Code]

#14 GroupViT
11.1 mIoU · 2022-02-22
GroupViT: Semantic Segmentation Emerges from Text Supervision [Code]

#15 PiCIE (ResNet-50) (SOTA)
5.6 mIoU · 2021-03-30
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering [Code]

#16 IIC (ResNet-50) (SOTA)
2.2 mIoU · 2018-07-17
Invariant Information Clustering for Unsupervised Image Classification and Segmentation [Code]