Unsupervised Semantic Segmentation on Cityscapes val

Metric: mIoU (higher is better)
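
For readers unfamiliar with the metric, mIoU averages the per-class intersection-over-union between predicted and ground-truth label maps. The sketch below is a minimal illustration, not the evaluation code used by any of the listed papers. The function name `mean_iou`, the flattened-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration. The default `ignore_index=255` follows the common Cityscapes convention for unlabeled pixels. The leaderboard scores appear to be on a 0-100 scale, so the returned fraction would be multiplied by 100 to compare.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over classes present in the prediction or the ground truth.

    `pred` and `target` are flattened label maps of equal length
    (hypothetical input format chosen for this sketch).
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled ground-truth pixels do not count
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[t] += 1

    ious: List[float] = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes; the last pixel is unlabeled and ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")
```

In practice the counts are accumulated over the whole validation set before the per-class ratios are taken, rather than averaging per-image scores.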

Results

#	Model	mIoU	SOTA	Extra Data	Paper	Date	Code
1	CorrCLIP	51.1	Yes	No	CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation	2024-11-15	Code
2	Trident	47.6	Yes	No	Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation	2024-11-14	Code
3	ProxyCLIP	42	Yes	No	ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation	2024-08-09	Code
4	COSMOS ViT-B/16	34.7	No	No	COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training	2024-12-02	Code
5	TTD (MaskCLIP)	32	Yes	No	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	2024-03-30	Code
6	TagAlign	27.5	Yes	No	TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification	2023-12-21	Code
7	TTD (TCL)	27	No	No	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	2024-03-30	Code
8	ReCo+	24.2	Yes	No	ReCo: Retrieve and Co-segment for Zero-shot Transfer	2022-06-14	Code
9	TCL	24	No	No	Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs	2022-12-01	Code
10	Segmenter ViT-S/16	21.8	Yes	No	Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation	2022-03-21	Code
11	ReCo	19.3	No	No	ReCo: Retrieve and Co-segment for Zero-shot Transfer	2022-06-14	Code
12	CLIPpy ViT-B	18.1	No	No	Perceptual Grouping in Contrastive Vision-Language Models	2022-10-18	Code
13	MaskCLIP	10	Yes	No	Extract Free Dense Labels from CLIP	2021-12-02	Code