Unsupervised Semantic Segmentation on COCO-Object

Metric: mIoU (higher is better)
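
For readers unfamiliar with the metric, mIoU (mean intersection over union) averages the per-class ratio of correctly labelled pixels to the union of predicted and ground-truth pixels for that class. The sketch below is only an illustration under stated assumptions: the function name `mean_iou`, the flat label sequences, the `ignore_index` value of 255, and the percentage scaling are all hypothetical choices, and this page does not specify the evaluation protocol (class set, background handling, or how counts are accumulated across the dataset) that the listed papers use.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU, as a percentage, over flat sequences of integer class labels.

    Assumptions (not taken from the leaderboard): labels lie in
    [0, num_classes), and ground-truth pixels equal to ignore_index are skipped.
    """
    intersection = [0] * num_classes   # pixels where prediction == ground truth
    pred_count = [0] * num_classes     # pixels predicted as each class
    target_count = [0] * num_classes   # pixels labelled as each class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:                  # skip classes absent from both maps
            ious.append(intersection[c] / union)

    if not ious:
        return float("nan")
    return 100.0 * sum(ious) / len(ious)
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` averages a class-0 IoU of 1/2 and a class-1 IoU of 2/3.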

Results

#	Model	mIoU	Extra Data	Paper	Date	Code
1	CorrCLIP	49.4	No	CorrCLIP: Reconstructing Correlations in CLIP wi...	2024-11-15	Code
2	Trident	42.2	No	Harnessing Vision Foundation Models for High-Per...	2024-11-14	Code
3	ProxyCLIP	39.2	No	ProxyCLIP: Proxy Attention Improves CLIP for Ope...	2024-08-09	Code
4	TTD (TCL)	37.4	No	TTD: Text-Tag Self-Distillation Enhancing Image-...	2024-03-30	Code
5	CLS-SEG	35.3	No	TagCLIP: A Local-to-Global Framework to Enhance ...	2023-12-20	Code
6	TagAlign	33.3	No	TagAlign: Improving Vision-Language Alignment wi...	2023-12-21	Code
7	TCL	31.6	No	Learning to Generate Text-grounded Mask for Open...	2022-12-01	Code
8	COSMOS ViT-B/16	31.3	No	COSMOS: Cross-Modality Self-Distillation for Vis...	2024-12-02	Code
9	GroupViT (RedCaps)	27.5	No	GroupViT: Semantic Segmentation Emerges from Tex...	2022-02-22	Code
10	TTD (MaskCLIP)	26.5	No	TTD: Text-Tag Self-Distillation Enhancing Image-...	2024-03-30	Code
11	MaskCLIP	20.6	No	Extract Free Dense Labels from CLIP	2021-12-02	Code
12	ReCo	15.7	No	ReCo: Retrieve and Co-segment for Zero-shot Tran...	2022-06-14	Code

#1 CorrCLIP (SOTA): 49.4 mIoU, 2024-11-15
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation [Code]

#2 Trident (SOTA): 42.2 mIoU, 2024-11-14
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation [Code]

#3 ProxyCLIP (SOTA): 39.2 mIoU, 2024-08-09
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation [Code]

#4 TTD (TCL) (SOTA): 37.4 mIoU, 2024-03-30
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias [Code]

#5 CLS-SEG (SOTA): 35.3 mIoU, 2023-12-20
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training [Code]

#6 TagAlign: 33.3 mIoU, 2023-12-21
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification [Code]

#7 TCL (SOTA): 31.6 mIoU, 2022-12-01
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs [Code]

#8 COSMOS ViT-B/16: 31.3 mIoU, 2024-12-02
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training [Code]

#9 GroupViT (RedCaps) (SOTA): 27.5 mIoU, 2022-02-22
GroupViT: Semantic Segmentation Emerges from Text Supervision [Code]

#10 TTD (MaskCLIP): 26.5 mIoU, 2024-03-30
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias [Code]

#11 MaskCLIP (SOTA): 20.6 mIoU, 2021-12-02
Extract Free Dense Labels from CLIP [Code]

#12 ReCo: 15.7 mIoU, 2022-06-14
ReCo: Retrieve and Co-segment for Zero-shot Transfer [Code]
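
The SOTA tags in the entries above appear consistent with one reading: a model is tagged when its score exceeded every earlier-dated entry on this leaderboard. That reading is an inference from the listed dates and scores, not something the page states. A minimal sketch that recomputes the tags under that assumption (the `ENTRIES` list and the `sota_at_release` helper are hypothetical names; the values are copied from the results table):

```python
from datetime import date

# (model, mIoU, date) copied from the results table.
ENTRIES = [
    ("CorrCLIP", 49.4, date(2024, 11, 15)),
    ("Trident", 42.2, date(2024, 11, 14)),
    ("ProxyCLIP", 39.2, date(2024, 8, 9)),
    ("TTD (TCL)", 37.4, date(2024, 3, 30)),
    ("CLS-SEG", 35.3, date(2023, 12, 20)),
    ("TagAlign", 33.3, date(2023, 12, 21)),
    ("TCL", 31.6, date(2022, 12, 1)),
    ("COSMOS ViT-B/16", 31.3, date(2024, 12, 2)),
    ("GroupViT (RedCaps)", 27.5, date(2022, 2, 22)),
    ("TTD (MaskCLIP)", 26.5, date(2024, 3, 30)),
    ("MaskCLIP", 20.6, date(2021, 12, 2)),
    ("ReCo", 15.7, date(2022, 6, 14)),
]


def sota_at_release(entries):
    """Return models whose score beat all earlier-dated entries (assumed rule)."""
    best = float("-inf")
    flagged = []
    # Walk the entries in chronological order, tracking the running best score.
    for name, score, _released in sorted(entries, key=lambda e: e[2]):
        if score > best:
            flagged.append(name)
            best = score
    return flagged


print(sota_at_release(ENTRIES))
```

Under this rule the flagged set matches the eight entries tagged SOTA above, while TagAlign, COSMOS ViT-B/16, TTD (MaskCLIP), and ReCo remain untagged because a higher score already existed at their listed dates.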