Semantic Segmentation on ADE20K

Metric: Mean IoU (val) (higher is better)
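
For readers unfamiliar with the metric, here is a minimal sketch of how mean IoU is commonly computed from a confusion matrix. It is an illustration under stated assumptions, not this leaderboard's evaluation code: the function name `mean_iou`, the `ignore_index=255` convention, and the choice to average only over classes that actually occur are all assumptions. Benchmarks typically accumulate the confusion matrix over the whole validation set before averaging and report the result as a percentage (multiply by 100).

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union across classes (hypothetical helper).

    pred, target: integer label arrays of the same shape.
    Pixels whose target equals ignore_index are skipped (assumed convention).
    Predictions are assumed to lie in [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels that carry the ignore label in the ground truth.
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes present in the prediction or the ground truth.
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())

score = mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2)
print(f"{100 * score:.1f}")
```

In the toy call, the last pixel is ignored, and each of the two classes has one correctly labeled pixel out of a union of two.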

Results

#	Model	Mean IoU (val)	Extra Data	Paper	Date	Code
1	CorrCLIP	30.7	No	CorrCLIP: Reconstructing Correlations in CLIP wi...	2024-11-15	Code
2	TextRegion	27.3	No	TextRegion: Text-Aligned Region Tokens from Froz...	2025-05-29	Code
3	Trident	26.7	No	Harnessing Vision Foundation Models for High-Per...	2024-11-14	Code
4	ProxyCLIP	24.2	No	ProxyCLIP: Proxy Attention Improves CLIP for Ope...	2024-08-09	Code
5	COSMOS ViT-B/16	17.7	No	COSMOS: Cross-Modality Self-Distillation for Vis...	2024-12-02	Code
6	TagAlign	17.3	No	TagAlign: Improving Vision-Language Alignment wi...	2023-12-21	Code
7	TCL	17.1	No	Learning to Generate Text-grounded Mask for Open...	2022-12-01	Code
8	TTD (TCL)	17	No	TTD: Text-Tag Self-Distillation Enhancing Image-...	2024-03-30	Code
9	CLIPpy ViT-B	13.5	No	Perceptual Grouping in Contrastive Vision-Langua...	2022-10-18	Code
10	TTD (MaskCLIP)	12.7	No	TTD: Text-Tag Self-Distillation Enhancing Image-...	2024-03-30	Code
11	ReCo	11.2	No	ReCo: Retrieve and Co-segment for Zero-shot Tran...	2022-06-14	Code
12	MaskCLIP	9.8	No	Extract Free Dense Labels from CLIP	2021-12-02	Code
13	GroupViT (RedCaps)	9.2	No	GroupViT: Semantic Segmentation Emerges from Tex...	2022-02-22	Code

#1 CorrCLIP (SOTA)
30.7 Mean IoU (val) · 2024-11-15
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation · Code

#2 TextRegion
27.3 Mean IoU (val) · 2025-05-29
TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models · Code

#3 Trident (SOTA)
26.7 Mean IoU (val) · 2024-11-14
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation · Code

#4 ProxyCLIP (SOTA)
24.2 Mean IoU (val) · 2024-08-09
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation · Code

#5 COSMOS ViT-B/16
17.7 Mean IoU (val) · 2024-12-02
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training · Code

#6 TagAlign (SOTA)
17.3 Mean IoU (val) · 2023-12-21
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification · Code

#7 TCL (SOTA)
17.1 Mean IoU (val) · 2022-12-01
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs · Code

#8 TTD (TCL)
17 Mean IoU (val) · 2024-03-30
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · Code

#9 CLIPpy ViT-B (SOTA)
13.5 Mean IoU (val) · 2022-10-18
Perceptual Grouping in Contrastive Vision-Language Models · Code

#10 TTD (MaskCLIP)
12.7 Mean IoU (val) · 2024-03-30
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · Code

#11 ReCo (SOTA)
11.2 Mean IoU (val) · 2022-06-14
ReCo: Retrieve and Co-segment for Zero-shot Transfer · Code

#12 MaskCLIP (SOTA)
9.8 Mean IoU (val) · 2021-12-02
Extract Free Dense Labels from CLIP · Code

#13 GroupViT (RedCaps)
9.2 Mean IoU (val) · 2022-02-22
GroupViT: Semantic Segmentation Emerges from Text Supervision · Code