Unsupervised Semantic Segmentation on Cityscapes val
Metric: mIoU (higher is better)
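For readers unfamiliar with the metric, mIoU (mean Intersection over Union) is the per-class ratio of intersection to union between predicted and ground-truth labels, averaged over classes. The following is a minimal pure-Python sketch, not the benchmark's official evaluation script. The function name `mean_iou`, the flat label lists, and the toy data are illustrative assumptions. The `ignore_index=255` default follows the common Cityscapes convention for unlabeled pixels. Benchmark scores are normally accumulated over all validation pixels and reported as percentages, which is why the result is multiplied by 100 here.

```python
from collections import Counter


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in pred or target.

    pred and target are flat sequences of integer class labels
    (hypothetical input format). Returns a fraction in [0, 1].
    """
    inter = Counter()         # pixels where prediction and label agree, per class
    pred_count = Counter()    # predicted pixels per class
    target_count = Counter()  # ground-truth pixels per class

    for p, t in zip(pred, target):
        if t == ignore_index:  # skip unlabeled pixels
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union > 0:  # classes absent from both are left out of the mean
            ious.append(inter[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 6 pixels, 3 classes, last pixel unlabeled (255).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")  # prints "mIoU = 72.2"
```

In the toy example the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18, or about 72.2 on the percentage scale the leaderboard uses.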
Results
| # | Model | mIoU | SOTA | Extra Data | Paper | Date | Code |
|---|-------|------|------|------------|-------|------|------|
| 1 | CorrCLIP | 51.1 | SOTA | No | CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | 2024-11-15 | Code |
| 2 | Trident | 47.6 | SOTA | No | Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation | 2024-11-14 | Code |
| 3 | ProxyCLIP | 42 | SOTA | No | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | 2024-08-09 | Code |
| 4 | COSMOS ViT-B/16 | 34.7 | | No | COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training | 2024-12-02 | Code |
| 5 | TTD (MaskCLIP) | 32 | SOTA | No | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | 2024-03-30 | Code |
| 6 | TagAlign | 27.5 | SOTA | No | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | 2023-12-21 | Code |
| 7 | TTD (TCL) | 27 | | No | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | 2024-03-30 | Code |
| 8 | ReCo+ | 24.2 | SOTA | No | ReCo: Retrieve and Co-segment for Zero-shot Transfer | 2022-06-14 | Code |
| 9 | TCL | 24 | | No | Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | 2022-12-01 | Code |
| 10 | Segmenter ViT-S/16 | 21.8 | SOTA | No | Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation | 2022-03-21 | Code |
| 11 | ReCo | 19.3 | | No | ReCo: Retrieve and Co-segment for Zero-shot Transfer | 2022-06-14 | Code |
| 12 | CLIPpy ViT-B | 18.1 | | No | Perceptual Grouping in Contrastive Vision-Language Models | 2022-10-18 | Code |
| 13 | MaskCLIP | 10 | SOTA | No | Extract Free Dense Labels from CLIP | 2021-12-02 | Code |