Open Vocabulary Semantic Segmentation on ADE20K-150
Metric: mIoU (higher is better)
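For readers unfamiliar with the metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed for semantic segmentation. It is not the evaluation code of any listed paper. The function name `mean_iou`, the `ignore_index=255` convention, and the scaling to a percentage are assumptions made for illustration; individual papers may accumulate statistics or handle absent classes differently.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU in percent, averaged over classes with a non-empty union.

    pred, target: flat sequences of integer class ids of equal length,
    accumulated over every pixel being evaluated.
    ignore_index: label value skipped during evaluation (assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as c
    target_count = [0] * num_classes   # pixels labelled as c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and four pixels.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(score, 2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 58.33. Because the average runs over classes rather than pixels, rare classes weigh as much as frequent ones.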
Results
| # | Model | mIoU | Extra Data | SOTA badge | Paper | Date | Code |
|---|-------|------|------------|------------|-------|------|------|
| 1 | Mask-Adapter | 38.2 | No | - | Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation | 2024-12-05 | Code |
| 2 | MaskCLIP++ | 38.2 | No | - | MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation | 2024-12-16 | Code |
| 3 | UMG-CLIP-E/14 | 38.2 | No | SOTA | UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | 2024-01-12 | Code |
| 4 | CAT-Seg | 37.9 | No | SOTA | CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | 2023-03-21 | Code |
| 5 | SILC | 37.7 | No | - | SILC: Improving Vision Language Pretraining with Self-Distillation | 2023-10-20 | - |
| 6 | MAFT+ | 36.1 | No | - | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | 2024-08-01 | Code |
| 7 | UMG-CLIP-L/14 | 36.1 | No | - | UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | 2024-01-12 | Code |
| 8 | OVSeg + OpenDAS | 35.8 | No | - | OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation | 2024-05-30 | - |
| 9 | SED | 35.2 | No | - | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | 2023-11-27 | Code |
| 10 | CLIPSelf | 34.5 | No | - | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | 2023-10-02 | Code |
| 11 | FC-CLIP | 34.1 | No | - | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | 2023-08-04 | Code |
| 12 | SCAN | 33.5 | No | - | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | 2023-12-07 | Code |
| 13 | EBSeg-L | 32.8 | No | - | Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | 2024-06-14 | Code |
| 14 | MAFT-ViTL | 32 | No | - | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | 2023-09-30 | Code |
| 15 | PACL | 31.4 | No | SOTA | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | 2022-12-09 | Code |
| 16 | ODISE | 29.9 | No | - | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | 2023-03-08 | Code |
| 17 | OVSeg Swin-B | 29.6 | No | SOTA | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | 2022-10-09 | Code |
| 18 | MaskCLIP | 23.7 | No | SOTA | Open-Vocabulary Universal Image Segmentation with MaskCLIP | 2022-08-18 | Code |
| 19 | POMP | 20.7 | No | - | - | - | - |
| 20 | SimSeg | 20.5 | No | SOTA | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | 2021-12-29 | Code |
| 21 | TTD (TCL) | 17 | No | - | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | 2024-03-30 | Code |
| 22 | LaVG | 15.8 | No | - | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | 2024-08-09 | Code |
| 23 | TTD (MaskCLIP) | 12.7 | No | - | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | 2024-03-30 | Code |