Open Vocabulary Semantic Segmentation on PascalVOC-20
Metric: mIoU (higher is better)
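For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU (mean intersection over union) is commonly computed from per-pixel class labels. It is not taken from this leaderboard or from any listed paper. The function name `mean_iou`, its parameters, and the choice to skip classes that never occur are illustrative assumptions. The default `ignore_index` of 255 follows the PASCAL VOC convention for void pixels, and individual papers may use a different evaluation protocol.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Return mIoU as a percentage over flattened per-pixel labels.

    Hypothetical helper for illustration only. Classes that appear in
    neither the prediction nor the ground truth are left out of the mean
    (an assumption; protocols differ on this point).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # void pixels do not count toward any class
        target_area[t] += 1
        if 0 <= p < num_classes:
            pred_area[p] += 1
        if p == t:
            intersection[t] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example: four pixels, two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 58.3. Benchmark numbers are usually obtained by accumulating the intersection and union counts over the whole validation set before averaging, which this function does if all images are concatenated into one label sequence.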
Results
| # | Model | mIoU | Extra Data | SOTA | Paper | Date | Code |
|---|-------|------|------------|------|-------|------|------|
| 1 | UMG-CLIP-L/14 | 97.9 | No | SOTA | UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | 2024-01-12 | Code |
| 2 | SILC | 97.6 | No | SOTA | SILC: Improving Vision Language Pretraining with Self-Distillation | 2023-10-20 | - |
| 3 | SCAN | 97.2 | No | - | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | 2023-12-07 | Code |
| 4 | CAT-Seg | 97.0 | No | SOTA | CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | 2023-03-21 | Code |
| 5 | MaskCLIP++ | 96.8 | No | - | MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation | 2024-12-16 | Code |
| 6 | MAFT+ | 96.5 | No | - | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | 2024-08-01 | Code |
| 7 | EBSeg-L | 96.4 | No | - | Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | 2024-06-14 | Code |
| 8 | FC-CLIP | 95.4 | No | - | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | 2023-08-04 | Code |
| 9 | OVSeg Swin-B | 94.5 | No | SOTA | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | 2022-10-09 | Code |
| 10 | MAFT-ViTL | 92.1 | No | - | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | 2023-09-30 | Code |
| 11 | MAFT-ViTL | 92.1 | No | - | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | 2023-09-30 | Code |
| 12 | HyperSeg | 92.1 | Yes | - | HyperSeg: Towards Universal Visual Segmentation with Large Language Model | 2024-11-26 | Code |
| 13 | POMP | 89.4 | No | - | Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition | 2023-04-10 | Code |
| 14 | TagAlign (trained with image-text pairs) | 87.9 | No | - | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | 2023-12-21 | Code |
| 15 | ODISE | 84.6 | No | - | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | 2023-03-08 | Code |
| 16 | TCL | 83.2 | No | - | Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | 2022-12-01 | Code |
| 17 | LaVG | 82.5 | No | - | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | 2024-08-09 | Code |
| 18 | PACL | 72.3 | No | - | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | 2022-12-09 | Code |