Open Vocabulary Semantic Segmentation on PascalVOC-20

Metric: mIoU (mean Intersection over Union, in %; higher is better)
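For reference, the sketch below computes mean IoU from flattened per-pixel predictions and ground-truth labels. It uses the standard per-class definition IoU = TP / (TP + FP + FN), averaged over classes. It is a minimal sketch, not the evaluation code of any paper listed here. The function name `mean_iou`, the ignore label 255, and the choice to skip classes absent from both prediction and ground truth are assumptions. On PascalVOC-20 the class set commonly consists of the 20 VOC object classes with background excluded, so `num_classes` would be 20 there.

```python
from typing import Sequence


def mean_iou(preds: Sequence[int], labels: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU in percent (hypothetical helper, for illustration only).

    Assumes every prediction and every non-ignored label is an integer
    class id in [0, num_classes).
    """
    # Confusion matrix: rows index the ground-truth class, columns the predicted class.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        if t == ignore_index:
            continue  # pixels with the (assumed) ignore label are not scored
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp  # pixels of class c predicted as another class
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` yields per-class IoUs of 1/2 and 2/3, which average to about 58.33. Passing all pixels of a dataset at once accumulates the counts before dividing. This matches how dataset-level mIoU is usually reported, as opposed to averaging per-image scores.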

Results

#	Model	mIoU	Extra Data	Paper	Date	Code available
1	UMG-CLIP-L/14	97.9	No	UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	2024-01-12	Yes
2	SILC	97.6	No	SILC: Improving Vision Language Pretraining with Self-Distillation	2023-10-20	No
3	SCAN	97.2	No	Open-Vocabulary Segmentation with Semantic-Assisted Calibration	2023-12-07	Yes
4	CAT-Seg	97.0	No	CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation	2023-03-21	Yes
5	MaskCLIP++	96.8	No	MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation	2024-12-16	Yes
6	MAFT+	96.5	No	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	2024-08-01	Yes
7	EBSeg-L	96.4	No	Open-Vocabulary Semantic Segmentation with Image Embedding Balancing	2024-06-14	Yes
8	FC-CLIP	95.4	No	Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP	2023-08-04	Yes
9	OVSeg Swin-B	94.5	No	Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP	2022-10-09	Yes
10	MAFT-ViTL	92.1	No	Learning Mask-aware CLIP Representations for Zero-Shot Segmentation	2023-09-30	Yes
11	HyperSeg	92.1	Yes	HyperSeg: Towards Universal Visual Segmentation with Large Language Model	2024-11-26	Yes
12	POMP	89.4	No	Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition	2023-04-10	Yes
13	TagAlign (trained with image-text pairs)	87.9	No	TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification	2023-12-21	Yes
14	ODISE	84.6	No	Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models	2023-03-08	Yes
15	TCL	83.2	No	Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs	2022-12-01	Yes
16	LaVG	82.5	No	In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	2024-08-09	Yes
17	PACL	72.3	No	Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning	2022-12-09	Yes