Open Vocabulary Semantic Segmentation on PASCAL Context-59

Metric: mIoU (higher is better)
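
As a rough illustration of the metric, mIoU averages the per-class intersection-over-union between predicted and ground-truth label maps and is usually reported as a percentage. The sketch below is a minimal pure-Python version, not the evaluation code behind this leaderboard: the function name `mean_iou`, the ignore label 255, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration. Benchmark toolkits normally accumulate the counts over the whole validation set (59 classes for PASCAL Context-59) rather than averaging per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU (in percent) from flat sequences of predicted and true labels.

    ignore_index (assumed to be 255 here) marks unlabeled pixels to skip.
    """
    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled ground-truth pixel: not counted at all
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[t] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:  # assumption: skip classes absent from both maps
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Toy example with 3 classes; the last pixel is unlabeled.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(round(mean_iou(pred, target, num_classes=3), 2))
```

In the toy example the per-class IoUs are 1/2, 2/3 and 1, so the mean is about 72.2.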

Results

#	Model	mIoU	Extra Data	SOTA	Paper	Date	Code
1	HyperSeg	64.6	Yes	Yes	HyperSeg: Towards Universal Visual Segmentation with Large Language Model	2024-11-26	Code
2	SILC	63.5	No	Yes	SILC: Improving Vision Language Pretraining with Self-Distillation	2023-10-20	-
3	CAT-Seg	63.3	No	Yes	CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation	2023-03-21	Code
4	MaskCLIP++	62.5	No	No	MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation	2024-12-16	Code
5	CLIPSelf	62.3	No	No	CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	2023-10-02	Code
6	UMG-CLIP-L/14	61	No	No	UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	2024-01-12	Code
7	SED	60.6	No	No	SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation	2023-11-27	Code
8	Mask-Adapter	60.4	No	No	Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation	2024-12-05	Code
9	EBSeg-L	60.2	No	No	Open-Vocabulary Semantic Segmentation with Image Embedding Balancing	2024-06-14	Code
10	MAFT+	59.4	No	No	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	2024-08-01	Code
11	SCAN	59.3	No	No	Open-Vocabulary Segmentation with Semantic-Assisted Calibration	2023-12-07	Code
12	MAFT-ViTL	58.5	No	No	Learning Mask-aware CLIP Representations for Zero-Shot Segmentation	2023-09-30	Code
13	FC-CLIP	58.4	No	No	Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP	2023-08-04	Code
14	ODISE	57.3	No	Yes	Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models	2023-03-08	Code
15	OVSeg Swin-B	55.7	No	Yes	Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP	2022-10-09	Code
16	PACL	50.1	No	No	Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning	2022-12-09	Code
17	SimSeg	47.7	No	Yes	A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model	2021-12-29	Code
18	MaskCLIP	45.9	No	No	Open-Vocabulary Universal Image Segmentation with MaskCLIP	2022-08-18	Code
19	TagAlign (trained with image-text pairs)	37.6	No	No	TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification	2023-12-21	Code
20	TTD (TCL)	37.4	No	No	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	2024-03-30	Code
21	LaVG	34.7	No	No	In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	2024-08-09	Code
22	TCL	33.9	No	No	Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs	2022-12-01	Code
23	TTD (MaskCLIP)	31	No	No	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	2024-03-30	Code
24	CLIP Surgery (original CLIP without any fine-tuning)	29.3	No	No	A Closer Look at the Explainability of Contrastive Language-Image Pre-training	2023-04-12	Code