Open Vocabulary Semantic Segmentation on ADE20K-150

Metric: mIoU (higher is better)
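For reference, mIoU (mean intersection over union) averages the per-class ratio of correctly labeled pixels to the union of predicted and ground-truth pixels for that class. The following is a minimal sketch, not the evaluation code used by any listed paper. The function name `mean_iou`, the `ignore_index=255` convention for unlabeled pixels, and the scaling to a percentage are assumptions made for illustration. Benchmark implementations usually accumulate the per-class counts over the whole validation set before dividing, which this sketch also does if all pixels are passed in as one flat sequence.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU (as a percentage) over classes present in pred or target.

    pred and target are flat sequences of integer class ids of equal length.
    ignore_index is an assumed label for pixels excluded from scoring.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from every count
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Toy example with two classes and four pixels.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy call, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is roughly 58.3. On ADE20K-150, `num_classes` would be 150.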

Results

#	Model	mIoU	Extra Data	Paper	Date	Code	SOTA
1	Mask-Adapter	38.2	No	Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation	2024-12-05	Code	-
2	MaskCLIP++	38.2	No	MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation	2024-12-16	Code	-
3	UMG-CLIP-E/14	38.2	No	UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	2024-01-12	Code	SOTA
4	CAT-Seg	37.9	No	CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation	2023-03-21	Code	SOTA
5	SILC	37.7	No	SILC: Improving Vision Language Pretraining with Self-Distillation	2023-10-20	-	-
6	MAFT+	36.1	No	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	2024-08-01	Code	-
7	UMG-CLIP-L/14	36.1	No	UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	2024-01-12	Code	-
8	OVSeg + OpenDAS	35.8	No	OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation	2024-05-30	-	-
9	SED	35.2	No	SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation	2023-11-27	Code	-
10	CLIPSelf	34.5	No	CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	2023-10-02	Code	-
11	FC-CLIP	34.1	No	Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP	2023-08-04	Code	-
12	SCAN	33.5	No	Open-Vocabulary Segmentation with Semantic-Assisted Calibration	2023-12-07	Code	-
13	EBSeg-L	32.8	No	Open-Vocabulary Semantic Segmentation with Image Embedding Balancing	2024-06-14	Code	-
14	MAFT-ViTL	32	No	Learning Mask-aware CLIP Representations for Zero-Shot Segmentation	2023-09-30	Code	-
15	PACL	31.4	No	Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning	2022-12-09	Code	SOTA
16	ODISE	29.9	No	Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models	2023-03-08	Code	-
17	OVSeg Swin-B	29.6	No	Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP	2022-10-09	Code	SOTA
18	MaskCLIP	23.7	No	Open-Vocabulary Universal Image Segmentation with MaskCLIP	2022-08-18	Code	SOTA
19	POMP	20.7	No	-	-	-	-
20	SimSeg	20.5	No	A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model	2021-12-29	Code	SOTA
21	TTD (TCL)	17	No	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	2024-03-30	Code	-
22	LaVG	15.8	No	In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	2024-08-09	Code	-
23	TTD (MaskCLIP)	12.7	No	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	2024-03-30	Code	-