2D Classification on LVIS v1.0

Metric: AP novel-LVIS base training (higher is better)

Results

#	Model	AP novel-LVIS base training	SOTA	Augmentations	Paper	Date	Code
1	LaMI-DETR	43.4	Yes	No	LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction	2024-07-16	Code
2	DITO	40.4	Yes	No	Region-centric Image-Language Pretraining for Open-Vocabulary Detection	2023-09-29	Code
3	OV-DQUO (ViT-L/14)	39.3	No	No	OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision	2024-05-28	Code
4	CoDet (EVA02-L)	37.0	No	Yes	CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection	2023-10-25	Code
5	CLIPSelf	34.9	No	No	CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	2023-10-02	Code
6	OVMR	34.4	No	Yes	OVMR: Open-Vocabulary Recognition with Multi-Modal References	2024-06-07	Code
7	DE-ViT	34.3	Yes	No	Detect Everything with Few Examples	2023-09-22	Code
8	CFM-ViT	33.9	Yes	No	Contrastive Feature Masking Open-Vocabulary Vision Transformer	2023-09-02	-
9	CLIM (RN50x64)	32.3	No	No	CLIM: Contrastive Language-Image Mosaic for Region Representation	2023-12-18	Code
10	RO-ViT	32.1	Yes	No	Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers	2023-05-11	Code
11	Prova (Swin-Base)	31.5	No	Yes	Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection	2024-12-23	Code
12	RTGen	30.2	No	Yes	RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection	2024-05-30	Code
13	OV-DQUO (ViT-B/16)	29.7	No	No	OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision	2024-05-28	Code
14	ViLD-ensemble w/ ALIGN (Eb7-FPN)	26.3	Yes	No	Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	2021-04-28	Code
15	OWL-ViT (CLIP-L/14)	25.6	No	Yes	Simple Open-Vocabulary Object Detection with Vision Transformers	2022-05-12	Code
16	POMP	25.2	No	No	Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition	2023-04-10	Code
17	BARON	22.6	No	No	Aligning Bag of Regions for Open-Vocabulary Object Detection	2023-02-27	Code
18	MEDet	22.4	No	No	Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization	2022-06-22	Code
19	Region-CLIP (RN50x4-C4)	22.0	No	No	RegionCLIP: Region-based Language-Image Pretraining	2021-12-16	Code
20	RALF	21.9	No	Yes	Retrieval-Augmented Open-Vocabulary Object Detection	2024-04-08	Code
21	OADP	21.7	No	No	Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection	2023-03-10	Code
22	X-Paste	21.4	No	No	X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion	2022-12-07	Code
23	Object-Centric-OVD	21.1	No	Yes	Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection	2022-07-07	Code
24	ViLD-ensemble (R152-FPN)	18.7	No	No	Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	2021-04-28	Code
25	Detic	17.8	No	Yes	Detecting Twenty-thousand Classes using Image-level Supervision	2022-01-07	Code
26	Region-CLIP (RN50-C4)	17.1	No	No	RegionCLIP: Region-based Language-Image Pretraining	2021-12-16	Code
27	ViLD-ensemble (R50-FPN)	16.6	No	No	Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	2021-04-28	Code
28	ViLD (R50-FPN)	16.1	No	No	Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	2021-04-28	Code