Open Vocabulary Object Detection on MSCOCO

Metric: AP 0.5 (higher is better)
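
The metric can be illustrated with a small sketch, assuming "AP 0.5" means average precision at an IoU threshold of 0.5 (the page itself does not define it). The function names `iou` and `average_precision_at_50`, the box format `(x1, y1, x2, y2)`, and the use of all-point interpolation are assumptions made for illustration; the official COCO protocol uses 101-point interpolation and further rules not reproduced here, so this is not the exact evaluation behind the numbers below.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision_at_50(detections, ground_truths, iou_thr=0.5):
    """Simplified single-class AP at a fixed IoU threshold.

    detections:    list of (image_id, score, box)
    ground_truths: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0

    # Each ground-truth box may be matched by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    tp_flags = []
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(img, [])):
            if matched[img][j]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[img][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Precision and recall after each detection, in descending score order.
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Make precision non-increasing in recall (the usual envelope step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve (all-point interpolation).
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gts = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [
    ("img1", 0.9, (0, 0, 10, 10)),    # exact match
    ("img1", 0.8, (50, 50, 60, 60)),  # false positive
    ("img1", 0.7, (20, 20, 30, 31)),  # IoU of about 0.91 with the second box
]
ap = average_precision_at_50(dets, gts)
```

The result is a fraction between 0 and 1. The scores in the table appear to be on a 0 to 100 scale, so a comparable figure would be `100 * ap`, averaged over the evaluated classes.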

Results

#	Model	AP 0.5	Extra Data	Paper	Date	Code
1	Cooperative Foundational Models (SOTA)	50.3	No	Enhancing Novel Object Detection via Cooperative Foundational Models	2023-11-19	Code
2	DE-ViT (SOTA)	50	No	Detect Everything with Few Examples	2023-09-22	Code
3	Yolov8-nano	47.2	Yes	YOLOv8-Based Visual Detection of Road Hazards: Potholes, Sewer Covers, and Manholes	2023-10-31	-
4	DITO	46.1	No	Region-centric Image-Language Pretraining for Open-Vocabulary Detection	2023-09-29	Code
5	OV-DQUO (RN50x4)	45.6	No	OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision	2024-05-28	Code
6	LP-OVOD (OWL-ViT Proposals)	44.9	No	LP-OVOD: Open-Vocabulary Object Detection by Linear Probing	2023-10-26	Code
7	CLIPSelf	44.3	No	CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	2023-10-02	Code
8	CORA+ (SOTA)	43.1	No	CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching	2023-03-23	Code
9	BARON (SOTA)	42.7	No	Aligning Bag of Regions for Open-Vocabulary Object Detection	2023-02-27	Code
10	SIA-OVD (RN50x4)	41.9	No	SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection	2024-10-08	Code
11	CORA	41.7	No	CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching	2023-03-23	Code
12	RALF	41.3	No	Retrieval-Augmented Open-Vocabulary Object Detection	2024-04-08	Code
13	LP-OVOD	40.5	No	LP-OVOD: Open-Vocabulary Object Detection by Linear Probing	2023-10-26	Code
14	Region-CLIP (RN50x4-C4) (SOTA)	39.3	Yes	RegionCLIP: Region-based Language-Image Pretraining	2021-12-16	Code
15	OV-DQUO (R50)	39.2	No	OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision	2024-05-28	Code
16	Object-Centric-OVD	36.9	No	Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection	2022-07-07	Code
17	CLIM (RN50)	36.9	No	CLIM: Contrastive Language-Image Mosaic for Region Representation	2023-12-18	Code
18	OADP (G-OVD)	35.6	No	Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection	2023-03-10	Code
19	SIA-OVD (RN50)	35.5	No	SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection	2024-10-08	Code
20	VL-PLM (RN50)	34.4	No	Exploiting Unlabeled Data with Vision and Language Models for Object Detection	2022-07-18	Code
21	CFM-ViT	34.1	No	Contrastive Feature Masking Open-Vocabulary Vision Transformer	2023-09-02	-
22	MEDet (RN50)	32.6	No	Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization	2022-06-22	Code
23	Region-CLIP (RN50-C4)	31.4	Yes	RegionCLIP: Region-based Language-Image Pretraining	2021-12-16	Code
24	OVAD-Baseline	30	No	Open-vocabulary Attribute Detection	2022-11-23	Code
25	OADP	30	No	Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection	2023-03-10	Code
26	OV-DETR	29.4	No	Open-Vocabulary DETR with Conditional Matching	2022-03-22	Code
27	LocOv (RN50-C4)	28.6	No	Localized Vision-Language Matching for Open-vocabulary Object Detection	2022-05-12	Code
28	Detic	27.8	No	Detecting Twenty-thousand Classes using Image-level Supervision	2022-01-07	Code
29	ViLD (SOTA)	27.6	Yes	Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	2021-04-28	Code
30	OVR-CNN (SOTA)	22.8	No	Open-Vocabulary Object Detection Using Captions	2020-11-20	Code
31	HierKD	20.3	No	Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation	2022-03-20	Code
32	Yolov8	0.5	No	YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection	2024-02-14	Code