Object Detection on COCO-O

Metric: Average mAP (higher is better)

Results

#	Model	Average mAP	Extra Data	Paper	Date	Code
1	EVA	57.8	No	EVA: Exploring the Limits of Masked Visual Representation Learning at Scale	2022-11-14	Code
2	DETA (Swin-L)	48.5	No	NMS Strikes Back	2022-12-12	Code
3	GLIP-L (Swin-L)	48	No	Grounded Language-Image Pre-training	2021-12-07	Code
4	GRiT (ViT-H)	42.9	No	GRiT: A Generative Region-to-text Transformer for Object Understanding	2022-12-01	Code
5	DINO (Swin-L)	42.1	No	DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection	2022-03-07	Code
6	CBNetV2 (Swin-L)	39	No	CBNet: A Composite Backbone Network Architecture for Object Detection	2021-07-01	Code
7	ConvNeXt-XL (Cascade Mask R-CNN)	37.5	No	A ConvNet for the 2020s	2022-01-10	Code
8	InternImage-L (Cascade Mask R-CNN)	37	No	InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	2022-11-10	Code
9	DyHead (Swin-L)	35.3	No	Dynamic Head: Unifying Object Detection Heads with Attentions	2021-06-15	Code
10	ViTDet (ViT-H)	34.3	No	Exploring Plain Vision Transformer Backbones for Object Detection	2022-03-30	Code
11	ViT-Adapter (BEiTv2-L)	34.25	No	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
12	FIBER-B (Swin-B)	33.7	No	Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone	2022-06-15	Code
13	QueryInst (Swin-L)	33.2	No	Instances as Queries	2021-05-05	Code
14	YOLOv6-L6	32.5	No	YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications	2022-09-07	Code
15	YOLOv7-E6E	32	No	YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors	2022-07-06	Code
16	MViTV2-H (Cascade Mask R-CNN)	30.9	No	MViTv2: Improved Multiscale Vision Transformers for Classification and Detection	2021-12-02	Code
17	Det-AdvProp (EfficientNet-B5)	30.8	No	Robust and Accurate Object Detection via Adversarial Learning	2021-03-23	Code
18	YOLOv4-P6	30.4	No	YOLOv4: Optimal Speed and Accuracy of Object Detection	2020-04-23	Code
19	YOLOX-X	30.3	No	YOLOX: Exceeding YOLO Series in 2021	2021-07-18	Code
20	CenterNet2 (R2-101-DCN)	29.5	No	Probabilistic two-stage detection	2021-03-12	Code
21	GLIP-T (Swin-T)	29.1	No	Grounded Language-Image Pre-training	2021-12-07	Code
22	EfficientDet-D5 (EfficientNet-B5)	28.5	No	EfficientDet: Scalable and Efficient Object Detection	2019-11-20	Code
23	PVTv2-B5 (Mask R-CNN)	28.2	No	PVT v2: Improved Baselines with Pyramid Vision Transformer	2021-06-25	Code
24	VFNet (RX-101-64x4d)	28	No	VarifocalNet: An IoU-aware Dense Object Detector	2020-08-31	Code
25	GCNet (RX-101-32x4d-DCN)	26	No	GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond	2019-04-25	Code
26	GFLv2 (R2-101-DCN)	25.1	No	Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection	2020-11-25	Code
27	RepPointsV2 (RX-101-64x4d-DCN)	24.9	No	RepPoints V2: Verification Meets Regression for Object Detection	2020-07-16	Code
28	UniverseNet (R2-101-DCN)	24.8	No	USB: Universal-Scale Object Detection Benchmark	2021-03-25	Code
29	YOLOX-S	20.6	No	YOLOX: Exceeding YOLO Series in 2021	2021-07-18	Code
30	YOLOS-B (ViT-B)	20	No	You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection	2021-06-01	Code
31	DyHead (ResNet-50)	19.3	No	Dynamic Head: Unifying Object Detection Heads with Attentions	2021-06-15	Code
32	HTC (ResNet-50)	19.1	No	Hybrid Task Cascade for Instance Segmentation	2019-01-22	Code
33	Deformable-DETR (ResNet-50)	18.5	No	Deformable DETR: Deformable Transformers for End-to-End Object Detection	2020-10-08	Code
34	Cascade R-CNN (ResNet-50)	18.2	No	Cascade R-CNN: High Quality Object Detection and Instance Segmentation	2019-06-24	Code
35	Mask R-CNN (ResNet-50)	17.1	No	Mask R-CNN	2017-03-20	Code
36	DETR (ResNet-50)	17.1	No	End-to-End Object Detection with Transformers	2020-05-26	Code
37	ATSS (ResNet-50)	16.8	No	Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection	2019-12-05	Code
38	FCOS (ResNet-50)	16.7	No	FCOS: Fully Convolutional One-Stage Object Detection	2019-04-02	Code
39	RetinaNet (ResNet-50)	16.6	No	Focal Loss for Dense Object Detection	2017-08-07	Code
40	Faster R-CNN (ResNet-50-FPN)	16.4	Yes	Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks	2015-06-04	Code
41	YOLOv3 (DarkNet-53)	14.8	No	YOLOv3: An Incremental Improvement	2018-04-08	Code
42	SSD (VGG-16)	13.6	No	SSD: Single Shot MultiBox Detector	2015-12-08	Code
