2D Object Detection on COCO test-dev

Metric: box mAP (higher is better)
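
For readers unfamiliar with the metric: on COCO, box mAP conventionally means average precision averaged over ten IoU thresholds (0.50 to 0.95 in steps of 0.05) and over all object categories, shown here as a percentage. The Python sketch below only illustrates the IoU matching criterion behind that definition. It is not the official evaluation code (the reference implementation is `COCOeval` in pycocotools), and the names `box_iou`, `IOU_THRESHOLDS`, and `thresholds_passed` are hypothetical helpers introduced for illustration.

```python
from typing import Sequence

# A box is (x1, y1, x2, y2) in corner format; this layout is an assumption
# made for the sketch (COCO annotation files store boxes as x, y, width, height).
Box = Sequence[float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds over which the primary COCO metric averages AP.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred: Box, gt: Box) -> int:
    """Count the thresholds at which `pred` would count as a match for `gt`."""
    iou = box_iou(pred, gt)
    return sum(iou >= t for t in IOU_THRESHOLDS)
```

A prediction that overlaps its ground-truth box loosely counts as a match only at the lower thresholds, which is why the averaged metric rewards tight localization.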

Results

#	Model	box mAP	Augmentations	Paper	Date	Code
1	Co-DETR	66	No	DETRs with Collaborative Hybrid Assignments Trai...	2022-11-22	Code
2	InternImage-H (M3I Pre-training)	65.5	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
3	M3I Pre-training (InternImage-H)	65.4	No	Towards All-in-one Pre-training via Maximizing M...	2022-11-17	Code
4	MoCaE	65.1	No	MoCaE: Mixture of Calibrated Experts Significant...	2023-09-26	Code
5	Focal-Stable-DINO (Focal-Huge, no TTA)	64.8	No	A Strong and Reproducible Object Detector with O...	2023-04-25	Code
6	Co-DETR (Swin-L)	64.8	No	DETRs with Collaborative Hybrid Assignments Trai...	2022-11-22	Code
7	EVA	64.7	No	EVA: Exploring the Limits of Masked Visual Repre...	2022-11-14	Code
8	Group DETR v2	64.5	No	Group DETR v2: Strong Object Detector with Encod...	2022-11-07	-
9	FocalNet-H (DINO)	64.4	No	Focal Modulation Networks	2022-03-22	Code
10	InternImage-XL	64.3	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
11	FD-SwinV2-G	64.2	No	Contrastive Learning Rivals Masked Image Modelin...	2022-05-27	Code
12	Plain-DETR (Swin-L)	63.9	No	-	-	Code
13	RevCol-H(DINO)	63.8	No	Reversible Column Networks	2022-12-22	Code
14	BEiT-3	63.7	No	Image as a Foreign Language: BEiT Pretraining fo...	2022-08-22	Code
15	Relation-DETR (Focal-L)	63.5	No	Relation DETR: Exploring Explicit Position Relat...	2024-07-16	Code
16	DETA (Swin-L)	63.5	No	NMS Strikes Back	2022-12-12	Code
17	DINO (Swin-L,multi-scale, TTA)	63.3	No	DINO: DETR with Improved DeNoising Anchor Boxes ...	2022-03-07	Code
18	SwinV2-G (HTC++)	63.1	No	Swin Transformer V2: Scaling Up Capacity and Res...	2021-11-18	Code
19	Grounding DINO	63	No	Grounding DINO: Marrying DINO with Grounded Pre-...	2023-03-09	Code
20	Florence-CoSwin-H	62.4	No	Florence: A New Foundation Model for Computer Vi...	2021-11-22	Code
21	GLIPv2 (CoSwin-H, multi-scale)	62.4	No	GLIPv2: Unifying Localization and Vision-Languag...	2022-06-12	Code
22	GLEE-Pro	62.3	No	General Object Foundation Model for Images and V...	2023-12-14	Code
23	GLIP (Swin-L, multi-scale)	61.5	No	Grounded Language-Image Pre-training	2021-12-07	Code
24	Soft Teacher + Swin-L (HTC++, multi-scale)	61.3	No	End-to-End Semi-Supervised Object Detection with...	2021-06-16	Code
25	ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale)	60.9	No	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
26	DyHead (Swin-L, multi scale, self-training)	60.6	No	Dynamic Head: Unifying Object Detection Heads wi...	2021-06-15	Code
27	GLEE-Plus	60.6	No	General Object Foundation Model for Images and V...	2023-12-14	Code
28	ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale)	60.4	No	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
29	GRiT (ViT-H, single-scale testing)	60.4	No	GRiT: A Generative Region-to-text Transformer fo...	2022-12-01	Code
30	CBNetV2 (Dual-Swin-L HTC, multi-scale)	60.1	No	CBNet: A Composite Backbone Network Architecture...	2021-07-01	Code
31	PIIP-H6B (DINO)	60	No	Parameter-Inverted Image Pyramid Networks	2024-06-06	Code
32	CBNetV2 (Dual-Swin-L HTC, single-scale)	59.4	No	CBNet: A Composite Backbone Network Architecture...	2021-07-01	Code
33	Focal-L (DyHead, multi-scale)	58.9	No	Focal Self-attention for Local-Global Interactio...	2021-07-01	Code
34	DyHead (Swin-L, multi scale)	58.7	No	Dynamic Head: Unifying Object Detection Heads wi...	2021-06-15	Code
35	Swin-L (HTC++, multi scale)	58.7	No	Swin Transformer: Hierarchical Vision Transforme...	2021-03-25	Code
36	Swin-L (HTC++, single scale)	57.7	No	Swin Transformer: Hierarchical Vision Transforme...	2021-03-25	Code
37	Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale)	57.3	No	Simple Copy-Paste is a Strong Data Augmentation ...	2020-12-13	Code
38	PyCenterNet (Swin-L, multi-scale)	57.1	No	CenterNet++ for Object Detection	2022-04-18	Code
39	dBOT ViT-L (CLIP)	56.8	No	Exploring Target Representations for Masked Auto...	2022-09-08	Code
40	YOLOv7-D6 (44 fps)	56.6	Yes	YOLOv7: Trainable bag-of-freebies sets new state...	2022-07-06	Code
41	SOLQ (Swin-L, single scale)	56.5	No	SOLQ: Segmenting Objects by Learning Queries	2021-06-04	Code
42	CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale)	56.4	No	Probabilistic two-stage detection	2021-03-12	Code
43	ISTR (ResNet50-FPN-3x, single-scale)	56.4	No	ISTR: End-to-End Instance Segmentation with Tran...	2021-05-03	Code
44	QueryInst (single-scale)	56.1	No	Instances as Queries	2021-05-05	Code
45	dBOT ViT-L	56.1	No	Exploring Target Representations for Masked Auto...	2022-09-08	Code
46	YOLOv7-E6 (56 fps)	56	No	YOLOv7: Trainable bag-of-freebies sets new state...	2022-07-06	Code
47	YOLOv4-P7 with TTA	55.8	No	Scaled-YOLOv4: Scaling Cross Stage Partial Network	2020-11-16	Code
48	DetectoRS (ResNeXt-101-64x4d, multi-scale)	55.7	No	DetectoRS: Detecting Objects with Recursive Feat...	2020-06-03	Code
49	YOLOR-D6 (1280, single-scale, 30 fps)	55.4	No	You Only Learn One Representation: Unified Netwo...	2021-05-10	Code
50	YOLOv4-P6 with TTA	54.9	No	Scaled-YOLOv4: Scaling Cross Stage Partial Network	2020-11-16	Code
51	YOLOv7-W6 (84 fps)	54.9	No	YOLOv7: Trainable bag-of-freebies sets new state...	2022-07-06	Code
52	Cascade Eff-B7 NAS-FPN (1280)	54.8	No	Simple Copy-Paste is a Strong Data Augmentation ...	2020-12-13	Code
53	DetectoRS (ResNeXt-101-32x4d, multi-scale)	54.7	No	DetectoRS: Detecting Objects with Recursive Feat...	2020-06-03	Code
54	GLEE-Lite	54.7	No	General Object Foundation Model for Images and V...	2023-12-14	Code
55	YOLOv4-P6 CSP-P6 (single-scale, 32 fps)	54.3	No	Scaled-YOLOv4: Scaling Cross Stage Partial Network	2020-11-16	Code
56	SpineNet-190 (1280, with Self-training on OpenImages, single-scale)	54.3	No	Rethinking Pre-training and Self-training	2020-06-11	Code
57	UniverseNet-20.08d (Res2Net-101, DCN, multi-scale)	54.1	No	USB: Universal-Scale Object Detection Benchmark	2021-03-25	Code
58	DyHead (ResNeXt-64x4d-101-DCN, multi scale)	54	No	Dynamic Head: Unifying Object Detection Heads wi...	2021-06-15	Code
59	dBOT ViT-B (CLIP)	53.6	No	Exploring Target Representations for Masked Auto...	2022-09-08	Code
60	PAA (ResNext-152-32x8d + DCN, multi-scale)	53.5	No	Probabilistic Anchor Assignment with IoU Predict...	2020-07-16	Code
61	LSNet (Res2Net-101+ DCN, multi-scale)	53.5	No	Location-Sensitive Visual Recognition with Cross...	2021-04-11	Code
62	dBOT ViT-B	53.5	No	Exploring Target Representations for Masked Auto...	2022-09-08	Code
63	ResNeSt-200 (multi-scale)	53.3	No	ResNeSt: Split-Attention Networks	2020-04-19	Code
64	Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale)	53.3	No	CBNet: A Novel Composite Backbone Network Archit...	2019-09-09	Code
65	DetectoRS (ResNeXt-101-32x4d, single-scale)	53.3	No	DetectoRS: Detecting Objects with Recursive Feat...	2020-06-03	Code
66	GFLV2 (Res2Net-101, DCN, multiscale)	53.3	No	Generalized Focal Loss V2: Learning Reliable Loc...	2020-11-25	Code
67	YOLOv7-X (114 fps)	53.1	Yes	YOLOv7: Trainable bag-of-freebies sets new state...	2022-07-06	Code
68	RelationNet++ (ResNeXt-64x4d-101-DCN)	52.7	No	RelationNet++: Bridging Visual Representations f...	2020-10-29	Code
69	EfficientDet-D7 (1536)	52.6	Yes	EfficientDet: Scalable and Efficient Object Dete...	2019-11-20	Code
70	YOLOv4-P5 with TTA	52.5	No	Scaled-YOLOv4: Scaling Cross Stage Partial Network	2020-11-16	Code
71	Deformable DETR (ResNeXt-101+DCN)	52.3	No	Deformable DETR: Deformable Transformers for End...	2020-10-08	Code
72	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	52.3	No	Global Context Networks	2020-12-24	Code
73	PP-YOLOE-x(CSPRepResNet-x, 640x640, single-scale )	52.2	No	PP-YOLOE: An evolved version of YOLO	2022-03-30	Code
74	RetinaNet (SpineNet-190, 1280x1280)	52.1	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
75	RepPoints v2 (ResNeXt-101, DCN, multi-scale)	52.1	No	RepPoints V2: Verification Meets Regression for ...	2020-07-16	Code
76	AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM)	51.9	No	Attention-guided Context Feature Pyramid Network...	2020-05-23	Code
77	OTA (ResNeXt-101+DCN, multiscale)	51.5	No	OTA: Optimal Transport Assignment for Object Det...	2021-03-26	Code
78	YOLOX-x(Modified CSP v5, 640x640, single-scale)	51.5	Yes	YOLOX: Exceeding YOLO Series in 2021	2021-07-18	Code
79	PP-YOLOE-l(CSPRepResNet-l, 640x640, single-scale )	51.4	No	PP-YOLOE: An evolved version of YOLO	2022-03-30	Code
80	YOLOv7 (161 fps)	51.4	Yes	YOLOv7: Trainable bag-of-freebies sets new state...	2022-07-06	Code
81	UniverseNet-20.08d (Res2Net-101, DCN, single-scale)	51.3	No	USB: Universal-Scale Object Detection Benchmark	2021-03-25	Code
82	TSD(SENet154-DCN,multi-scale)	51.2	No	Revisiting the Sibling Head in Object Detector	2020-03-17	Code
83	YOLOX-X (Modified CSP v5)	51.2	No	YOLOX: Exceeding YOLO Series in 2021	2021-07-18	Code
84	iBOT (ViT-B/16)	51.2	No	iBOT: Image BERT Pre-Training with Online Tokeni...	2021-11-15	Code
85	RetinaNet (SpineNet-143, 1280x1280)	50.7	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
86	ATSS (ResNetXt-64x4d-101+DCN,multi-scale)	50.7	No	Bridging the Gap Between Anchor-based and Anchor...	2019-12-05	Code
87	NAS-FPN (AmoebaNet-D, learned aug)	50.7	No	Learning Data Augmentation Strategies for Object...	2019-06-26	Code
88	Boosting R-CNN*	50.7	No	Boosting R-CNN: Reweighting R-CNN Samples by RPN...	2022-06-28	Code
89	GFLV2 (Res2Net-101, DCN)	50.6	No	Generalized Focal Loss V2: Learning Reliable Loc...	2020-11-25	Code
90	aLRP Loss (ResNext-101-64x4d, DCN, multiscale test)	50.2	No	A Ranking-based, Balanced Loss Function Unifying...	2020-09-28	Code
91	FreeAnchor + SEPC (DCN, ResNext-101-64x4d)	50.1	No	Scale-Equalizing Pyramid Convolution for Object ...	2020-05-06	Code
92	D2Det (ResNet-101-DCN, multi-scale test)	50.1	No	-	-	Code
93	Dynamic R-CNN (ResNet-101-DCN, multi-scale)	50.1	No	Dynamic R-CNN: Towards High Quality Object Detec...	2020-04-13	Code
94	TSD(ResNet-101-Deformable, Image Pyramid)	49.4	No	Revisiting the Sibling Head in Object Detector	2020-03-17	Code
95	RepPoints v2 (ResNeXt-101, DCN)	49.4	No	RepPoints V2: Verification Meets Regression for ...	2020-07-16	Code
96	A2MIM (ViT-B)	49.4	No	Architecture-Agnostic Masked Image Modeling -- F...	2022-05-27	Code
97	iBOT (ViT-S/16)	49.4	No	iBOT: Image BERT Pre-Training with Online Tokeni...	2021-11-15	Code
98	CPNDet (Hourglass-104, multi-scale)	49.2	No	Corner Proposal Network for Anchor-free, Two-sta...	2020-07-27	Code
99	GFLV2 (ResNeXt-101, 32x4d, DCN)	49	No	Generalized Focal Loss V2: Learning Reliable Loc...	2020-11-25	Code
100	aLRP Loss (ResNext-101-64x4d, DCN, single scale)	48.9	No	A Ranking-based, Balanced Loss Function Unifying...	2020-09-28	Code
101	PP-YOLOE-m(CSPRepResNet-m, 640x640, single-scale )	48.9	No	PP-YOLOE: An evolved version of YOLO	2022-03-30	Code
102	UniverseNet-20.08 (Res2Net-50, DCN, single-scale)	48.8	No	USB: Universal-Scale Object Detection Benchmark	2021-03-25	Code
103	SOLQ (ResNet101, single scale)	48.7	No	SOLQ: Segmenting Objects by Learning Queries	2021-06-04	Code
104	RetinaNet (SpineNet-96, 1024x1024)	48.6	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
105	TridentNet (ResNet-101-Deformable, Image Pyramid)	48.4	No	Scale-Aware Trident Networks for Object Detection	2019-01-07	Code
106	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	48.4	No	GCNet: Non-local Networks Meet Squeeze-Excitatio...	2019-04-25	Code
107	GFLV2 (ResNet-101-DCN)	48.3	No	Generalized Focal Loss V2: Learning Reliable Loc...	2020-11-25	Code
108	Swin-S (RPE w/ GAB)	48.23	No	Understanding Gaussian Attention Bias of Vision ...	2023-05-08	Code
109	GFL (X-101-32x4d-DCN, single-scale)	48.2	No	Generalized Focal Loss: Learning Qualified and D...	2020-06-08	Code
110	ISTR (ResNet101-FPN-3x, single-scale)	48.1	No	ISTR: End-to-End Instance Segmentation with Tran...	2021-05-03	Code
111	YOLOX-Darknet53(Darknet53, 640x640, single-scale)	48	Yes	YOLOX: Exceeding YOLO Series in 2021	2021-07-18	Code
112	DAT-S (RetinaNet)	47.9	No	Vision Transformer with Deformable Attention	2022-01-03	Code
113	aLRP Loss (ResNext-101-64x4d, single scale)	47.8	No	A Ranking-based, Balanced Loss Function Unifying...	2020-09-28	Code
114	MatrixNet Corners (ResNet-152, multi-scale)	47.8	No	Matrix Nets: A New Deep Architecture for Object ...	2019-08-13	Code
115	SOLQ (ResNet50, single scale)	47.8	No	SOLQ: Segmenting Objects by Learning Queries	2021-06-04	Code
116	DyHead (ResNeXt-64x4d-101)	47.7	No	Dynamic Head: Unifying Object Detection Heads wi...	2021-06-15	Code
117	SAPD (ResNeXt-101, single-scale)	47.4	No	Soft Anchor-Point Object Detection	2019-11-27	Code
118	PANet (ResNeXt-101, multi-scale)	47.4	No	Path Aggregation Network for Instance Segmentation	2018-03-05	Code
119	HTC (HRNetV2p-W48)	47.3	No	Deep High-Resolution Representation Learning for...	2019-08-20	Code
120	HTC (ResNeXt-101-FPN)	47.1	No	Hybrid Task Cascade for Instance Segmentation	2019-01-22	Code
121	CenterNet511 (Hourglass-104, multi-scale)	47	No	CenterNet: Keypoint Triplets for Object Detection	2019-04-17	Code
122	MAL (ResNeXt101, multi-scale)	47	No	Multiple Anchor Learning for Visual Object Detec...	2019-12-04	Code
123	ISTR (ResNet50-FPN-3x)	46.8	No	ISTR: End-to-End Instance Segmentation with Tran...	2021-05-03	Code
124	RetinaNet (SpineNet-49, 896x896)	46.7	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
125	RPDet (ResNet-101-DCN, multi-scale)	46.5	No	RepPoints: Point Set Representation for Object D...	2019-04-25	Code
126	HoughNet (MS)	46.4	No	HoughNet: Integrating near and long-range eviden...	2020-07-05	Code
127	PPDet (ResNeXt-101-FPN, multiscale)	46.3	No	Reducing Label Noise in Anchor-Free Object Detec...	2020-08-03	Code
128	GFLV2 (ResNet-101)	46.2	No	Generalized Focal Loss V2: Learning Reliable Loc...	2020-11-25	Code
129	SNIPER (ResNet-101)	46.1	No	SNIPER: Efficient Multi-Scale Training	2018-05-23	Code
130	Mask R-CNN (HRNetV2p-W48 + cascade)	46.1	No	Deep High-Resolution Representation Learning for...	2019-08-20	Code
131	ResNeXt-64x4d-101 NAS-FCOS @128-256 w/improvements	46.1	No	NAS-FCOS: Fast Neural Architecture Search for Ob...	2019-06-11	Code
132	DCNv2 (ResNet-101, multi-scale)	46	No	Deformable ConvNets v2: More Deformable, Better ...	2018-11-27	Code
133	Gaussian-FCOS	46	No	Localization Uncertainty Estimation for Anchor-F...	2020-06-28	-
134	Cascade R-CNN-FPN (ResNet-101, map-guided)	45.9	No	InstaBoost: Boosting Instance Segmentation via P...	2019-08-21	Code
135	MAL (ResNeXt101, single-scale)	45.9	No	Multiple Anchor Learning for Visual Object Detec...	2019-12-04	Code
136	CenterMask+VoVNetV2-99 (single-scale)	45.8	No	CenterMask : Real-Time Anchor-Free Instance Segm...	2019-11-15	Code
137	D-RFCN + SNIP (DPN-98 with flip, multi-scale)	45.7	No	An Analysis of Scale Invariance in Object Detect...	2017-11-22	-
138	YOLOv4 (CD53)	45.5	Yes	Scaled-YOLOv4: Scaling Cross Stage Partial Network	2020-11-16	Code
139	AC-FPN Cascade R-CNN(ResNet-101, single scale)	45	No	Attention-guided Context Feature Pyramid Network...	2020-05-23	Code
140	FreeAnchor (ResNeXt-101)	44.8	No	FreeAnchor: Learning to Match Anchors for Visual...	2019-09-05	Code
141	FCOS (ResNeXt-64x4d-101-FPN 4 + improvements)	44.7	No	FCOS: Fully Convolutional One-Stage Object Detec...	2019-04-02	Code
142	CenterMask+VoVNet2-57 (single-scale)	44.7	No	CenterMask : Real-Time Anchor-Free Instance Segm...	2019-11-15	Code
143	FSAF (ResNeXt-101, multi-scale)	44.6	No	Feature Selective Anchor-Free Module for Single-...	2019-03-02	Code
144	aLRP Loss (ResNext-101, DCN, 500 scale)	44.6	No	A Ranking-based, Balanced Loss Function Unifying...	2020-09-28	Code
145	CenterMask + X-101-32x8d (single-scale)	44.6	No	CenterMask : Real-Time Anchor-Free Instance Segm...	2019-11-15	Code
146	RetinaNet (SpineNet-49, 640x640)	44.3	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
147	YOLOF-DC5	44.3	No	You Only Look One-level Feature	2021-03-17	Code
148	GFLV2 (ResNet-50)	44.3	No	Generalized Focal Loss V2: Learning Reliable Loc...	2020-11-25	Code
149	InterNet (ResNet-101-FPN, multi-scale)	44.2	No	Feature Intertwiner for Object Detection	2019-03-28	Code
150	M2Det (VGG-16, multi-scale)	44.2	No	M2Det: A Single-Shot Object Detector based on Mu...	2018-11-12	Code
151	Faster R-CNN (LIP-ResNet-101-MD w FPN)	43.9	No	LIP: Local Importance-based Pooling	2019-08-12	Code
152	M2Det (ResNet-101, multi-scale)	43.9	No	M2Det: A Single-Shot Object Detector based on Mu...	2018-11-12	Code
153	YOLOv3 @800 + ASFF* (Darknet-53)	43.9	Yes	Learning Spatial Fusion for Single-Shot Object D...	2019-11-21	Code
154	FoveaBox (ResNeXt-101)	43.9	No	FoveaBox: Beyond Anchor-based Object Detector	2019-04-08	Code
155	ExtremeNet (Hourglass-104, multi-scale)	43.7	No	Bottom-up Object Detection by Grouping Extreme a...	2019-01-23	Code
156	YOLOv4-608	43.5	Yes	YOLOv4: Optimal Speed and Accuracy of Object Det...	2020-04-23	Code
157	SNIPER (ResNet-50)	43.5	No	SNIPER: Efficient Multi-Scale Training	2018-05-23	Code
158	CenterNet (HRNetV2-W48)	43.5	No	Deep High-Resolution Representation Learning for...	2019-08-20	Code
159	D-RFCN + SNIP (ResNet-101, multi-scale)	43.4	No	An Analysis of Scale Invariance in Object Detect...	2017-11-22	-
160	Grid R-CNN (ResNeXt-101-FPN)	43.2	No	Grid R-CNN	2018-11-29	Code
161	FCOS (ResNeXt-101-64x4d-FPN)	43.2	No	FCOS: Fully Convolutional One-Stage Object Detec...	2019-04-02	Code
162	CornerNet-Saccade (Hourglass-104, multi-scale)	43.2	No	CornerNet-Lite: Efficient Keypoint Based Object ...	2019-04-18	Code
163	PP-YOLOE-s(CSPRepResNet-s, 640x640, single-scale )	43.1	No	PP-YOLOE: An evolved version of YOLO	2022-03-30	Code
164	Libra R-CNN (ResNeXt-101-FPN)	43	No	Libra R-CNN: Towards Balanced Learning for Objec...	2019-04-04	Code
165	DyHead (ResNet-50)	43	No	Dynamic Head: Unifying Object Detection Heads wi...	2021-06-15	Code
166	RPDet (ResNet-101-DCN)	42.8	No	RepPoints: Point Set Representation for Object D...	2019-04-25	Code
167	SpineNet-49 (640, RetinaNet, single-scale)	42.8	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
168	Cascade R-CNN (ResNet-101-FPN+, cascade)	42.8	No	Cascade R-CNN: Delving into High Quality Object ...	2017-12-03	Code
169	Cascade R-CNN	42.8	No	Cascade R-CNN: High Quality Object Detection and...	2019-06-24	Code
170	TridentNet (ResNet-101)	42.7	No	Scale-Aware Trident Networks for Object Detection	2019-01-07	Code
171	FCOS (ResNeXt-32x8d-101-FPN)	42.7	No	FCOS: Fully Convolutional One-Stage Object Detec...	2019-04-02	Code
172	RetinaMask (ResNeXt-101-FPN-GN)	42.6	No	RetinaMask: Learning to predict masks improves s...	2019-01-10	Code
173	TAL + TAP	42.5	No	TOOD: Task-aligned One-stage Object Detection	2021-08-17	Code
174	Faster R-CNN (HRNetV2p-W48)	42.4	No	Deep High-Resolution Representation Learning for...	2019-08-20	Code
175	HSD (Rest101, 768x768, single-scale test)	42.3	No	-	-	Code
176	CornerNet511 (Hourglass-104, multi-scale)	42.1	No	CornerNet: Detecting Objects as Paired Keypoints	2018-08-03	Code
177	FoveaBox (ResNeXt-101)	42.1	No	FoveaBox: Beyond Anchor-based Object Detector	2019-04-08	Code
178	FCOS (HRNet-W32-5l)	42	No	FCOS: Fully Convolutional One-Stage Object Detec...	2019-04-02	Code
179	FoveaBox (ResNeXt-101)	41.9	No	FoveaBox: Beyond Anchor-based Object Detector	2019-04-08	Code
180	RefineDet512+ (ResNet-101)	41.8	No	Single-Shot Refinement Neural Network for Object...	2017-11-18	Code
181	GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101)	41.6	No	Gradient Harmonized Single-stage Detector	2018-11-13	Code
182	CenterNet-DLA (DLA-34, multi-scale)	41.6	No	Objects as Points	2019-04-16	Code
183	RetinaNet (SpineNet-49S, 640x640)	41.5	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
184	RPDet (ResNet-101)	41	No	RepPoints: Point Set Representation for Object D...	2019-04-25	Code
185	M2Det (VGG-16, single-scale)	41	No	M2Det: A Single-Shot Object Detector based on Mu...	2018-11-12	Code
186	LeYOLO (Large@768)	41	No	LeYOLO, New Scalable and Efficient CNN Architect...	2024-06-20	Code
187	FSAF (ResNet-101, single-scale)	40.9	No	Feature Selective Anchor-Free Module for Single-...	2019-03-02	Code
188	RetinaNet (ResNeXt-101-FPN)	40.8	No	Focal Loss for Dense Object Detection	2017-08-07	Code
189	Cascade R-CNN (ResNet-50-FPN+, cascade)	40.6	No	Cascade R-CNN: Delving into High Quality Object ...	2017-12-03	Code
190	Faster R-CNN (Cascade RPN)	40.6	Yes	Cascade RPN: Delving into High-Quality Region Pr...	2019-09-15	Code
191	ResNet-50-DW-DPN (Deformable Kernels)	40.6	No	Deformable Kernels: Adapting Effective Receptive...	2019-10-07	Code
192	IoU-Net	40.6	No	Acquisition of Localization Confidence for Accur...	2018-07-30	Code
193	FCOS (HRNetV2p-W48)	40.5	Yes	Deep High-Resolution Representation Learning for...	2019-08-20	Code
194	ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS	40.4	No	Bounding Box Regression with Uncertainty for Acc...	2018-09-23	Code
195	RDSNet (ResNet-101, RetinaNet, mask, MBRM)	40.3	No	RDSNet: A New Deep Architecture for Reciprocal O...	2019-12-11	Code
196	ExtremeNet (Hourglass-104, single-scale)	40.2	No	Bottom-up Object Detection by Grouping Extreme a...	2019-01-23	Code
197	Mask R-CNN (ResNet-101-FPN, CBN)	40.1	No	Cross-Iteration Batch Normalization	2020-02-13	Code
198	Fast R-CNN (Cascade RPN)	40.1	Yes	Cascade RPN: Delving into High-Quality Region Pr...	2019-09-15	Code
199	Mask R-CNN (ResNeXt-101-FPN)	39.8	No	Mask R-CNN	2017-03-20	Code
200	GA-Faster-RCNN	39.8	No	Region Proposal by Guided Anchoring	2019-01-10	Code
201	ResNet-50 NAS-FCOS @256	39.8	No	NAS-FCOS: Fast Neural Architecture Search for Ob...	2019-06-11	Code
202	A2MIM (ResNet-50 2x)	39.8	No	Architecture-Agnostic Masked Image Modeling -- F...	2022-05-27	Code
203	FPN (ResNet101 backbone)	39.5	No	ChainerCV: a Library for Deep Learning in Comput...	2017-08-28	Code
204	RetinaMask (ResNet-50-FPN)	39.4	No	RetinaMask: Learning to predict masks improves s...	2019-01-10	Code
205	LeYOLO (Medium@640)	39.3	No	LeYOLO, New Scalable and Efficient CNN Architect...	2024-06-20	Code
206	AA-ResNet-10 + RetinaNet	39.2	No	Attention Augmented Convolutional Networks	2019-04-22	Code
207	MAL (ResNet50, single-scale)	39.2	No	Multiple Anchor Learning for Visual Object Detec...	2019-12-04	Code
208	RetinaNet (ResNet-101-FPN)	39.1	No	Focal Loss for Dense Object Detection	2017-08-07	Code
209	Cascade R-CNN (ResNet-101-FPN+)	38.8	No	Cascade R-CNN: Delving into High Quality Object ...	2017-12-03	Code
210	M2Det (ResNet-101, single-scale)	38.8	No	M2Det: A Single-Shot Object Detector based on Mu...	2018-11-12	Code
211	SaccadeNet (DLA-34-DCN)	38.5	No	SaccadeNet: A Fast and Accurate Object Detector	2020-03-26	Code
212	Mask R-CNN (ResNet-101-FPN)	38.2	No	Mask R-CNN	2017-03-20	Code
213	LeYOLO (Small@640)	38.2	No	LeYOLO, New Scalable and Efficient CNN Architect...	2024-06-20	Code
214	WSMA-Seg	38.1	No	Segmentation is All You Need	2019-04-30	-
215	Faster R-CNN + FPN + CGD	37.9	No	Compact Global Descriptor for Neural Networks	2019-07-23	Code
216	CornerNet511 (Hourglass-52, single-scale)	37.8	No	CornerNet: Detecting Objects as Paired Keypoints	2018-08-03	Code
217	RefineDet512+ (VGG-16)	37.6	No	Single-Shot Refinement Neural Network for Object...	2017-11-18	Code
218	DeformConv-R-FCN (Aligned-Inception-ResNet)	37.5	No	Deformable Convolutional Networks	2017-03-17	Code
219	Faster R-CNN (ImageNet+300M)	37.4	No	Revisiting Unreasonable Effectiveness of Data in...	2017-07-10	Code
220	Mask R-CNN (Bottleneck-injected ResNet-50, FPN)	36.9	No	torchdistill: A Modular, Configuration-Driven Fr...	2020-11-25	Code
221	Faster R-CNN + TDM	36.8	No	Beyond Skip Connections: Top-Down Modulation for...	2016-12-20	Code
222	Cascade R-CNN (ResNet-50-FPN+)	36.5	No	Cascade R-CNN: Delving into High Quality Object ...	2017-12-03	Code
223	RefineDet512 (ResNet-101)	36.4	No	Single-Shot Refinement Neural Network for Object...	2017-11-18	Code
224	Faster R-CNN + FPN	36.2	Yes	Feature Pyramid Networks for Object Detection	2016-12-09	Code
225	Faster R-CNN (Bottleneck-injected ResNet-50 and FPN)	35.9	No	torchdistill: A Modular, Configuration-Driven Fr...	2020-11-25	Code

#1 Co-DETR · SOTA
66 box mAP · 2022-11-22
DETRs with Collaborative Hybrid Assignments Training · Code
#2 InternImage-H (M3I Pre-training) · SOTA
65.5 box mAP · 2022-11-10
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · Code
#3 M3I Pre-training (InternImage-H)
65.4 box mAP · 2022-11-17
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information · Code
#4 MoCaE
65.1 box mAP · 2023-09-26
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection · Code
#5 Focal-Stable-DINO (Focal-Huge, no TTA)
64.8 box mAP · 2023-04-25
A Strong and Reproducible Object Detector with Only Public Datasets · Code
#6 Co-DETR (Swin-L)
64.8 box mAP · 2022-11-22
DETRs with Collaborative Hybrid Assignments Training · Code
#7 EVA
64.7 box mAP · 2022-11-14
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale · Code
#8 Group DETR v2 · SOTA
64.5 box mAP · 2022-11-07
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
#9 FocalNet-H (DINO) · SOTA
64.4 box mAP · 2022-03-22
Focal Modulation Networks · Code
#10 InternImage-XL
64.3 box mAP · 2022-11-10
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · Code
#11 FD-SwinV2-G
64.2 box mAP · 2022-05-27
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation · Code
#12 Plain-DETR (Swin-L)
63.9 box mAP
No paper · Code
#13 RevCol-H(DINO)
63.8 box mAP · 2022-12-22
Reversible Column Networks · Code
#14 BEiT-3
63.7 box mAP · 2022-08-22
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks · Code
#15 Relation-DETR (Focal-L)
63.5 box mAP · 2024-07-16
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection · Code
#16 DETA (Swin-L)
63.5 box mAP · 2022-12-12
NMS Strikes Back · Code
#17 DINO (Swin-L,multi-scale, TTA) · SOTA
63.3 box mAP · 2022-03-07
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection · Code
#18 SwinV2-G (HTC++) · SOTA
63.1 box mAP · 2021-11-18
Swin Transformer V2: Scaling Up Capacity and Resolution · Code
#19 Grounding DINO
63 box mAP · 2023-03-09
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection · Code
#20 Florence-CoSwin-H
62.4 box mAP · 2021-11-22
Florence: A New Foundation Model for Computer Vision · Code
#21 GLIPv2 (CoSwin-H, multi-scale)
62.4 box mAP · 2022-06-12
GLIPv2: Unifying Localization and Vision-Language Understanding · Code
#22 GLEE-Pro
62.3 box mAP · 2023-12-14
General Object Foundation Model for Images and Videos at Scale · Code
#23 GLIP (Swin-L, multi-scale)
61.5 box mAP · 2021-12-07
Grounded Language-Image Pre-training · Code
#24 Soft Teacher + Swin-L (HTC++, multi-scale) · SOTA
61.3 box mAP · 2021-06-16
End-to-End Semi-Supervised Object Detection with Soft Teacher · Code
#25 ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale)
60.9 box mAP · 2022-05-17
Vision Transformer Adapter for Dense Predictions · Code
#26 DyHead (Swin-L, multi scale, self-training) · SOTA
60.6 box mAP · 2021-06-15
Dynamic Head: Unifying Object Detection Heads with Attentions · Code
#27 GLEE-Plus
60.6 box mAP · 2023-12-14
General Object Foundation Model for Images and Videos at Scale · Code
#28 ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale)
60.4 box mAP · 2022-05-17
Vision Transformer Adapter for Dense Predictions · Code
#29 GRiT (ViT-H, single-scale testing)
60.4 box mAP · 2022-12-01
GRiT: A Generative Region-to-text Transformer for Object Understanding · Code
#30 CBNetV2 (Dual-Swin-L HTC, multi-scale)
60.1 box mAP · 2021-07-01
CBNet: A Composite Backbone Network Architecture for Object Detection · Code
#31 PIIP-H6B (DINO)
60 box mAP · 2024-06-06
Parameter-Inverted Image Pyramid Networks · Code
#32 CBNetV2 (Dual-Swin-L HTC, single-scale)
59.4 box mAP · 2021-07-01
CBNet: A Composite Backbone Network Architecture for Object Detection · Code
#33 Focal-L (DyHead, multi-scale)
58.9 box mAP · 2021-07-01
Focal Self-attention for Local-Global Interactions in Vision Transformers · Code
#34 DyHead (Swin-L, multi scale)
58.7 box mAP · 2021-06-15
Dynamic Head: Unifying Object Detection Heads with Attentions · Code
#35 Swin-L (HTC++, multi scale) · SOTA
58.7 box mAP · 2021-03-25
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows · Code
#36 Swin-L (HTC++, single scale)
57.7 box mAP · 2021-03-25
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows · Code
#37 Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) · SOTA
57.3 box mAP · 2020-12-13
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation · Code
#38 PyCenterNet (Swin-L, multi-scale)
57.1 box mAP · 2022-04-18
CenterNet++ for Object Detection · Code
#39 dBOT ViT-L (CLIP)
56.8 box mAP · 2022-09-08
Exploring Target Representations for Masked Autoencoders · Code
#40 YOLOv7-D6 (44 fps)
56.6 box mAP · Augmentations · 2022-07-06
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors · Code
#41 SOLQ (Swin-L, single scale)
56.5 box mAP · 2021-06-04
SOLQ: Segmenting Objects by Learning Queries · Code
#42 CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale)
56.4 box mAP · 2021-03-12
Probabilistic two-stage detection · Code
#43 ISTR (ResNet50-FPN-3x, single-scale)
56.4 box mAP · 2021-05-03
ISTR: End-to-End Instance Segmentation with Transformers · Code
#44 QueryInst (single-scale)
56.1 box mAP · 2021-05-05
Instances as Queries · Code
#45 dBOT ViT-L
56.1 box mAP · 2022-09-08
Exploring Target Representations for Masked Autoencoders · Code
#46 YOLOv7-E6 (56 fps)
56 box mAP · 2022-07-06
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors · Code
#47 YOLOv4-P7 with TTA · SOTA
55.8 box mAP · 2020-11-16
Scaled-YOLOv4: Scaling Cross Stage Partial Network · Code
#48 DetectoRS (ResNeXt-101-64x4d, multi-scale) · SOTA
55.7 box mAP · 2020-06-03
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution · Code
#49 YOLOR-D6 (1280, single-scale, 30 fps)
55.4 box mAP · 2021-05-10
You Only Learn One Representation: Unified Network for Multiple Tasks · Code
#50 YOLOv4-P6 with TTA
54.9 box mAP · 2020-11-16
Scaled-YOLOv4: Scaling Cross Stage Partial Network · Code
#51 YOLOv7-W6 (84 fps)
54.9 box mAP · 2022-07-06
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors · Code
#52 Cascade Eff-B7 NAS-FPN (1280)
54.8 box mAP · 2020-12-13
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation · Code
#53 DetectoRS (ResNeXt-101-32x4d, multi-scale)
54.7 box mAP · 2020-06-03
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution · Code
#54 GLEE-Lite
54.7 box mAP · 2023-12-14
General Object Foundation Model for Images and Videos at Scale · Code
#55 YOLOv4-P6 CSP-P6 (single-scale, 32 fps)
54.3 box mAP · 2020-11-16
Scaled-YOLOv4: Scaling Cross Stage Partial Network · Code
#56 SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
54.3 box mAP · 2020-06-11
Rethinking Pre-training and Self-training · Code
#57 UniverseNet-20.08d (Res2Net-101, DCN, multi-scale)
54.1 box mAP · 2021-03-25
USB: Universal-Scale Object Detection Benchmark · Code
#58 DyHead (ResNeXt-64x4d-101-DCN, multi scale)
54 box mAP · 2021-06-15
Dynamic Head: Unifying Object Detection Heads with Attentions · Code
#59 dBOT ViT-B (CLIP)
53.6 box mAP · 2022-09-08
Exploring Target Representations for Masked Autoencoders · Code
#60 PAA (ResNext-152-32x8d + DCN, multi-scale)
53.5 box mAP · 2020-07-16
Probabilistic Anchor Assignment with IoU Prediction for Object Detection · Code
#61 LSNet (Res2Net-101+ DCN, multi-scale)
53.5 box mAP · 2021-04-11
Location-Sensitive Visual Recognition with Cross-IOU Loss · Code
#62 dBOT ViT-B
53.5 box mAP · 2022-09-08
Exploring Target Representations for Masked Autoencoders · Code
#63 ResNeSt-200 (multi-scale)
53.3 box mAP · 2020-04-19
ResNeSt: Split-Attention Networks · Code
#64 Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) · SOTA
53.3 box mAP · 2019-09-09
CBNet: A Novel Composite Backbone Network Architecture for Object Detection · Code
#65 DetectoRS (ResNeXt-101-32x4d, single-scale)
53.3 box mAP · 2020-06-03
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution · Code
#66 GFLV2 (Res2Net-101, DCN, multiscale)
53.3 box mAP · 2020-11-25
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection · Code
#67 YOLOv7-X (114 fps)
53.1 box mAP · Augmentations · 2022-07-06
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors · Code
#68 RelationNet++ (ResNeXt-64x4d-101-DCN)
52.7 box mAP · 2020-10-29
RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder · Code
#69 EfficientDet-D7 (1536)
52.6 box mAP · Augmentations · 2019-11-20
EfficientDet: Scalable and Efficient Object Detection · Code
#70 YOLOv4-P5 with TTA
52.5 box mAP · 2020-11-16
Scaled-YOLOv4: Scaling Cross Stage Partial Network · Code
#71Deformable DETR (ResNeXt-101+DCN)
52.3
box mAP· 2020-10-08
Deformable DETR: Deformable Transformers for End-to-End Object Detection Code
#72GCNet (ResNeXt-101 + DCN + cascade + GC r4)
52.3
box mAP· 2020-12-24
Global Context Networks Code
#73PP-YOLOE-x (CSPRepResNet-x, 640x640, single-scale)
52.2
box mAP· 2022-03-30
PP-YOLOE: An evolved version of YOLO Code
#74RetinaNet (SpineNet-190, 1280x1280)
52.1
box mAP· 2019-12-10
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Code
#75RepPoints v2 (ResNeXt-101, DCN, multi-scale)
52.1
box mAP· 2020-07-16
RepPoints V2: Verification Meets Regression for Object Detection Code
#76AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM)
51.9
box mAP· 2020-05-23
Attention-guided Context Feature Pyramid Network for Object Detection Code
#77OTA (ResNeXt-101+DCN, multiscale)
51.5
box mAP· 2021-03-26
OTA: Optimal Transport Assignment for Object Detection Code
#78YOLOX-x(Modified CSP v5, 640x640, single-scale)
51.5
box mAP· Augmentations· 2021-07-18
YOLOX: Exceeding YOLO Series in 2021 Code
#79PP-YOLOE-l (CSPRepResNet-l, 640x640, single-scale)
51.4
box mAP· 2022-03-30
PP-YOLOE: An evolved version of YOLO Code
#80YOLOv7 (161 fps)
51.4
box mAP· Augmentations· 2022-07-06
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors Code
#81UniverseNet-20.08d (Res2Net-101, DCN, single-scale)
51.3
box mAP· 2021-03-25
USB: Universal-Scale Object Detection Benchmark Code
#82TSD (SENet154-DCN, multi-scale)
51.2
box mAP· 2020-03-17
Revisiting the Sibling Head in Object Detector Code
#83YOLOX-X (Modified CSP v5)
51.2
box mAP· 2021-07-18
YOLOX: Exceeding YOLO Series in 2021 Code
#84iBOT (ViT-B/16)
51.2
box mAP· 2021-11-15
iBOT: Image BERT Pre-Training with Online Tokenizer Code
#85RetinaNet (SpineNet-143, 1280x1280)
50.7
box mAP· 2019-12-10
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Code
#86ATSS (ResNeXt-64x4d-101+DCN, multi-scale)
50.7
box mAP· 2019-12-05
Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection Code
#87NAS-FPN (AmoebaNet-D, learned aug) SOTA
50.7
box mAP· 2019-06-26
Learning Data Augmentation Strategies for Object Detection Code
#88Boosting R-CNN*
50.7
box mAP· 2022-06-28
Boosting R-CNN: Reweighting R-CNN Samples by RPN's Error for Underwater Object Detection Code
#89GFLV2 (Res2Net-101, DCN)
50.6
box mAP· 2020-11-25
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection Code
#90aLRP Loss (ResNext-101-64x4d, DCN, multiscale test)
50.2
box mAP· 2020-09-28
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection Code
#91FreeAnchor + SEPC (DCN, ResNext-101-64x4d)
50.1
box mAP· 2020-05-06
Scale-Equalizing Pyramid Convolution for Object Detection Code
#92D2Det (ResNet-101-DCN, multi-scale test)
50.1
box mAP
No paper Code
#93Dynamic R-CNN (ResNet-101-DCN, multi-scale)
50.1
box mAP· 2020-04-13
Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training Code
#94TSD (ResNet-101-Deformable, Image Pyramid)
49.4
box mAP· 2020-03-17
Revisiting the Sibling Head in Object Detector Code
#95RepPoints v2 (ResNeXt-101, DCN)
49.4
box mAP· 2020-07-16
RepPoints V2: Verification Meets Regression for Object Detection Code
#96A2MIM (ViT-B)
49.4
box mAP· 2022-05-27
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN Code
#97iBOT (ViT-S/16)
49.4
box mAP· 2021-11-15
iBOT: Image BERT Pre-Training with Online Tokenizer Code
#98CPNDet (Hourglass-104, multi-scale)
49.2
box mAP· 2020-07-27
Corner Proposal Network for Anchor-free, Two-stage Object Detection Code
#99GFLV2 (ResNeXt-101, 32x4d, DCN)
49
box mAP· 2020-11-25
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection Code
#100aLRP Loss (ResNext-101-64x4d, DCN, single scale)
48.9
box mAP· 2020-09-28
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection Code
#101PP-YOLOE-m (CSPRepResNet-m, 640x640, single-scale)
48.9
box mAP· 2022-03-30
PP-YOLOE: An evolved version of YOLO Code
#102UniverseNet-20.08 (Res2Net-50, DCN, single-scale)
48.8
box mAP· 2021-03-25
USB: Universal-Scale Object Detection Benchmark Code
#103SOLQ (ResNet101, single scale)
48.7
box mAP· 2021-06-04
SOLQ: Segmenting Objects by Learning Queries Code
#104RetinaNet (SpineNet-96, 1024x1024)
48.6
box mAP· 2019-12-10
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Code
#105TridentNet (ResNet-101-Deformable, Image Pyramid) SOTA
48.4
box mAP· 2019-01-07
Scale-Aware Trident Networks for Object Detection Code
#106GCNet (ResNeXt-101 + DCN + cascade + GC r4)
48.4
box mAP· 2019-04-25
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond Code
#107GFLV2 (ResNet-101-DCN)
48.3
box mAP· 2020-11-25
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection Code
#108Swin-S (RPE w/ GAB)
48.23
box mAP· 2023-05-08
Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields Code
#109GFL (X-101-32x4d-DCN, single-scale)
48.2
box mAP· 2020-06-08
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection Code
#110ISTR (ResNet101-FPN-3x, single-scale)
48.1
box mAP· 2021-05-03
ISTR: End-to-End Instance Segmentation with Transformers Code
#111YOLOX-Darknet53(Darknet53, 640x640, single-scale)
48
box mAP· Augmentations· 2021-07-18
YOLOX: Exceeding YOLO Series in 2021 Code
#112DAT-S (RetinaNet)
47.9
box mAP· 2022-01-03
Vision Transformer with Deformable Attention Code
#113aLRP Loss (ResNext-101-64x4d, single scale)
47.8
box mAP· 2020-09-28
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection Code
#114MatrixNet Corners (ResNet-152, multi-scale)
47.8
box mAP· 2019-08-13
Matrix Nets: A New Deep Architecture for Object Detection Code
#115SOLQ (ResNet50, single scale)
47.8
box mAP· 2021-06-04
SOLQ: Segmenting Objects by Learning Queries Code
#116DyHead (ResNeXt-64x4d-101)
47.7
box mAP· 2021-06-15
Dynamic Head: Unifying Object Detection Heads with Attentions Code
#117SAPD (ResNeXt-101, single-scale)
47.4
box mAP· 2019-11-27
Soft Anchor-Point Object Detection Code
#118PANet (ResNeXt-101, multi-scale) SOTA
47.4
box mAP· 2018-03-05
Path Aggregation Network for Instance Segmentation Code
#119HTC (HRNetV2p-W48)
47.3
box mAP· 2019-08-20
Deep High-Resolution Representation Learning for Visual Recognition Code
#120HTC (ResNeXt-101-FPN)
47.1
box mAP· 2019-01-22
Hybrid Task Cascade for Instance Segmentation Code
#121CenterNet511 (Hourglass-104, multi-scale)
47
box mAP· 2019-04-17
CenterNet: Keypoint Triplets for Object Detection Code
#122MAL (ResNeXt101, multi-scale)
47
box mAP· 2019-12-04
Multiple Anchor Learning for Visual Object Detection Code
#123ISTR (ResNet50-FPN-3x)
46.8
box mAP· 2021-05-03
ISTR: End-to-End Instance Segmentation with Transformers Code
#124RetinaNet (SpineNet-49, 896x896)
46.7
box mAP· 2019-12-10
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Code
#125RPDet (ResNet-101-DCN, multi-scale)
46.5
box mAP· 2019-04-25
RepPoints: Point Set Representation for Object Detection Code
#126HoughNet (MS)
46.4
box mAP· 2020-07-05
HoughNet: Integrating near and long-range evidence for bottom-up object detection Code
#127PPDet (ResNeXt-101-FPN, multiscale)
46.3
box mAP· 2020-08-03
Reducing Label Noise in Anchor-Free Object Detection Code
#128GFLV2 (ResNet-101)
46.2
box mAP· 2020-11-25
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection Code
#129SNIPER (ResNet-101)
46.1
box mAP· 2018-05-23
SNIPER: Efficient Multi-Scale Training Code
#130Mask R-CNN (HRNetV2p-W48 + cascade)
46.1
box mAP· 2019-08-20
Deep High-Resolution Representation Learning for Visual Recognition Code
#131ResNeXt-64x4d-101 NAS-FCOS @128-256 w/improvements
46.1
box mAP· 2019-06-11
NAS-FCOS: Fast Neural Architecture Search for Object Detection Code
#132DCNv2 (ResNet-101, multi-scale)
46
box mAP· 2018-11-27
Deformable ConvNets v2: More Deformable, Better Results Code
#133Gaussian-FCOS
46
box mAP· 2020-06-28
Localization Uncertainty Estimation for Anchor-Free Object Detection
#134Cascade R-CNN-FPN (ResNet-101, map-guided)
45.9
box mAP· 2019-08-21
InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting Code
#135MAL (ResNeXt101, single-scale)
45.9
box mAP· 2019-12-04
Multiple Anchor Learning for Visual Object Detection Code
#136CenterMask+VoVNetV2-99 (single-scale)
45.8
box mAP· 2019-11-15
CenterMask : Real-Time Anchor-Free Instance Segmentation Code
#137D-RFCN + SNIP (DPN-98 with flip, multi-scale) SOTA
45.7
box mAP· 2017-11-22
An Analysis of Scale Invariance in Object Detection - SNIP
#138YOLOv4 (CD53)
45.5
box mAP· Augmentations· 2020-11-16
Scaled-YOLOv4: Scaling Cross Stage Partial Network Code
#139AC-FPN Cascade R-CNN (ResNet-101, single scale)
45
box mAP· 2020-05-23
Attention-guided Context Feature Pyramid Network for Object Detection Code
#140FreeAnchor (ResNeXt-101)
44.8
box mAP· 2019-09-05
FreeAnchor: Learning to Match Anchors for Visual Object Detection Code
#141FCOS (ResNeXt-64x4d-101-FPN 4 + improvements)
44.7
box mAP· 2019-04-02
FCOS: Fully Convolutional One-Stage Object Detection Code
#142CenterMask+VoVNet2-57 (single-scale)
44.7
box mAP· 2019-11-15
CenterMask : Real-Time Anchor-Free Instance Segmentation Code
#143FSAF (ResNeXt-101, multi-scale)
44.6
box mAP· 2019-03-02
Feature Selective Anchor-Free Module for Single-Shot Object Detection Code
#144aLRP Loss (ResNext-101, DCN, 500 scale)
44.6
box mAP· 2020-09-28
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection Code
#145CenterMask + X-101-32x8d (single-scale)
44.6
box mAP· 2019-11-15
CenterMask : Real-Time Anchor-Free Instance Segmentation Code
#146RetinaNet (SpineNet-49, 640x640)
44.3
box mAP· 2019-12-10
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Code
#147YOLOF-DC5
44.3
box mAP· 2021-03-17
You Only Look One-level Feature Code
#148GFLV2 (ResNet-50)
44.3
box mAP· 2020-11-25
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection Code
#149InterNet (ResNet-101-FPN, multi-scale)
44.2
box mAP· 2019-03-28
Feature Intertwiner for Object Detection Code
#150M2Det (VGG-16, multi-scale)
44.2
box mAP· 2018-11-12
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network Code
#151Faster R-CNN (LIP-ResNet-101-MD w FPN)
43.9
box mAP· 2019-08-12
LIP: Local Importance-based Pooling Code
#152M2Det (ResNet-101, multi-scale)
43.9
box mAP· 2018-11-12
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network Code
#153YOLOv3 @800 + ASFF* (Darknet-53)
43.9
box mAP· Augmentations· 2019-11-21
Learning Spatial Fusion for Single-Shot Object Detection Code
#154FoveaBox (ResNeXt-101)
43.9
box mAP· 2019-04-08
FoveaBox: Beyond Anchor-based Object Detector Code
#155ExtremeNet (Hourglass-104, multi-scale)
43.7
box mAP· 2019-01-23
Bottom-up Object Detection by Grouping Extreme and Center Points Code
#156YOLOv4-608
43.5
box mAP· Augmentations· 2020-04-23
YOLOv4: Optimal Speed and Accuracy of Object Detection Code
#157SNIPER (ResNet-50)
43.5
box mAP· 2018-05-23
SNIPER: Efficient Multi-Scale Training Code
#158CenterNet (HRNetV2-W48)
43.5
box mAP· 2019-08-20
Deep High-Resolution Representation Learning for Visual Recognition Code
#159D-RFCN + SNIP (ResNet-101, multi-scale)
43.4
box mAP· 2017-11-22
An Analysis of Scale Invariance in Object Detection - SNIP
#160Grid R-CNN (ResNeXt-101-FPN)
43.2
box mAP· 2018-11-29
Grid R-CNN Code
#161FCOS (ResNeXt-101-64x4d-FPN)
43.2
box mAP· 2019-04-02
FCOS: Fully Convolutional One-Stage Object Detection Code
#162CornerNet-Saccade (Hourglass-104, multi-scale)
43.2
box mAP· 2019-04-18
CornerNet-Lite: Efficient Keypoint Based Object Detection Code
#163PP-YOLOE-s (CSPRepResNet-s, 640x640, single-scale)
43.1
box mAP· 2022-03-30
PP-YOLOE: An evolved version of YOLO Code
#164Libra R-CNN (ResNeXt-101-FPN)
43
box mAP· 2019-04-04
Libra R-CNN: Towards Balanced Learning for Object Detection Code
#165DyHead (ResNet-50)
43
box mAP· 2021-06-15
Dynamic Head: Unifying Object Detection Heads with Attentions Code
#166RPDet (ResNet-101-DCN)
42.8
box mAP· 2019-04-25
RepPoints: Point Set Representation for Object Detection Code
#167SpineNet-49 (640, RetinaNet, single-scale)
42.8
box mAP· 2019-12-10
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Code
#168Cascade R-CNN (ResNet-101-FPN+, cascade)
42.8
box mAP· 2017-12-03
Cascade R-CNN: Delving into High Quality Object Detection Code
#169Cascade R-CNN
42.8
box mAP· 2019-06-24
Cascade R-CNN: High Quality Object Detection and Instance Segmentation Code
#170TridentNet (ResNet-101)
42.7
box mAP· 2019-01-07
Scale-Aware Trident Networks for Object Detection Code
#171FCOS (ResNeXt-32x8d-101-FPN)
42.7
box mAP· 2019-04-02
FCOS: Fully Convolutional One-Stage Object Detection Code
#172RetinaMask (ResNeXt-101-FPN-GN)
42.6
box mAP· 2019-01-10
RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free Code
#173TAL + TAP
42.5
box mAP· 2021-08-17
TOOD: Task-aligned One-stage Object Detection Code
#174Faster R-CNN (HRNetV2p-W48)
42.4
box mAP· 2019-08-20
Deep High-Resolution Representation Learning for Visual Recognition Code
#175HSD (Rest101, 768x768, single-scale test)
42.3
box mAP
No paper Code
#176CornerNet511 (Hourglass-104, multi-scale)
42.1
box mAP· 2018-08-03
CornerNet: Detecting Objects as Paired Keypoints Code
#177FoveaBox (ResNeXt-101)
42.1
box mAP· 2019-04-08
FoveaBox: Beyond Anchor-based Object Detector Code
#178FCOS (HRNet-W32-5l)
42
box mAP· 2019-04-02
FCOS: Fully Convolutional One-Stage Object Detection Code
#179FoveaBox (ResNeXt-101)
41.9
box mAP· 2019-04-08
FoveaBox: Beyond Anchor-based Object Detector Code
#180RefineDet512+ (ResNet-101) SOTA
41.8
box mAP· 2017-11-18
Single-Shot Refinement Neural Network for Object Detection Code
#181GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101)
41.6
box mAP· 2018-11-13
Gradient Harmonized Single-stage Detector Code
#182CenterNet-DLA (DLA-34, multi-scale)
41.6
box mAP· 2019-04-16
Objects as Points Code
#183RetinaNet (SpineNet-49S, 640x640)
41.5
box mAP· 2019-12-10
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization Code
#184RPDet (ResNet-101)
41
box mAP· 2019-04-25
RepPoints: Point Set Representation for Object Detection Code
#185M2Det (VGG-16, single-scale)
41
box mAP· 2018-11-12
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network Code
#186LeYOLO (Large@768)
41
box mAP· 2024-06-20
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection Code
#187FSAF (ResNet-101, single-scale)
40.9
box mAP· 2019-03-02
Feature Selective Anchor-Free Module for Single-Shot Object Detection Code
#188RetinaNet (ResNeXt-101-FPN) SOTA
40.8
box mAP· 2017-08-07
Focal Loss for Dense Object Detection Code
#189Cascade R-CNN (ResNet-50-FPN+, cascade)
40.6
box mAP· 2017-12-03
Cascade R-CNN: Delving into High Quality Object Detection Code
#190Faster R-CNN (Cascade RPN)
40.6
box mAP· Augmentations· 2019-09-15
Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution Code
#191ResNet-50-DW-DPN (Deformable Kernels)
40.6
box mAP· 2019-10-07
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation Code
#192IoU-Net
40.6
box mAP· 2018-07-30
Acquisition of Localization Confidence for Accurate Object Detection Code
#193FCOS (HRNetV2p-W48)
40.5
box mAP· Augmentations· 2019-08-20
Deep High-Resolution Representation Learning for Visual Recognition Code
#194ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS
40.4
box mAP· 2018-09-23
Bounding Box Regression with Uncertainty for Accurate Object Detection Code
#195RDSNet (ResNet-101, RetinaNet, mask, MBRM)
40.3
box mAP· 2019-12-11
RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation Code
#196ExtremeNet (Hourglass-104, single-scale)
40.2
box mAP· 2019-01-23
Bottom-up Object Detection by Grouping Extreme and Center Points Code
#197Mask R-CNN (ResNet-101-FPN, CBN)
40.1
box mAP· 2020-02-13
Cross-Iteration Batch Normalization Code
#198Fast R-CNN (Cascade RPN)
40.1
box mAP· Augmentations· 2019-09-15
Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution Code
#199Mask R-CNN (ResNeXt-101-FPN) SOTA
39.8
box mAP· 2017-03-20
Mask R-CNN Code
#200GA-Faster-RCNN
39.8
box mAP· 2019-01-10
Region Proposal by Guided Anchoring Code
#201ResNet-50 NAS-FCOS @256
39.8
box mAP· 2019-06-11
NAS-FCOS: Fast Neural Architecture Search for Object Detection Code
#202A2MIM (ResNet-50 2x)
39.8
box mAP· 2022-05-27
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN Code
#203FPN (ResNet101 backbone)
39.5
box mAP· 2017-08-28
ChainerCV: a Library for Deep Learning in Computer Vision Code
#204RetinaMask (ResNet-50-FPN)
39.4
box mAP· 2019-01-10
RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free Code
#205LeYOLO (Medium@640)
39.3
box mAP· 2024-06-20
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection Code
#206AA-ResNet-10 + RetinaNet
39.2
box mAP· 2019-04-22
Attention Augmented Convolutional Networks Code
#207MAL (ResNet50, single-scale)
39.2
box mAP· 2019-12-04
Multiple Anchor Learning for Visual Object Detection Code
#208RetinaNet (ResNet-101-FPN)
39.1
box mAP· 2017-08-07
Focal Loss for Dense Object Detection Code
#209Cascade R-CNN (ResNet-101-FPN+)
38.8
box mAP· 2017-12-03
Cascade R-CNN: Delving into High Quality Object Detection Code
#210M2Det (ResNet-101, single-scale)
38.8
box mAP· 2018-11-12
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network Code
#211SaccadeNet (DLA-34-DCN)
38.5
box mAP· 2020-03-26
SaccadeNet: A Fast and Accurate Object Detector Code
#212Mask R-CNN (ResNet-101-FPN)
38.2
box mAP· 2017-03-20
Mask R-CNN Code
#213LeYOLO (Small@640)
38.2
box mAP· 2024-06-20
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection Code
#214WSMA-Seg
38.1
box mAP· 2019-04-30
Segmentation is All You Need
#215Faster R-CNN + FPN + CGD
37.9
box mAP· 2019-07-23
Compact Global Descriptor for Neural Networks Code
#216CornerNet511 (Hourglass-52, single-scale)
37.8
box mAP· 2018-08-03
CornerNet: Detecting Objects as Paired Keypoints Code
#217RefineDet512+ (VGG-16)
37.6
box mAP· 2017-11-18
Single-Shot Refinement Neural Network for Object Detection Code
#218DeformConv-R-FCN (Aligned-Inception-ResNet) SOTA
37.5
box mAP· 2017-03-17
Deformable Convolutional Networks Code
#219Faster R-CNN (ImageNet+300M)
37.4
box mAP· 2017-07-10
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era Code
#220Mask R-CNN (Bottleneck-injected ResNet-50, FPN)
36.9
box mAP· 2020-11-25
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation Code
#221Faster R-CNN + TDM SOTA
36.8
box mAP· 2016-12-20
Beyond Skip Connections: Top-Down Modulation for Object Detection Code
#222Cascade R-CNN (ResNet-50-FPN+)
36.5
box mAP· 2017-12-03
Cascade R-CNN: Delving into High Quality Object Detection Code
#223RefineDet512 (ResNet-101)
36.4
box mAP· 2017-11-18
Single-Shot Refinement Neural Network for Object Detection Code
#224Faster R-CNN + FPN SOTA
36.2
box mAP· Augmentations· 2016-12-09
Feature Pyramid Networks for Object Detection Code
#225Faster R-CNN (Bottleneck-injected ResNet-50 and FPN)
35.9
box mAP· 2020-11-25
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation Code