Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


16k on COCO minival

Metric: box AP (higher is better)
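As a rough illustration of what the metric measures, the sketch below assumes this leaderboard follows the standard COCO convention, in which box AP is averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. It only shows the IoU overlap test applied to a single predicted box and a single ground-truth box; the full AP computation (ranking detections by confidence, integrating precision over recall, averaging over categories) is not reproduced here. The names `box_iou`, `IOU_THRESHOLDS`, and `thresholds_matched` are hypothetical helpers, not part of any evaluation toolkit.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(pred, gt):
    """Thresholds at which `pred` would count as a match for `gt`."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

A prediction that overlaps its target loosely counts as a match only at the lower thresholds, which is why a higher box AP reflects tighter localization as well as correct detection.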


Results

| # | Model | box AP | Augmentations | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PE_spatial (DETA) | 66 | Yes | Perception Encoder: The best visual embeddings a... | 2025-04-17 | Code |
| 2 | Co-DETR | 65.9 | Yes | DETRs with Collaborative Hybrid Assignments Trai... | 2022-11-22 | Code |
| 3 | M3I Pre-training (InternImage-H) | 65 | Yes | Towards All-in-one Pre-training via Maximizing M... | 2022-11-17 | Code |
| 4 | InternImage-H | 65 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 5 | Co-DETR (Swin-L) | 64.7 | Yes | DETRs with Collaborative Hybrid Assignments Trai... | 2022-11-22 | Code |
| 6 | Focal-Stable-DINO (Focal-Huge, no TTA) | 64.6 | Yes | A Strong and Reproducible Object Detector with O... | 2023-04-25 | Code |
| 7 | EVA | 64.5 | Yes | EVA: Exploring the Limits of Masked Visual Repre... | 2022-11-14 | Code |
| 8 | ViT-CoMer | 64.3 | No | - | - | Code |
| 9 | FocalNet-H (DINO) | 64.2 | Yes | Focal Modulation Networks | 2022-03-22 | Code |
| 10 | InternImage-XL | 64.2 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 11 | CP-DETR-L Swin-L(Fine tuning separately in COCO) | 64.1 | Yes | CP-DETR: Concept Prompt Guide DETR Toward Strong... | 2024-12-13 | - |
| 12 | RevCol-H(DINO) | 63.8 | Yes | Reversible Column Networks | 2022-12-22 | Code |
| 13 | DINO (Swin-L) | 63.2 | No | DINO: DETR with Improved DeNoising Anchor Boxes ... | 2022-03-07 | Code |
| 14 | Grounding DINO | 63 | Yes | Grounding DINO: Marrying DINO with Grounded Pre-... | 2023-03-09 | Code |
| 15 | SwinV2-G (HTC++) | 62.5 | Yes | Swin Transformer V2: Scaling Up Capacity and Res... | 2021-11-18 | Code |
| 16 | Florence-CoSwin-H | 62 | Yes | Florence: A New Foundation Model for Computer Vi... | 2021-11-22 | Code |
| 17 | GLEE-Pro | 62 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 18 | ViTDet, ViT-H Cascade (multiscale) | 61.3 | No | Exploring Plain Vision Transformer Backbones for... | 2022-03-30 | Code |
| 19 | GLIP (Swin-L, multi-scale) | 60.8 | Yes | Grounded Language-Image Pre-training | 2021-12-07 | Code |
| 20 | Soft Teacher + Swin-L (HTC++, multi-scale) | 60.7 | Yes | End-to-End Semi-Supervised Object Detection with... | 2021-06-16 | Code |
| 21 | UNINEXT-H | 60.6 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 22 | ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) | 60.5 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 23 | ViTDet, ViT-H Cascade | 60.4 | No | Exploring Plain Vision Transformer Backbones for... | 2022-03-30 | Code |
| 24 | GLEE-Plus | 60.4 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 25 | DyHead (Swin-L, multi scale, self-training) | 60.3 | Yes | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 26 | ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) | 60.2 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 27 | Soft Teacher+Swin-L(HTC++, single scale) | 60.1 | Yes | End-to-End Semi-Supervised Object Detection with... | 2021-06-16 | Code |
| 28 | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 59.6 | No | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code |
| 29 | Frozen Backbone, SwinV2-G-ext22K (HTC) | 59.3 | No | Could Giant Pretrained Image Models Extract Univ... | 2022-11-03 | - |
| 30 | HorNet-L | 59.2 | No | HorNet: Efficient High-Order Spatial Interaction... | 2022-07-28 | Code |
| 31 | MOAT-3 (IN-22K pretraining, single-scale) | 59.2 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 32 | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 59.1 | No | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code |
| 33 | Focal-L (DyHead, multi-scale) | 58.7 | No | Focal Self-attention for Local-Global Interactio... | 2021-07-01 | Code |
| 34 | MViTv2-L (Cascade Mask R-CNN, multi-scale, IN21k pre-train) | 58.7 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 35 | MOAT-2 (IN-22K pretraining, single-scale) | 58.5 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 36 | DyHead (Swin-L, multi scale) | 58.4 | No | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 37 | Swin-L (HTC++, multi scale) | 58 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code |
| 38 | MOAT-1 (IN-1K pretraining, single-scale) | 57.7 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 39 | UM-MAE(HTC++, Swin-L, IN1K) | 57.4 | No | Uniform Masking: Enabling MAE Pre-training for P... | 2022-05-20 | Code |
| 40 | YOLOv6-L6(46 fps, 1280, V100) | 57.2 | No | YOLOv6 v3.0: A Full-Scale Reloading | 2023-01-13 | Code |
| 41 | Swin-L (HTC++, single scale) | 57.1 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code |
| 42 | TransNeXt-Base (IN-1K pretrain, DINO 1x) | 57.1 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 43 | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | 57 | Yes | Simple Copy-Paste is a Strong Data Augmentation ... | 2020-12-13 | Code |
| 44 | TransNeXt-Small (IN-1K pretrain, DINO 1x) | 56.6 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 45 | QueryInst (single scale) | 56.1 | No | Instances as Queries | 2021-05-05 | Code |
| 46 | MViTv2-H (Cascade Mask R-CNN, single-scale, IN21k pre-train) | 56.1 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 47 | MOAT-0 (IN-1K pretraining, single-scale) | 55.9 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 48 | TransNeXt-Tiny (IN-1K pretrain, DINO 1x) | 55.7 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 49 | YOLOv4-P7 CSP-P7 (single-scale, 16 fps) | 55.4 | No | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020-11-16 | Code |
| 50 | tiny-MOAT-3 (IN-1K pretraining, single-scale) | 55.2 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 51 | FAN-L-Hybrid | 55.1 | No | Understanding The Robustness in Vision Transform... | 2022-04-26 | Code |
| 52 | Hiera-L | 55 | No | Hiera: A Hierarchical Vision Transformer without... | 2023-06-01 | Code |
| 53 | GLEE-Lite | 55 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 54 | TEC(VIT-B, Mask-RCNN) | 54.6 | No | Towards Sustainable Self-supervised Learning | 2022-10-20 | Code |
| 55 | Cascade Eff-B7 NAS-FPN (1280) | 54.5 | No | Simple Copy-Paste is a Strong Data Augmentation ... | 2020-12-13 | Code |
| 56 | CAE (ViT-L, Mask R-CNN, 1x schedule) | 54.5 | No | Context Autoencoder for Self-Supervised Represen... | 2022-02-07 | Code |
| 57 | MViTv2-L (Cascade Mask R-CNN, single-scale) | 54.3 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 58 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale) | 54.2 | Yes | Rethinking Pre-training and Self-training | 2020-06-11 | Code |
| 59 | Cascade RCNN-RS (SpineNet-143L, single scale) | 53.6 | No | Simple Training Strategies and Model Scaling for... | 2021-06-30 | Code |
| 60 | UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) | 53.5 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 61 | MAE (ViT-L, Mask R-CNN) | 53.3 | No | Masked Autoencoders Are Scalable Vision Learners | 2021-11-11 | Code |
| 62 | Cascade RCNN-RS (ResNet-200, single scale) | 53.1 | No | Simple Training Strategies and Model Scaling for... | 2021-06-30 | Code |
| 63 | tiny-MOAT-2 (IN-1K pretraining, single-scale) | 53 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 64 | MViT-L (Mask R-CNN, single-scale, IN21k pre-train) | 52.7 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 65 | ResNeSt-200 (multi-scale) | 52.47 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 66 | ActiveMLP-B (Cascade Mask R-CNN) | 52.3 | No | Active Token Mixer | 2022-03-11 | Code |
| 67 | RetinaNet (SpineNet-190, 1536x1536) | 52.2 | No | SpineNet: Learning Scale-Permuted Backbone for R... | 2019-12-10 | Code |
| 68 | EfficientDet-D7 (1536) | 52.1 | No | EfficientDet: Scalable and Efficient Object Dete... | 2019-11-20 | Code |
| 69 | tiny-MOAT-1 (IN-1K pretraining, single-scale) | 51.9 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 70 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 51.8 | No | Global Context Networks | 2020-12-24 | Code |
| 71 | ELSA-S (Cascade Mask RCNN) | 51.6 | No | ELSA: Enhanced Local Self-Attention for Vision T... | 2021-12-23 | Code |
| 72 | FocalNet-T (LRF, Cascade Mask R-CNN) | 51.5 | No | Focal Modulation Networks | 2022-03-22 | Code |
| 73 | DINO-5scale (24 epoch) | 51.3 | No | DINO: DETR with Improved DeNoising Anchor Boxes ... | 2022-03-07 | Code |
| 74 | DINO-5scale (36 epoch) | 51.2 | No | DINO: DETR with Improved DeNoising Anchor Boxes ... | 2022-03-07 | Code |
| 75 | ResNeSt-200-DCN (single-scale) | 50.91 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 76 | UniverseNet-20.08d (Res2Net-101, DCN, single-scale) | 50.9 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 77 | ResNeSt-200 (single-scale) | 50.54 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 78 | tiny-MOAT-0 (IN-1K pretraining, single-scale) | 50.5 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 79 | MAE (ViT-B, Mask R-CNN) | 50.3 | No | Masked Autoencoders Are Scalable Vision Learners | 2021-11-11 | Code |
| 80 | Sparse R-CNN (PVTv2-B2) | 50.1 | No | PVT v2: Improved Baselines with Pyramid Vision T... | 2021-06-25 | Code |
| 81 | Pix2seq (ViT-L) | 50 | Yes | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 82 | DaViT-T (Mask R-CNN, 36 epochs) | 49.9 | No | DaViT: Dual Attention Vision Transformers | 2022-04-07 | Code |
| 83 | BoTNet 200 (Mask R-CNN, single scale, 72 epochs) | 49.7 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code |
| 84 | BoTNet 152 (Mask R-CNN, single scale, 72 epochs) | 49.5 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code |
| 85 | DN-Deformable-DETR-R50++ | 49.5 | No | DN-DETR: Accelerate DETR Training by Introducing... | 2022-03-02 | Code |
| 86 | REGO-Deformable DETR-X101 | 49.1 | No | Recurrent Glimpse-based Decoder for Detection wi... | 2021-12-09 | Code |
| 87 | CenterMask+VoVNet99 (multi-scale) | 48.6 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 88 | Mask R-CNN (ResNeXt-152-FPN, cascade) | 48.6 | No | Rethinking ImageNet Pre-training | 2018-11-21 | Code |
| 89 | UniverseNet-20.08 (Res2Net-50, DCN, single-scale) | 48.5 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 90 | XCiT-M24/8 | 48.5 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 91 | ELSA-S (Mask RCNN) | 48.3 | No | ELSA: Enhanced Local Self-Attention for Vision T... | 2021-12-23 | Code |
| 92 | XCiT-S24/8 | 48.1 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 93 | GCNet (ResNeXt-101 + DCN + cascade + GC r16) | 47.9 | No | GCNet: Non-local Networks Meet Squeeze-Excitatio... | 2019-04-25 | Code |
| 94 | MAE-Det(MAE-Det-L+GFLV2) | 47.8 | No | MAE-DET: Revisiting Maximum Entropy Principle in... | 2021-11-26 | Code |
| 95 | Res2Net101+HTC | 47.5 | No | Res2Net: A New Multi-scale Backbone Architecture | 2019-04-02 | Code |
| 96 | Mask R-CNN (ResNet-101-FPN, GN, Cascade) | 47.4 | No | Rethinking ImageNet Pre-training | 2018-11-21 | Code |
| 97 | Pix2seq (R50-C4) | 47.3 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 98 | Pix2seq (ViT-B) | 47.1 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 99 | HTC (HRNetV2p-W48) | 47 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 100 | PatchConvNet-S120 (Mask R-CNN) | 47 | No | Augmenting Convolutional networks with attention... | 2021-12-27 | Code |
| 101 | RPDet (ResNeXt-101-DCN, multi-scale) | 46.8 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 102 | DAB-DETR-DC5-R101 | 46.6 | No | DAB-DETR: Dynamic Anchor Boxes are Better Querie... | 2022-01-28 | Code |
| 103 | DyHead (ResNet-101) | 46.5 | No | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 104 | Mask R-CNN (ResNeXt-152-FPN) | 46.4 | No | Rethinking ImageNet Pre-training | 2018-11-21 | Code |
| 105 | RPDet (ResNet-101-DCN, multi-scale) | 46.4 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 106 | PatchConvNet-S60 (Mask R-CNN) | 46.4 | No | Augmenting Convolutional networks with attention... | 2021-12-27 | Code |
| 107 | Cascade Mask R-CNN (ResNet-50) | 46.3 | No | Deep Residual Learning for Image Recognition | 2015-12-10 | Code |
| 108 | HoughNet (HG-104, MS) | 46.1 | No | HoughNet: Integrating near and long-range eviden... | 2020-07-05 | Code |
| 109 | Mask R-CNN (HRNetV2p-W48, cascade) | 46 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 110 | Conditional DETR-DC5-R101 | 45.9 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 111 | BoTNet 50 (72 epochs) | 45.9 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code |
| 112 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) | 45.6 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 113 | CenterMask+VoVNetV2-99 (single-scale) | 45.6 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 114 | HTC (HRNetV2p-W32) | 45.3 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 115 | Anchor DETR-DC5-R101 | 45.1 | No | Anchor DETR: Query Design for Transformer-Based ... | 2021-09-15 | Code |
| 116 | Conditional DETR-DC5-R50 | 45.1 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 117 | Mask R-CNN (ResNeXt-152 + 1 NL) | 45 | No | Non-local Neural Networks | 2017-11-21 | Code |
| 118 | Pix2seq (R101-DC5) | 45 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 119 | Mask R-CNN-FPN (AOGNet-40M) | 44.9 | No | Attentive Normalization | 2019-08-04 | Code |
| 120 | DETR-DC5 (ResNet-101) | 44.9 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 121 | Mask R-CNN (VoVNetV2-99, single-scale) | 44.9 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 122 | R3-CNN (ResNet-50-FPN, DCN) | 44.8 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 123 | RPDet (ResNet-101-DCN, multi-scale train) | 44.8 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 124 | RetinaNet (ViL-Base, multi-scale, 3x) | 44.7 | No | Multi-Scale Vision Longformer: A New Vision Tran... | 2021-03-29 | Code |
| 125 | Cascade R-CNN (HRNetV2p-W48) | 44.6 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 126 | CenterMask+VoVNetV2-57 (single-scale) | 44.6 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 127 | Conditional DETR-R101 | 44.5 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 128 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) | 44.5 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 129 | GFL (ResNet-50) | 44.5 | No | Deep Residual Learning for Image Recognition | 2015-12-10 | Code |
| 130 | RPDet (ResNeXt-101-DCN) | 44.5 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 131 | CenterMask+X101-32x8d (single-scale) | 44.4 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 132 | RetinaNet (ViL-Base) | 44.3 | No | Multi-Scale Vision Longformer: A New Vision Tran... | 2021-03-29 | Code |
| 133 | R3-CNN (ResNet-50-FPN, GC-Net) | 44.3 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 134 | Anchor DETR-DC5-R50 | 44.2 | No | Anchor DETR: Query Design for Transformer-Based ... | 2021-09-15 | Code |
| 135 | DAB-DETR-R101 | 44.1 | No | DAB-DETR: Dynamic Anchor Boxes are Better Querie... | 2022-01-28 | Code |
| 136 | Faster RCNN-R101-FPN+ | 44 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 137 | Cascade R-CNN (HRNetV2p-W32) | 43.7 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 138 | Sparse R-CNN (ResNet-101, FPN) | 43.5 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 139 | ATSS (ResNet-50) | 43.5 | No | Deep Residual Learning for Image Recognition | 2015-12-10 | Code |
| 140 | PVT-Large (RetinaNet 3x,MS) | 43.4 | No | Pyramid Vision Transformer: A Versatile Backbone... | 2021-02-24 | Code |
| 141 | ExtremeNet (Hourglass-104, multi-scale) | 43.3 | No | Bottom-up Object Detection by Grouping Extreme a... | 2019-01-23 | Code |
| 142 | Pix2seq (R50-DC5) | 43.2 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 143 | HTC (cascade) | 43.2 | No | Hybrid Task Cascade for Instance Segmentation | 2019-01-22 | Code |
| 144 | Mask R-CNN-FPN (ResNeXt-101, GN+WS) | 43.12 | No | Micro-Batch Training with Batch-Channel Normaliz... | 2019-03-25 | Code |
| 145 | HTC (HRNetV2p-W18) | 43.1 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 146 | Mask R-CNN (ResNet-101, DCNv2) | 43.1 | No | Deformable ConvNets v2: More Deformable, Better ... | 2018-11-27 | Code |
| 147 | Conditional DETR-R50 | 43 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 148 | HoughNet (HG-104) | 43 | No | HoughNet: Integrating near and long-range eviden... | 2020-07-05 | Code |
| 149 | Faster R-CNN (FPN, X-volution) | 42.8 | No | X-volution: On the unification of convolution an... | 2021-06-04 | - |
| 150 | Cascade R-CNN (ResNet-101-FPN+, cascade) | 42.7 | No | Cascade R-CNN: Delving into High Quality Object ... | 2017-12-03 | Code |
| 151 | PVT-Large (RetinaNet 1x) | 42.6 | No | Pyramid Vision Transformer: A Versatile Backbone... | 2021-02-24 | Code |
| 152 | CornerNet-Saccade (Hourglass-54) | 42.6 | No | CornerNet-Lite: Efficient Keypoint Based Object ... | 2019-04-18 | Code |
| 153 | Pix2seq (R50) | 42.6 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 154 | Mask R-CNN (ResNet-101-FPN, GroupNorm, long) | 42.3 | No | Group Normalization | 2018-03-22 | Code |
| 155 | Sparse R-CNN (ResNet-50, FPN) | 42.3 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 156 | Mask R-CNN (HRNetV2p-W32) | 42.3 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 157 | DETR-ResNet50 with iRPE-K (300 epochs) | 42.3 | No | Rethinking and Improving Relative Position Encod... | 2021-07-29 | Code |
| 158 | TridentNet (ResNet-101) | 42 | No | Scale-Aware Trident Networks for Object Detection | 2019-01-07 | Code |
| 159 | R3-CNN (ResNet-50-FPN) | 42 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 160 | Faster R-CNN (HRNetV2p-W48) | 41.8 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 161 | Faster R-CNN (LIP-ResNet-101) | 41.7 | No | LIP: Local Importance-based Pooling | 2019-08-12 | Code |
| 162 | Faster R-CNN (ResNet-101, DCNv2) | 41.7 | No | Deformable ConvNets v2: More Deformable, Better ... | 2018-11-27 | Code |
| 163 | FSAF (ResNeXt-101, anchor-based branches) | 41.6 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 164 | CornerNet-Saccade (Hourglass-104) | 41.4 | No | CornerNet-Lite: Efficient Keypoint Based Object ... | 2019-04-18 | Code |
| 165 | Grid R-CNN (ResNet-101-FPN) | 41.3 | No | Grid R-CNN | 2018-11-29 | Code |
| 166 | Cascade R-CNN (HRNetV2p-W18) | 41.3 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 167 | CenterNet511 (Hourglass-52) | 41.3 | No | CenterNet: Keypoint Triplets for Object Detection | 2019-04-17 | Code |
| 168 | RetinaMask (ResNet-101-FPN) | 41.1 | No | RetinaMask: Learning to predict masks improves s... | 2019-01-10 | Code |
| 169 | PoolFormer-S36 (Mask R-CNN) | 41 | No | MetaFormer Is Actually What You Need for Vision | 2021-11-22 | Code |
| 170 | Faster R-CNN (HRNetV2p-W32) | 40.9 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 171 | VirTex Mask R-CNN (ResNet-50-FPN) | 40.9 | No | VirTex: Learning Visual Representations from Tex... | 2020-06-11 | Code |
| 172 | Mask R-CNN (ResNet-101 + 1 NL) | 40.8 | No | Non-local Neural Networks | 2017-11-21 | Code |
| 173 | Mask R-CNN (ResNet-50-FPN, GroupNorm, long) | 40.8 | No | Group Normalization | 2018-03-22 | Code |
| 174 | RPDet (ResNet-50, multi-scale train) | 40.8 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 175 | DETR-ResNet50 with iRPE-K (150 epochs) | 40.8 | No | Rethinking and Improving Relative Position Encod... | 2021-07-29 | Code |
| 176 | Faster R-CNN+aLRP Loss (ResNet-50, 500 scale) | 40.7 | No | A Ranking-based, Balanced Loss Function Unifying... | 2020-09-28 | Code |
| 177 | PPDet (ResNet-101-FPN) | 40.5 | No | Reducing Label Noise in Anchor-Free Object Detec... | 2020-08-03 | Code |
| 178 | GCnet (ResNet-50-FPN, GRoIE) | 40.3 | No | GCNet: Non-local Networks Meet Squeeze-Excitatio... | 2019-04-25 | Code |
| 179 | Mask R-CNN (ResNet-50-FPN, GroupNorm) | 40.3 | No | Group Normalization | 2018-03-22 | Code |
| 180 | Cascade R-CNN (ResNet-50-FPN+) | 40.3 | No | Cascade R-CNN: Delving into High Quality Object ... | 2017-12-03 | Code |
| 181 | ExtremeNet (Hourglass-104, single-scale) | 40.3 | No | Bottom-up Object Detection by Grouping Extreme a... | 2019-01-23 | Code |
| 182 | RPDet (ResNet-101) | 40.3 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 183 | RetinaNet+aLRP Loss (ResNet-50, 500 scale) | 40.2 | No | A Ranking-based, Balanced Loss Function Unifying... | 2020-09-28 | Code |
| 184 | Mask R-CNN (ResNet-101-FPN) | 40 | No | Mask R-CNN | 2017-03-20 | Code |
| 185 | FPN+ | 39.8 | No | Feature Pyramid Networks for Object Detection | 2016-12-09 | Code |
| 186 | FoveaBox+aLRP Loss (ResNet-50, 500 scale) | 39.7 | No | A Ranking-based, Balanced Loss Function Unifying... | 2020-09-28 | Code |
| 187 | Grid R-CNN (ResNet-50-FPN) | 39.6 | No | Grid R-CNN | 2018-11-29 | Code |
| 188 | Mask R-CNN (ResNet-50, ACNet) | 39.5 | No | Adaptively Connected Neural Networks | 2019-04-07 | Code |
| 189 | FSAF (ResNet-101, anchor-based branches) | 39.3 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 190 | Mask R-CNN (HRNetV2p-W18) | 39.2 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 191 | Mask R-CNN (ResNet-50 + 1 NL) | 39 | No | Non-local Neural Networks | 2017-11-21 | Code |
| 192 | FoveaBox (ResNet-101-FPN, 800x800) | 38.9 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 193 | FCOS (ResNet-50-FPN + improvements) | 38.6 | No | FCOS: Fully Convolutional One-Stage Object Detec... | 2019-04-02 | Code |
| 194 | RPDet (ResNet-50) | 38.6 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 195 | Libra R-CNN (ResNet-50 FPN) | 38.5 | No | Libra R-CNN: Towards Balanced Learning for Objec... | 2019-04-04 | Code |
| 196 | Mask R-CNN (ResNet-50-FPN, GRoIE) | 38.4 | No | A novel Region of Interest Extraction Layer for ... | 2020-04-28 | Code |
| 197 | CornerNet511 (Hourglass-104) | 38.4 | No | CornerNet: Detecting Objects as Paired Keypoints | 2018-08-03 | Code |
| 198 | FoveaBox+Retina (ResNet-50) | 38.1 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 199 | Faster R-CNN (HRNetV2p-W18) | 38 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 200 | FoveaBox (ResNet-101-FPN, 600x600) | 38 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 201 | FSAF (ResNet-101) | 37.9 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 202 | Mask R-CNN (ResNet-50-FPN) | 37.7 | No | Mask R-CNN | 2017-03-20 | Code |
| 203 | Faster R-CNN (ResNet-50-FPN, GRoIE) | 37.5 | No | A novel Region of Interest Extraction Layer for ... | 2020-04-28 | Code |
| 204 | Mask R-CNN (ResNeXt-101-FPN) | 36.7 | No | Mask R-CNN | 2017-03-20 | Code |
| 205 | FoveaBox (ResNet-50-FPN, 600x600) | 36 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 206 | FSAF (ResNet-50) | 35.9 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 207 | GHM-C + GHM-R (RetinaNet-FPN-ResNet-50, M=30) | 35.8 | No | Gradient Harmonized Single-stage Detector | 2018-11-13 | Code |
| 208 | Online Fg Bal. Sampling+Hard Negative Mining (ResNet-50) | 35.6 | No | Generating Positive Bounding Boxes for Balanced ... | 2019-09-21 | Code |
| 209 | M2Det (ResNet-101, 320x320) | 34.1 | No | M2Det: A Single-Shot Object Detector based on Mu... | 2018-11-12 | Code |
| 210 | Faster R-CNN (Res2Net-50) | 33.7 | No | Res2Net: A New Multi-scale Backbone Architecture | 2019-04-02 | Code |
| 211 | M2Det (VGG-16, 320x320) | 33.2 | No | M2Det: A Single-Shot Object Detector based on Mu... | 2018-11-12 | Code |