| 1 | Focal-Stable-DINO (Focal-Huge, no TTA) | 78.5 | Yes | A Strong and Reproducible Object Detector with O... | 2023-04-25 | Code |
| 2 | EVA | 78.5 | Yes | EVA: Exploring the Limits of Masked Visual Repre... | 2022-11-14 | Code |
| 3 | UNINEXT-H | 75.3 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 4 | DyHead (Swin-L, multi scale, self-training) | 74.2 | Yes | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 5 | Focal-L (DyHead, multi-scale) | 73.4 | No | Focal Self-attention for Local-Global Interactio... | 2021-07-01 | Code |
| 6 | DyHead (Swin-L, multi scale) | 73.2 | No | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 7 | SOLQ (Swin-L, single scale) | 71.9 | No | SOLQ: Segmenting Objects by Learning Queries | 2021-06-04 | Code |
| 8 | QueryInst (single scale) | 71.5 | No | Instances as Queries | 2021-05-05 | Code |
| 9 | Cascade RCNN-RS (SpineNet-143L, single scale) | 70.6 | No | Simple Training Strategies and Model Scaling for... | 2021-06-30 | Code |
| 10 | Cascade RCNN-RS (ResNet-200, single scale) | 70.3 | No | Simple Training Strategies and Model Scaling for... | 2021-06-30 | Code |
| 11 | YOLOR-D6 (1280, single-scale, 31 fps) | 68.7 | No | You Only Learn One Representation: Unified Netwo... | 2021-05-10 | Code |
| 12 | UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) | 68.1 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 13 | EfficientDet-D7x (single-scale) | 67.9 | No | EfficientDet: Scalable and Efficient Object Dete... | 2019-11-20 | Code |
| 14 | YOLOv4-P7 CSP-P7 (single-scale, 16 fps) | 67.4 | No | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020-11-16 | Code |
| 15 | DyHead (ResNeXt-64x4d-101-DCN, multi scale) | 66.3 | No | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 16 | ResNeSt-200 (multi-scale) | 66.29 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 17 | ResNeSt-200-DCN (single-scale) | 65.83 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 18 | DINO-5scale (24 epoch) | 65.8 | No | DINO: DETR with Improved DeNoising Anchor Boxes ... | 2022-03-07 | Code |
| 19 | UniverseNet-20.08d (Res2Net-101, DCN, single-scale) | 65.8 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 20 | DN-Deformable-DETR-R50++ | 65.4 | No | DN-DETR: Accelerate DETR Training by Introducing... | 2022-03-02 | Code |
| 21 | DINO-5scale (36 epoch) | 65.3 | No | DINO: DETR with Improved DeNoising Anchor Boxes ... | 2022-03-07 | Code |
| 22 | YOLOR-P6 (1280, single-scale, 72 fps) | 65.2 | No | You Only Learn One Representation: Unified Netwo... | 2021-05-10 | Code |
| 23 | REGO-Deformable DETR-X101 | 65 | No | Recurrent Glimpse-based Decoder for Detection wi... | 2021-12-09 | Code |
| 24 | DAB-DETR-DC5-R101 | 64.1 | No | DAB-DETR: Dynamic Anchor Boxes are Better Querie... | 2022-01-28 | Code |
| 25 | ResNeSt-200 (single-scale) | 63.9 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 26 | Conditional DETR-R101 | 63.6 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 27 | Conditional DETR-DC5-R101 | 63.3 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 28 | DAB-DETR-R101 | 62.9 | No | DAB-DETR: Dynamic Anchor Boxes are Better Querie... | 2022-01-28 | Code |
| 29 | UniverseNet-20.08 (Res2Net-50, DCN, single-scale) | 62.7 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 30 | DETR-DC5 (ResNet-101) | 62.3 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 31 | HTC (HRNetV2p-W48) | 62.2 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 32 | Conditional DETR-DC5-R50 | 62.2 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 33 | Res2Net101+HTC | 62.1 | No | Res2Net: A New Multi-scale Backbone Architecture | 2019-04-02 | Code |
| 34 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) | 61.6 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 35 | Anchor DETR-DC5-R101 | 61.6 | No | Anchor DETR: Query Design for Transformer-Based ... | 2021-09-15 | Code |
| 36 | Conditional DETR-R50 | 61.5 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 37 | MAE-Det (MAE-Det-L+GFLV2) | 61.1 | No | MAE-DET: Revisiting Maximum Entropy Principle in... | 2021-11-26 | Code |
| 38 | Anchor DETR-DC5-R50 | 60.6 | No | Anchor DETR: Query Design for Transformer-Based ... | 2021-09-15 | Code |
| 39 | Pix2seq (R101-DC5) | 60.4 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 40 | Mask R-CNN (HRNetV2p-W48, cascade) | 60.1 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 41 | HoughNet (HG-104, MS) | 59.7 | No | HoughNet: Integrating near and long-range eviden... | 2020-07-05 | Code |
| 42 | Sparse R-CNN (ResNet-101, FPN) | 59.7 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 43 | R3-CNN (ResNet-50-FPN, DCN) | 59.6 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 44 | HTC (HRNetV2p-W32) | 59.5 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 45 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) | 59.5 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 46 | PVT-Large (RetinaNet 3x,MS) | 59.5 | No | Pyramid Vision Transformer: A Versatile Backbone... | 2021-02-24 | Code |
| 47 | ExtremeNet (Hourglass-104, multi-scale) | 59.4 | No | Bottom-up Object Detection by Grouping Extreme a... | 2019-01-23 | Code |
| 48 | R3-CNN (ResNet-50-FPN, GC-Net) | 58.9 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 49 | CenterMask+VoVNetV2-99 (single-scale) | 58.8 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 50 | Faster R-CNN (ResNet-101, DCNv2) | 58.7 | No | Deformable ConvNets v2: More Deformable, Better ... | 2018-11-27 | Code |
| 51 | Pix2seq (R50-DC5) | 58.6 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 52 | Cascade R-CNN (HRNetV2p-W48) | 58.5 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 53 | PVT-Large (RetinaNet 1x) | 58.4 | No | Pyramid Vision Transformer: A Versatile Backbone... | 2021-02-24 | Code |
| 54 | CornerNet-Saccade (Hourglass-54) | 58.4 | No | CornerNet-Lite: Efficient Keypoint Based Object ... | 2019-04-18 | Code |
| 55 | RetinaNet (ViL-Base) | 58.3 | No | Multi-Scale Vision Longformer: A New Vision Tran... | 2021-03-29 | Code |
| 56 | RetinaNet (ViL-Base, multi-scale, 3x) | 58.1 | No | Multi-Scale Vision Longformer: A New Vision Tran... | 2021-03-29 | Code |
| 57 | Mask R-CNN (VoVNetV2-99, single-scale) | 57.7 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 58 | Sparse R-CNN (ResNet-50, FPN) | 57.6 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 59 | Cascade R-CNN (HRNetV2p-W32) | 57.4 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 60 | Cascade R-CNN (ResNet-101-FPN+, cascade) | 57.4 | No | Cascade R-CNN: Delving into High Quality Object ... | 2017-12-03 | Code |
| 61 | CenterMask+X101-32x8d (single-scale) | 57.1 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 62 | CornerNet-Saccade (Hourglass-104) | 57.1 | No | CornerNet-Lite: Efficient Keypoint Based Object ... | 2019-04-18 | Code |
| 63 | TridentNet (ResNet-101) | 56.9 | No | Scale-Aware Trident Networks for Object Detection | 2019-01-07 | Code |
| 64 | Mask R-CNN-FPN (ResNeXt-101, GN+WS) | 56.39 | No | Micro-Batch Training with Batch-Channel Normaliz... | 2019-03-25 | Code |
| 65 | ExtremeNet (Hourglass-104, single-scale) | 56.1 | No | Bottom-up Object Detection by Grouping Extreme a... | 2019-01-23 | Code |
| 66 | Faster RCNN-R101-FPN+ | 56 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 67 | HoughNet (HG-104) | 55.8 | No | HoughNet: Integrating near and long-range eviden... | 2020-07-05 | Code |
| 68 | CenterNet511 (Hourglass-52) | 55.8 | No | CenterNet: Keypoint Triplets for Object Detection | 2019-04-17 | Code |
| 69 | R3-CNN (ResNet-50-FPN) | 55.7 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 70 | Faster R-CNN (FPN, X-volution) | 55 | No | X-volution: On the unification of convolution an... | 2021-06-04 | - |
| 71 | Faster R-CNN (HRNetV2p-W48) | 54.6 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 72 | Grid R-CNN (ResNet-101-FPN) | 54.1 | No | Grid R-CNN | 2018-11-29 | Code |
| 73 | Cascade R-CNN (HRNetV2p-W18) | 54.1 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 74 | Cascade R-CNN (ResNet-50-FPN+) | 54.1 | No | Cascade R-CNN: Delving into High Quality Object ... | 2017-12-03 | Code |
| 75 | Faster R-CNN (HRNetV2p-W32) | 53.3 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 76 | FoveaBox (ResNet-101-FPN, 600x600) | 52.7 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 77 | FPN+ | 52.6 | No | Feature Pyramid Networks for Object Detection | 2016-12-09 | Code |
| 78 | GCNet (ResNet-50-FPN, GRoIE) | 52.5 | No | GCNet: Non-local Networks Meet Squeeze-Excitatio... | 2019-04-25 | Code |
| 79 | HTC (cascade) | 52.3 | No | Hybrid Task Cascade for Instance Segmentation | 2019-01-22 | Code |
| 80 | PPDet (ResNet-101-FPN) | 52.3 | No | Reducing Label Noise in Anchor-Free Object Detec... | 2020-08-03 | Code |
| 81 | CornerNet511 (Hourglass-104) | 51.8 | No | CornerNet: Detecting Objects as Paired Keypoints | 2018-08-03 | Code |
| 82 | FoveaBox (ResNet-101-FPN, 800x800) | 51.7 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 83 | Grid R-CNN (ResNet-50-FPN) | 51.5 | No | Grid R-CNN | 2018-11-29 | Code |
| 84 | Faster R-CNN (Res2Net-50) | 51.1 | No | Res2Net: A New Multi-scale Backbone Architecture | 2019-04-02 | Code |
| 85 | Mask R-CNN (HRNetV2p-W18) | 51 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 86 | Libra R-CNN (ResNet-50 FPN) | 50.5 | No | Libra R-CNN: Towards Balanced Learning for Objec... | 2019-04-04 | Code |
| 87 | FoveaBox (ResNet-50-FPN, 600x600) | 50.5 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 88 | FCOS (ResNet-50-FPN + improvements) | 49.8 | No | FCOS: Fully Convolutional One-Stage Object Detec... | 2019-04-02 | Code |
| 89 | Mask R-CNN (ResNet-50-FPN, GRoIE) | 49.7 | No | A novel Region of Interest Extraction Layer for ... | 2020-04-28 | Code |
| 90 | Faster R-CNN (HRNetV2p-W18) | 49.6 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 91 | M2Det (ResNet-101, 320x320) | 49.3 | No | M2Det: A Single-Shot Object Detector based on Mu... | 2018-11-12 | Code |
| 92 | M2Det (VGG-16, 320x320) | 49.1 | No | M2Det: A Single-Shot Object Detector based on Mu... | 2018-11-12 | Code |
| 93 | FSAF (ResNet-50) | 48.2 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 94 | Faster R-CNN (ResNet-50-FPN, GRoIE) | 47.8 | No | A novel Region of Interest Extraction Layer for ... | 2020-04-28 | Code |
| 95 | GHM-C + GHM-R (RetinaNet-FPN-ResNet-50, M=30) | 46.7 | No | Gradient Harmonized Single-stage Detector | 2018-11-13 | Code |