| 1 | PE_spatial (DETA) | 66 | Yes | Perception Encoder: The best visual embeddings a... | 2025-04-17 | Code |
| 2 | Co-DETR | 65.9 | Yes | DETRs with Collaborative Hybrid Assignments Trai... | 2022-11-22 | Code |
| 3 | M3I Pre-training (InternImage-H) | 65 | Yes | Towards All-in-one Pre-training via Maximizing M... | 2022-11-17 | Code |
| 4 | InternImage-H | 65 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 5 | Co-DETR (Swin-L) | 64.7 | Yes | DETRs with Collaborative Hybrid Assignments Trai... | 2022-11-22 | Code |
| 6 | Focal-Stable-DINO (Focal-Huge, no TTA) | 64.6 | Yes | A Strong and Reproducible Object Detector with O... | 2023-04-25 | Code |
| 7 | EVA | 64.5 | Yes | EVA: Exploring the Limits of Masked Visual Repre... | 2022-11-14 | Code |
| 8 | ViT-CoMer | 64.3 | No | - | - | Code |
| 9 | FocalNet-H (DINO) | 64.2 | Yes | Focal Modulation Networks | 2022-03-22 | Code |
| 10 | InternImage-XL | 64.2 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 11 | CP-DETR-L Swin-L(Fine tuning separately in COCO) | 64.1 | Yes | CP-DETR: Concept Prompt Guide DETR Toward Strong... | 2024-12-13 | - |
| 12 | RevCol-H(DINO) | 63.8 | Yes | Reversible Column Networks | 2022-12-22 | Code |
| 13 | DINO (Swin-L) | 63.2 | No | DINO: DETR with Improved DeNoising Anchor Boxes ... | 2022-03-07 | Code |
| 14 | Grounding DINO | 63 | Yes | Grounding DINO: Marrying DINO with Grounded Pre-... | 2023-03-09 | Code |
| 15 | SwinV2-G (HTC++) | 62.5 | Yes | Swin Transformer V2: Scaling Up Capacity and Res... | 2021-11-18 | Code |
| 16 | Florence-CoSwin-H | 62 | Yes | Florence: A New Foundation Model for Computer Vi... | 2021-11-22 | Code |
| 17 | GLEE-Pro | 62 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 18 | ViTDet, ViT-H Cascade (multiscale) | 61.3 | No | Exploring Plain Vision Transformer Backbones for... | 2022-03-30 | Code |
| 19 | GLIP (Swin-L, multi-scale) | 60.8 | Yes | Grounded Language-Image Pre-training | 2021-12-07 | Code |
| 20 | Soft Teacher + Swin-L (HTC++, multi-scale) | 60.7 | Yes | End-to-End Semi-Supervised Object Detection with... | 2021-06-16 | Code |
| 21 | UNINEXT-H | 60.6 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 22 | ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) | 60.5 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 23 | ViTDet, ViT-H Cascade | 60.4 | No | Exploring Plain Vision Transformer Backbones for... | 2022-03-30 | Code |
| 24 | GLEE-Plus | 60.4 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 25 | DyHead (Swin-L, multi scale, self-training) | 60.3 | Yes | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 26 | ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) | 60.2 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 27 | Soft Teacher+Swin-L(HTC++, single scale) | 60.1 | Yes | End-to-End Semi-Supervised Object Detection with... | 2021-06-16 | Code |
| 28 | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 59.6 | No | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code |
| 29 | Frozen Backbone, SwinV2-G-ext22K (HTC) | 59.3 | No | Could Giant Pretrained Image Models Extract Univ... | 2022-11-03 | - |
| 30 | HorNet-L | 59.2 | No | HorNet: Efficient High-Order Spatial Interaction... | 2022-07-28 | Code |
| 31 | MOAT-3 (IN-22K pretraining, single-scale) | 59.2 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 32 | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 59.1 | No | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code |
| 33 | Focal-L (DyHead, multi-scale) | 58.7 | No | Focal Self-attention for Local-Global Interactio... | 2021-07-01 | Code |
| 34 | MViTv2-L (Cascade Mask R-CNN, multi-scale, IN21k pre-train) | 58.7 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 35 | MOAT-2 (IN-22K pretraining, single-scale) | 58.5 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 36 | DyHead (Swin-L, multi scale) | 58.4 | No | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 37 | Swin-L (HTC++, multi scale) | 58 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code |
| 38 | MOAT-1 (IN-1K pretraining, single-scale) | 57.7 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 39 | UM-MAE(HTC++, Swin-L, IN1K) | 57.4 | No | Uniform Masking: Enabling MAE Pre-training for P... | 2022-05-20 | Code |
| 40 | YOLOv6-L6(46 fps, 1280, V100) | 57.2 | No | YOLOv6 v3.0: A Full-Scale Reloading | 2023-01-13 | Code |
| 41 | Swin-L (HTC++, single scale) | 57.1 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code |
| 42 | TransNeXt-Base (IN-1K pretrain, DINO 1x) | 57.1 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 43 | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | 57 | Yes | Simple Copy-Paste is a Strong Data Augmentation ... | 2020-12-13 | Code |
| 44 | TransNeXt-Small (IN-1K pretrain, DINO 1x) | 56.6 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 45 | QueryInst (single scale) | 56.1 | No | Instances as Queries | 2021-05-05 | Code |
| 46 | MViTv2-H (Cascade Mask R-CNN, single-scale, IN21k pre-train) | 56.1 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 47 | MOAT-0 (IN-1K pretraining, single-scale) | 55.9 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 48 | TransNeXt-Tiny (IN-1K pretrain, DINO 1x) | 55.7 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 49 | YOLOv4-P7 CSP-P7 (single-scale, 16 fps) | 55.4 | No | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020-11-16 | Code |
| 50 | tiny-MOAT-3 (IN-1K pretraining, single-scale) | 55.2 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 51 | FAN-L-Hybrid | 55.1 | No | Understanding The Robustness in Vision Transform... | 2022-04-26 | Code |
| 52 | Hiera-L | 55 | No | Hiera: A Hierarchical Vision Transformer without... | 2023-06-01 | Code |
| 53 | GLEE-Lite | 55 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 54 | TEC(VIT-B, Mask-RCNN) | 54.6 | No | Towards Sustainable Self-supervised Learning | 2022-10-20 | Code |
| 55 | Cascade Eff-B7 NAS-FPN (1280) | 54.5 | No | Simple Copy-Paste is a Strong Data Augmentation ... | 2020-12-13 | Code |
| 56 | CAE (ViT-L, Mask R-CNN, 1x schedule) | 54.5 | No | Context Autoencoder for Self-Supervised Represen... | 2022-02-07 | Code |
| 57 | MViTv2-L (Cascade Mask R-CNN, single-scale) | 54.3 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 58 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale) | 54.2 | Yes | Rethinking Pre-training and Self-training | 2020-06-11 | Code |
| 59 | Cascade RCNN-RS (SpineNet-143L, single scale) | 53.6 | No | Simple Training Strategies and Model Scaling for... | 2021-06-30 | Code |
| 60 | UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) | 53.5 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 61 | MAE (ViT-L, Mask R-CNN) | 53.3 | No | Masked Autoencoders Are Scalable Vision Learners | 2021-11-11 | Code |
| 62 | Cascade RCNN-RS (ResNet-200, single scale) | 53.1 | No | Simple Training Strategies and Model Scaling for... | 2021-06-30 | Code |
| 63 | tiny-MOAT-2 (IN-1K pretraining, single-scale) | 53 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 64 | MViT-L (Mask R-CNN, single-scale, IN21k pre-train) | 52.7 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code |
| 65 | ResNeSt-200 (multi-scale) | 52.47 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 66 | ActiveMLP-B (Cascade Mask R-CNN) | 52.3 | No | Active Token Mixer | 2022-03-11 | Code |
| 67 | RetinaNet (SpineNet-190, 1536x1536) | 52.2 | No | SpineNet: Learning Scale-Permuted Backbone for R... | 2019-12-10 | Code |
| 68 | EfficientDet-D7 (1536) | 52.1 | No | EfficientDet: Scalable and Efficient Object Dete... | 2019-11-20 | Code |
| 69 | tiny-MOAT-1 (IN-1K pretraining, single-scale) | 51.9 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 70 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 51.8 | No | Global Context Networks | 2020-12-24 | Code |
| 71 | ELSA-S (Cascade Mask RCNN) | 51.6 | No | ELSA: Enhanced Local Self-Attention for Vision T... | 2021-12-23 | Code |
| 72 | FocalNet-T (LRF, Cascade Mask R-CNN) | 51.5 | No | Focal Modulation Networks | 2022-03-22 | Code |
| 73 | DINO-5scale (24 epoch) | 51.3 | No | DINO: DETR with Improved DeNoising Anchor Boxes ... | 2022-03-07 | Code |
| 74 | DINO-5scale (36 epoch) | 51.2 | No | DINO: DETR with Improved DeNoising Anchor Boxes ... | 2022-03-07 | Code |
| 75 | ResNeSt-200-DCN (single-scale) | 50.91 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 76 | UniverseNet-20.08d (Res2Net-101, DCN, single-scale) | 50.9 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 77 | ResNeSt-200 (single-scale) | 50.54 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 78 | tiny-MOAT-0 (IN-1K pretraining, single-scale) | 50.5 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 79 | MAE (ViT-B, Mask R-CNN) | 50.3 | No | Masked Autoencoders Are Scalable Vision Learners | 2021-11-11 | Code |
| 80 | Sparse R-CNN (PVTv2-B2) | 50.1 | No | PVT v2: Improved Baselines with Pyramid Vision T... | 2021-06-25 | Code |
| 81 | Pix2seq (ViT-L) | 50 | Yes | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 82 | DaViT-T (Mask R-CNN, 36 epochs) | 49.9 | No | DaViT: Dual Attention Vision Transformers | 2022-04-07 | Code |
| 83 | BoTNet 200 (Mask R-CNN, single scale, 72 epochs) | 49.7 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code |
| 84 | BoTNet 152 (Mask R-CNN, single scale, 72 epochs) | 49.5 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code |
| 85 | DN-Deformable-DETR-R50++ | 49.5 | No | DN-DETR: Accelerate DETR Training by Introducing... | 2022-03-02 | Code |
| 86 | REGO-Deformable DETR-X101 | 49.1 | No | Recurrent Glimpse-based Decoder for Detection wi... | 2021-12-09 | Code |
| 87 | CenterMask+VoVNet99 (multi-scale) | 48.6 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 88 | Mask R-CNN (ResNeXt-152-FPN, cascade) | 48.6 | No | Rethinking ImageNet Pre-training | 2018-11-21 | Code |
| 89 | UniverseNet-20.08 (Res2Net-50, DCN, single-scale) | 48.5 | No | USB: Universal-Scale Object Detection Benchmark | 2021-03-25 | Code |
| 90 | XCiT-M24/8 | 48.5 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 91 | ELSA-S (Mask RCNN) | 48.3 | No | ELSA: Enhanced Local Self-Attention for Vision T... | 2021-12-23 | Code |
| 92 | XCiT-S24/8 | 48.1 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 93 | GCNet (ResNeXt-101 + DCN + cascade + GC r16) | 47.9 | No | GCNet: Non-local Networks Meet Squeeze-Excitatio... | 2019-04-25 | Code |
| 94 | MAE-Det(MAE-Det-L+GFLV2) | 47.8 | No | MAE-DET: Revisiting Maximum Entropy Principle in... | 2021-11-26 | Code |
| 95 | Res2Net101+HTC | 47.5 | No | Res2Net: A New Multi-scale Backbone Architecture | 2019-04-02 | Code |
| 96 | Mask R-CNN (ResNet-101-FPN, GN, Cascade) | 47.4 | No | Rethinking ImageNet Pre-training | 2018-11-21 | Code |
| 97 | Pix2seq (R50-C4) | 47.3 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 98 | Pix2seq (ViT-B) | 47.1 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 99 | HTC (HRNetV2p-W48) | 47 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 100 | PatchConvNet-S120 (Mask R-CNN) | 47 | No | Augmenting Convolutional networks with attention... | 2021-12-27 | Code |
| 101 | RPDet (ResNeXt-101-DCN, multi-scale) | 46.8 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 102 | DAB-DETR-DC5-R101 | 46.6 | No | DAB-DETR: Dynamic Anchor Boxes are Better Querie... | 2022-01-28 | Code |
| 103 | DyHead (ResNet-101) | 46.5 | No | Dynamic Head: Unifying Object Detection Heads wi... | 2021-06-15 | Code |
| 104 | Mask R-CNN (ResNeXt-152-FPN) | 46.4 | No | Rethinking ImageNet Pre-training | 2018-11-21 | Code |
| 105 | RPDet (ResNet-101-DCN, multi-scale) | 46.4 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 106 | PatchConvNet-S60 (Mask R-CNN) | 46.4 | No | Augmenting Convolutional networks with attention... | 2021-12-27 | Code |
| 107 | Cascade Mask R-CNN (ResNet-50) | 46.3 | No | Deep Residual Learning for Image Recognition | 2015-12-10 | Code |
| 108 | HoughNet (HG-104, MS) | 46.1 | No | HoughNet: Integrating near and long-range eviden... | 2020-07-05 | Code |
| 109 | Mask R-CNN (HRNetV2p-W48, cascade) | 46 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 110 | Conditional DETR-DC5-R101 | 45.9 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 111 | BoTNet 50 (72 epochs) | 45.9 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code |
| 112 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) | 45.6 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 113 | CenterMask+VoVNetV2-99 (single-scale) | 45.6 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 114 | HTC (HRNetV2p-W32) | 45.3 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 115 | Anchor DETR-DC5-R101 | 45.1 | No | Anchor DETR: Query Design for Transformer-Based ... | 2021-09-15 | Code |
| 116 | Conditional DETR-DC5-R50 | 45.1 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 117 | Mask R-CNN (ResNeXt-152 + 1 NL) | 45 | No | Non-local Neural Networks | 2017-11-21 | Code |
| 118 | Pix2seq (R101-DC5) | 45 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 119 | Mask R-CNN-FPN (AOGNet-40M) | 44.9 | No | Attentive Normalization | 2019-08-04 | Code |
| 120 | DETR-DC5 (ResNet-101) | 44.9 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 121 | Mask R-CNN (VoVNetV2-99, single-scale) | 44.9 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 122 | R3-CNN (ResNet-50-FPN, DCN) | 44.8 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 123 | RPDet (ResNet-101-DCN, multi-scale train) | 44.8 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 124 | RetinaNet (ViL-Base, multi-scale, 3x) | 44.7 | No | Multi-Scale Vision Longformer: A New Vision Tran... | 2021-03-29 | Code |
| 125 | Cascade R-CNN (HRNetV2p-W48) | 44.6 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 126 | CenterMask+VoVNetV2-57 (single-scale) | 44.6 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 127 | Conditional DETR-R101 | 44.5 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 128 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) | 44.5 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 129 | GFL (ResNet-50) | 44.5 | No | Deep Residual Learning for Image Recognition | 2015-12-10 | Code |
| 130 | RPDet (ResNeXt-101-DCN) | 44.5 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 131 | CenterMask+X101-32x8d (single-scale) | 44.4 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 132 | RetinaNet (ViL-Base) | 44.3 | No | Multi-Scale Vision Longformer: A New Vision Tran... | 2021-03-29 | Code |
| 133 | R3-CNN (ResNet-50-FPN, GC-Net) | 44.3 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 134 | Anchor DETR-DC5-R50 | 44.2 | No | Anchor DETR: Query Design for Transformer-Based ... | 2021-09-15 | Code |
| 135 | DAB-DETR-R101 | 44.1 | No | DAB-DETR: Dynamic Anchor Boxes are Better Querie... | 2022-01-28 | Code |
| 136 | Faster RCNN-R101-FPN+ | 44 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 137 | Cascade R-CNN (HRNetV2p-W32) | 43.7 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 138 | Sparse R-CNN (ResNet-101, FPN) | 43.5 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 139 | ATSS (ResNet-50) | 43.5 | No | Deep Residual Learning for Image Recognition | 2015-12-10 | Code |
| 140 | PVT-Large (RetinaNet 3x,MS) | 43.4 | No | Pyramid Vision Transformer: A Versatile Backbone... | 2021-02-24 | Code |
| 141 | ExtremeNet (Hourglass-104, multi-scale) | 43.3 | No | Bottom-up Object Detection by Grouping Extreme a... | 2019-01-23 | Code |
| 142 | Pix2seq (R50-DC5) | 43.2 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 143 | HTC (cascade) | 43.2 | No | Hybrid Task Cascade for Instance Segmentation | 2019-01-22 | Code |
| 144 | Mask R-CNN-FPN (ResNeXt-101, GN+WS) | 43.12 | No | Micro-Batch Training with Batch-Channel Normaliz... | 2019-03-25 | Code |
| 145 | HTC (HRNetV2p-W18) | 43.1 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 146 | Mask R-CNN (ResNet-101, DCNv2) | 43.1 | No | Deformable ConvNets v2: More Deformable, Better ... | 2018-11-27 | Code |
| 147 | Conditional DETR-R50 | 43 | No | Conditional DETR for Fast Training Convergence | 2021-08-13 | Code |
| 148 | HoughNet (HG-104) | 43 | No | HoughNet: Integrating near and long-range eviden... | 2020-07-05 | Code |
| 149 | Faster R-CNN (FPN, X-volution) | 42.8 | No | X-volution: On the unification of convolution an... | 2021-06-04 | - |
| 150 | Cascade R-CNN (ResNet-101-FPN+, cascade) | 42.7 | No | Cascade R-CNN: Delving into High Quality Object ... | 2017-12-03 | Code |
| 151 | PVT-Large (RetinaNet 1x) | 42.6 | No | Pyramid Vision Transformer: A Versatile Backbone... | 2021-02-24 | Code |
| 152 | CornerNet-Saccade (Hourglass-54) | 42.6 | No | CornerNet-Lite: Efficient Keypoint Based Object ... | 2019-04-18 | Code |
| 153 | Pix2seq (R50) | 42.6 | No | Pix2seq: A Language Modeling Framework for Objec... | 2021-09-22 | Code |
| 154 | Mask R-CNN (ResNet-101-FPN, GroupNorm, long) | 42.3 | No | Group Normalization | 2018-03-22 | Code |
| 155 | Sparse R-CNN (ResNet-50, FPN) | 42.3 | No | Sparse R-CNN: End-to-End Object Detection with L... | 2020-11-25 | Code |
| 156 | Mask R-CNN (HRNetV2p-W32) | 42.3 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 157 | DETR-ResNet50 with iRPE-K (300 epochs) | 42.3 | No | Rethinking and Improving Relative Position Encod... | 2021-07-29 | Code |
| 158 | TridentNet (ResNet-101) | 42 | No | Scale-Aware Trident Networks for Object Detection | 2019-01-07 | Code |
| 159 | R3-CNN (ResNet-50-FPN) | 42 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code |
| 160 | Faster R-CNN (HRNetV2p-W48) | 41.8 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 161 | Faster R-CNN (LIP-ResNet-101) | 41.7 | No | LIP: Local Importance-based Pooling | 2019-08-12 | Code |
| 162 | Faster R-CNN (ResNet-101, DCNv2) | 41.7 | No | Deformable ConvNets v2: More Deformable, Better ... | 2018-11-27 | Code |
| 163 | FSAF (ResNeXt-101, anchor-based branches) | 41.6 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 164 | CornerNet-Saccade (Hourglass-104) | 41.4 | No | CornerNet-Lite: Efficient Keypoint Based Object ... | 2019-04-18 | Code |
| 165 | Grid R-CNN (ResNet-101-FPN) | 41.3 | No | Grid R-CNN | 2018-11-29 | Code |
| 166 | Cascade R-CNN (HRNetV2p-W18) | 41.3 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 167 | CenterNet511 (Hourglass-52) | 41.3 | No | CenterNet: Keypoint Triplets for Object Detection | 2019-04-17 | Code |
| 168 | RetinaMask (ResNet-101-FPN) | 41.1 | No | RetinaMask: Learning to predict masks improves s... | 2019-01-10 | Code |
| 169 | PoolFormer-S36 (Mask R-CNN) | 41 | No | MetaFormer Is Actually What You Need for Vision | 2021-11-22 | Code |
| 170 | Faster R-CNN (HRNetV2p-W32) | 40.9 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 171 | VirTex Mask R-CNN (ResNet-50-FPN) | 40.9 | No | VirTex: Learning Visual Representations from Tex... | 2020-06-11 | Code |
| 172 | Mask R-CNN (ResNet-101 + 1 NL) | 40.8 | No | Non-local Neural Networks | 2017-11-21 | Code |
| 173 | Mask R-CNN (ResNet-50-FPN, GroupNorm, long) | 40.8 | No | Group Normalization | 2018-03-22 | Code |
| 174 | RPDet (ResNet-50, multi-scale train) | 40.8 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 175 | DETR-ResNet50 with iRPE-K (150 epochs) | 40.8 | No | Rethinking and Improving Relative Position Encod... | 2021-07-29 | Code |
| 176 | Faster R-CNN+aLRP Loss (ResNet-50, 500 scale) | 40.7 | No | A Ranking-based, Balanced Loss Function Unifying... | 2020-09-28 | Code |
| 177 | PPDet (ResNet-101-FPN) | 40.5 | No | Reducing Label Noise in Anchor-Free Object Detec... | 2020-08-03 | Code |
| 178 | GCNet (ResNet-50-FPN, GRoIE) | 40.3 | No | GCNet: Non-local Networks Meet Squeeze-Excitatio... | 2019-04-25 | Code |
| 179 | Mask R-CNN (ResNet-50-FPN, GroupNorm) | 40.3 | No | Group Normalization | 2018-03-22 | Code |
| 180 | Cascade R-CNN (ResNet-50-FPN+) | 40.3 | No | Cascade R-CNN: Delving into High Quality Object ... | 2017-12-03 | Code |
| 181 | ExtremeNet (Hourglass-104, single-scale) | 40.3 | No | Bottom-up Object Detection by Grouping Extreme a... | 2019-01-23 | Code |
| 182 | RPDet (ResNet-101) | 40.3 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 183 | RetinaNet+aLRP Loss (ResNet-50, 500 scale) | 40.2 | No | A Ranking-based, Balanced Loss Function Unifying... | 2020-09-28 | Code |
| 184 | Mask R-CNN (ResNet-101-FPN) | 40 | No | Mask R-CNN | 2017-03-20 | Code |
| 185 | FPN+ | 39.8 | No | Feature Pyramid Networks for Object Detection | 2016-12-09 | Code |
| 186 | FoveaBox+aLRP Loss (ResNet-50, 500 scale) | 39.7 | No | A Ranking-based, Balanced Loss Function Unifying... | 2020-09-28 | Code |
| 187 | Grid R-CNN (ResNet-50-FPN) | 39.6 | No | Grid R-CNN | 2018-11-29 | Code |
| 188 | Mask R-CNN (ResNet-50, ACNet) | 39.5 | No | Adaptively Connected Neural Networks | 2019-04-07 | Code |
| 189 | FSAF (ResNet-101, anchor-based branches) | 39.3 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 190 | Mask R-CNN (HRNetV2p-W18) | 39.2 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 191 | Mask R-CNN (ResNet-50 + 1 NL) | 39 | No | Non-local Neural Networks | 2017-11-21 | Code |
| 192 | FoveaBox (ResNet-101-FPN, 800x800) | 38.9 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 193 | FCOS (ResNet-50-FPN + improvements) | 38.6 | No | FCOS: Fully Convolutional One-Stage Object Detec... | 2019-04-02 | Code |
| 194 | RPDet (ResNet-50) | 38.6 | No | RepPoints: Point Set Representation for Object D... | 2019-04-25 | Code |
| 195 | Libra R-CNN (ResNet-50 FPN) | 38.5 | No | Libra R-CNN: Towards Balanced Learning for Objec... | 2019-04-04 | Code |
| 196 | Mask R-CNN (ResNet-50-FPN, GRoIE) | 38.4 | No | A novel Region of Interest Extraction Layer for ... | 2020-04-28 | Code |
| 197 | CornerNet511 (Hourglass-104) | 38.4 | No | CornerNet: Detecting Objects as Paired Keypoints | 2018-08-03 | Code |
| 198 | FoveaBox+Retina (ResNet-50) | 38.1 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 199 | Faster R-CNN (HRNetV2p-W18) | 38 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 200 | FoveaBox (ResNet-101-FPN, 600x600) | 38 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 201 | FSAF (ResNet-101) | 37.9 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 202 | Mask R-CNN (ResNet-50-FPN) | 37.7 | No | Mask R-CNN | 2017-03-20 | Code |
| 203 | Faster R-CNN (ResNet-50-FPN, GRoIE) | 37.5 | No | A novel Region of Interest Extraction Layer for ... | 2020-04-28 | Code |
| 204 | Mask R-CNN (ResNeXt-101-FPN) | 36.7 | No | Mask R-CNN | 2017-03-20 | Code |
| 205 | FoveaBox (ResNet-50-FPN, 600x600) | 36 | No | FoveaBox: Beyond Anchor-based Object Detector | 2019-04-08 | Code |
| 206 | FSAF (ResNet-50) | 35.9 | No | Feature Selective Anchor-Free Module for Single-... | 2019-03-02 | Code |
| 207 | GHM-C + GHM-R (RetinaNet-FPN-ResNet-50, M=30) | 35.8 | No | Gradient Harmonized Single-stage Detector | 2018-11-13 | Code |
| 208 | Online Fg Bal. Sampling+Hard Negative Mining (ResNet-50) | 35.6 | No | Generating Positive Bounding Boxes for Balanced ... | 2019-09-21 | Code |
| 209 | M2Det (ResNet-101, 320x320) | 34.1 | No | M2Det: A Single-Shot Object Detector based on Mu... | 2018-11-12 | Code |
| 210 | Faster R-CNN (Res2Net-50) | 33.7 | No | Res2Net: A New Multi-scale Backbone Architecture | 2019-04-02 | Code |
| 211 | M2Det (VGG-16, 320x320) | 33.2 | No | M2Det: A Single-Shot Object Detector based on Mu... | 2018-11-12 | Code |