| 1 | Co-DETR | 57.1 | Yes | DETRs with Collaborative Hybrid Assignments Trai... | 2022-11-22 | Code |
| 2 | CBNetV2 (EVA02, single-scale) | 56.1 | Yes | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code |
| 3 | EVA | 55.5 | Yes | EVA: Exploring the Limits of Masked Visual Repre... | 2022-11-14 | Code |
| 4 | FD-SwinV2-G | 55.4 | Yes | Contrastive Learning Rivals Masked Image Modelin... | 2022-05-27 | Code |
| 5 | Mask Frozen-DETR | 55.3 | Yes | Mask Frozen-DETR: High Quality Instance Segmenta... | 2023-08-07 | - |
| 6 | BEiT-3 | 54.8 | No | Image as a Foreign Language: BEiT Pretraining fo... | 2022-08-22 | Code |
| 7 | Mask DINO (SwinL, multi-scale) | 54.7 | Yes | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code |
| 8 | ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale) | 54.5 | Yes | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 9 | GLEE-Pro | 54.5 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 10 | SwinV2-G (HTC++) | 54.4 | Yes | Swin Transformer V2: Scaling Up Capacity and Res... | 2021-11-18 | Code |
| 11 | GLEE-Plus | 53.3 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 12 | Soft Teacher + Swin-L (HTC++, multi-scale) | 53 | Yes | End-to-End Semi-Supervised Object Detection with... | 2021-06-16 | Code |
| 13 | ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) | 53 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 14 | Mask DINO (SwinL, single-scale) | 52.8 | No | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code |
| 15 | ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) | 52.5 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 16 | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 52.3 | No | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code |
| 17 | UNINEXT-H | 51.8 | Yes | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 18 | CBNetV2 (Dual-Swin-L HTC, single-scale) | 51.6 | No | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code |
| 19 | Focal-L (HTC++, multi-scale) | 51.3 | No | Focal Self-attention for Local-Global Interactio... | 2021-07-01 | Code |
| 20 | Swin-L (HTC++, multi scale) | 51.1 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code |
| 21 | Mask2Former (Swin-L, single scale) | 50.5 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 22 | Swin-L (HTC++, single scale) | 50.2 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code |
| 23 | ISTR-SMT (Swin-L, single scale) | 49.7 | No | ISTR: End-to-End Instance Segmentation with Tran... | 2021-05-03 | Code |
| 24 | QueryInst (single scale) | 49.1 | No | Instances as Queries | 2021-05-05 | Code |
| 25 | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | 49.1 | Yes | Simple Copy-Paste is a Strong Data Augmentation ... | 2020-12-13 | Code |
| 26 | dBOT ViT-L (CLIP) | 48.8 | No | Exploring Target Representations for Masked Auto... | 2022-09-08 | Code |
| 27 | MogaNet-XL (Cascade Mask R-CNN) | 48.8 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 28 | DetectoRS (ResNeXt-101-64x4d, multi-scale) | 48.5 | No | DetectoRS: Detecting Objects with Recursive Feat... | 2020-06-03 | Code |
| 29 | dBOT ViT-L | 48.3 | No | Exploring Target Representations for Masked Auto... | 2022-09-08 | Code |
| 30 | DiffusionInst-SwinL | 48.3 | No | DiffusionInst: Diffusion Model for Instance Segm... | 2022-12-06 | Code |
| 31 | GLEE-Lite | 48.3 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 32 | DiffusionInst-SwinB | 47.6 | No | DiffusionInst: Diffusion Model for Instance Segm... | 2022-12-06 | Code |
| 33 | DetectoRS (ResNeXt-101-32x4d, multi-scale) | 47.1 | No | DetectoRS: Detecting Objects with Recursive Feat... | 2020-06-03 | Code |
| 34 | Cascade Eff-B7 NAS-FPN (1280) | 46.9 | No | Simple Copy-Paste is a Strong Data Augmentation ... | 2020-12-13 | Code |
| 35 | SOLQ (Swin-L, single scale) | 46.7 | No | SOLQ: Segmenting Objects by Learning Queries | 2021-06-04 | Code |
| 36 | dBOT ViT-B | 46.3 | No | Exploring Target Representations for Masked Auto... | 2022-09-08 | Code |
| 37 | dBOT ViT-B (CLIP) | 46.2 | No | Exploring Target Representations for Masked Auto... | 2022-09-08 | Code |
| 38 | Mask R-CNN (SpineNet-190, 1536x1536) | 46.1 | No | SpineNet: Learning Scale-Permuted Backbone for R... | 2019-12-10 | Code |
| 39 | MogaNet-L (Cascade Mask R-CNN) | 46.1 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 40 | MogaNet-B (Cascade Mask R-CNN) | 46 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 41 | Swin-B + Cascade Mask R-CNN (tri-layer modelling) | 45.9 | No | A Tri-Layer Plugin to Improve Occluded Detection | 2022-10-18 | Code |
| 42 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 45.4 | No | Global Context Networks | 2020-12-24 | Code |
| 43 | MogaNet-S (Cascade Mask R-CNN) | 45.1 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 44 | gSwin-S | 45.03 | No | gSwin: Gated MLP Vision Model with Hierarchical ... | 2022-08-24 | - |
| 45 | iBOT (ViT-B/16) | 44.2 | Yes | iBOT: Image BERT Pre-Training with Online Tokeni... | 2021-11-15 | Code |
| 46 | gSwin-T | 44.16 | No | gSwin: Gated MLP Vision Model with Hierarchical ... | 2022-08-24 | - |
| 47 | MogaNet-L (Mask R-CNN 1x) | 44.1 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 48 | A2MIM (ViT-B) | 43.5 | No | Architecture-Agnostic Masked Image Modeling -- F... | 2022-05-27 | Code |
| 49 | Cascade Mask R-CNN (ResNeXt152, CBNet) | 43.3 | No | CBNet: A Novel Composite Backbone Network Archit... | 2019-09-09 | Code |
| 50 | MogaNet-B (Mask R-CNN 1x) | 43.2 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 51 | gSwin-VT | 42.87 | No | gSwin: Gated MLP Vision Model with Hierarchical ... | 2022-08-24 | - |
| 52 | iBOT (ViT-S/16) | 42.6 | Yes | iBOT: Image BERT Pre-Training with Online Tokeni... | 2021-11-15 | Code |
| 53 | Box2Mask-T | 42.4 | No | Box2Mask: Box-supervised Instance Segmentation v... | 2022-12-03 | Code |
| 54 | Mask Transfiner(ResNet101-FPN) | 42.2 | No | Mask Transfiner for High-Quality Instance Segmen... | 2021-11-26 | Code |
| 55 | MogaNet-S (Mask R-CNN 1x) | 42.2 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 56 | PANet | 42 | No | Path Aggregation Network for Instance Segmentation | 2018-03-05 | Code |
| 57 | CenterMask + VoVNet99 | 41.8 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 58 | SOLOv2(Res-DCN-101-FPN) | 41.7 | No | SOLOv2: Dynamic and Fast Instance Segmentation | 2020-03-23 | Code |
| 59 | BCNet(ResNeXt-101 + FPN+ FCOS) | 41.7 | No | Deep Occlusion-Aware Instance Segmentation with ... | 2021-03-23 | Code |
| 60 | DiffusionInst-ResNet101 | 41.5 | No | DiffusionInst: Diffusion Model for Instance Segm... | 2022-12-06 | Code |
| 61 | BlendMask (ResNet-101 + DCN interval=3) | 41.3 | No | BlendMask: Top-Down Meets Bottom-Up for Instance... | 2020-01-02 | Code |
| 62 | HTC + ResNeXt-101-FPN + DCN | 41.2 | Yes | Hybrid Task Cascade for Instance Segmentation | 2019-01-22 | Code |
| 63 | SOLQ (ResNet101, single scale) | 40.9 | No | SOLQ: Segmenting Objects by Learning Queries | 2021-06-04 | Code |
| 64 | CenterMask + VoVNetV2-99 (single-scale) | 40.6 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 65 | SOLO(Res-DCN-101-FPN) | 40.4 | No | SOLO: Segmenting Objects by Locations | 2019-12-10 | Code |
| 66 | D2Det (ResNet-101, single-scale test) | 40.2 | No | - | - | Code |
| 67 | BoxTeacher | 40 | No | BoxTeacher: Exploring High-Quality Pseudo Labels... | 2022-10-11 | Code |
| 68 | BCNet(ResNet-101-FPN + Faster RCNN) | 39.8 | No | Deep Occlusion-Aware Instance Segmentation with ... | 2021-03-23 | Code |
| 69 | SOLQ (ResNet50, single scale) | 39.7 | No | SOLQ: Segmenting Objects by Learning Queries | 2021-06-04 | Code |
| 70 | CenterMask + X101-32x8d (single-scale) | 39.6 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 71 | BCNet(ResNet-101-FPN + FCOS) | 39.6 | No | Deep Occlusion-Aware Instance Segmentation with ... | 2021-03-23 | Code |
| 72 | CPMask | 39.2 | No | Commonality-Parsing Network across Shape and App... | 2020-07-24 | Code |
| 73 | MogaNet-T (Mask R-CNN 1x) | 39.1 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 74 | PolarMask++ (ResNeXt-101-DCN) | 38.7 | Yes | PolarMask++: Enhanced Polar Representation for S... | 2021-05-05 | Code |
| 75 | ISDA | 38.7 | No | ISDA: Position-Aware Instance Segmentation with ... | 2022-02-23 | Code |
| 76 | CenterMask + ResNet-101-FPN | 38.3 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code |
| 77 | SipMask (ResNet-101, single-scale test) | 38.1 | No | SipMask: Spatial Information Preservation for Fa... | 2020-07-29 | Code |
| 78 | DiscoBox | 37.9 | No | DiscoBox: Weakly Supervised Instance Segmentatio... | 2021-05-13 | Code |
| 79 | EmbedMask(R-101-FPN) | 37.7 | No | EmbedMask: Embedding Coupling for One-stage Inst... | 2019-12-04 | Code |
| 80 | MogaNet-XT | 37.6 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 81 | Mask R-CNN (ResNeXt-101-FPN) | 37.1 | No | Mask R-CNN | 2017-03-20 | Code |
| 82 | DiffusionInst-ResNet50 | 37.1 | No | DiffusionInst: Diffusion Model for Instance Segm... | 2022-12-06 | Code |
| 83 | VirTex Mask R-CNN (ResNet-50-FPN) | 36.9 | No | VirTex: Learning Visual Representations from Tex... | 2020-06-11 | Code |
| 84 | MogaNet-T | 35.8 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 85 | BoxInst | 35 | No | BoxInst: High-Performance Instance Segmentation ... | 2020-12-03 | Code |
| 86 | A2MIM (ResNet-50 2x) | 34.9 | No | Architecture-Agnostic Masked Image Modeling -- F... | 2022-05-27 | Code |
| 87 | E2EC DLA-34 | 33.8 | No | E2EC: An End-to-End Contour-based Method for Hig... | 2022-03-08 | Code |
| 88 | Mask R-CNN (Bottleneck-injected ResNet-50, FPN) | 33.6 | No | torchdistill: A Modular, Configuration-Driven Fr... | 2020-11-25 | Code |
| 89 | BoxCaseg | 30.9 | Yes | Weakly-supervised Instance Segmentation via Clas... | 2021-04-04 | - |
| 90 | BBAM | 25.7 | No | BBAM: Bounding Box Attribution Map for Weakly Su... | 2021-03-16 | Code |
| 91 | BBTP | 21.1 | No | - | - | Code |