Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Instance Segmentation on COCO minival

Metric: mask AP (higher is better)
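
The page names the metric without defining it. Assuming it follows the usual COCO convention, mask AP is average precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05, with IoU measured between predicted and ground-truth masks rather than boxes. The sketch below covers only the mask IoU step and the threshold grid, not the full AP computation; the function and variable names are hypothetical and the toy masks are illustrative.

```python
from typing import List

# Binary mask as nested lists: 1 = object pixel, 0 = background (toy representation).
Mask = List[List[int]]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks have no defined overlap; treat that case as 0.0 here.
    return inter / union if union else 0.0


# Assumed COCO-style threshold grid: 0.50, 0.55, ..., 0.95 (10 values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(iou: float) -> int:
    """Count the thresholds at which a prediction with this IoU would count as a match."""
    return sum(1 for t in IOU_THRESHOLDS if iou >= t)


# Toy example: the prediction covers 4 of the 6 ground-truth pixels and nothing else.
pred = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
gt = [[1, 1, 1], [1, 1, 1], [0, 0, 0]]
iou = mask_iou(pred, gt)
print(f"IoU = {iou:.3f}, matched at {thresholds_matched(iou)} of {len(IOU_THRESHOLDS)} thresholds")
```

A full evaluation would additionally rank predictions by confidence, build a precision-recall curve per category and threshold, and average the results; in practice that is normally delegated to an established COCO evaluation tool rather than reimplemented.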

Results

# | Model | mask AP | Extra Data | Paper | Date | Code
--- | --- | --- | --- | --- | --- | ---
1 | Co-DETR | 56.6 | Yes | DETRs with Collaborative Hybrid Assignments Trai... | 2022-11-22 | Code
2 | ViT-CoMer-L (Mask RCNN, DINOv2) | 55.9 | No | - | - | Code
3 | InternImage-H | 55.4 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
4 | EVA | 55 | Yes | EVA: Exploring the Limits of Masked Visual Repre... | 2022-11-14 | Code
5 | Mask Frozen-DETR | 54.9 | Yes | Mask Frozen-DETR: High Quality Instance Segmenta... | 2023-08-07 | -
6 | Mask DINO (SwinL, multi-scale) | 54.5 | Yes | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code
7 | ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale) | 54.2 | Yes | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code
8 | GLEE-Pro | 54.2 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code
9 | SwinV2-G (HTC++) | 53.7 | Yes | Swin Transformer V2: Scaling Up Capacity and Res... | 2021-11-18 | Code
10 | ViTDet, ViT-H Cascade (multiscale) | 53.1 | No | Exploring Plain Vision Transformer Backbones for... | 2022-03-30 | Code
11 | GLEE-Plus | 53 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code
12 | Mask DINO (SwinL) | 52.6 | No | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code
13 | Soft Teacher + Swin-L(HTC++, multi-scale) | 52.5 | Yes | End-to-End Semi-Supervised Object Detection with... | 2021-06-16 | Code
14 | ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) | 52.5 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code
15 | ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) | 52.2 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code
16 | ViTDet, ViT-H Cascade | 52 | No | Exploring Plain Vision Transformer Backbones for... | 2022-03-30 | Code
17 | Soft Teacher + Swin-L(HTC++, single-scale) | 51.9 | Yes | End-to-End Semi-Supervised Object Detection with... | 2021-06-16 | Code
18 | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 51.8 | No | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code
19 | Frozen Backbone, SwinV2-G-ext22K (HTC) | 51.6 | No | Could Giant Pretrained Image Models Extract Univ... | 2022-11-03 | -
20 | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 51 | No | CBNet: A Composite Backbone Network Architecture... | 2021-07-01 | Code
21 | Focal-L (HTC++, multi-scale) | 50.9 | No | Focal Self-attention for Local-Global Interactio... | 2021-07-01 | Code
22 | DiNAT-L (single-scale, Mask2Former) | 50.8 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code
23 | MViTv2-L (Cascade Mask R-CNN, multi-scale, IN21k pre-train) | 50.5 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code
24 | Swin-L (HTC++, multi scale) | 50.4 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code
25 | MOAT-3 (IN-22K pretraining, single-scale) | 50.3 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
26 | Mask2Former (Swin-L) | 50.1 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code
27 | Swin-L (HTC++, single scale) | 49.5 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code
28 | MOAT-2 (IN-22K pretraining, single-scale) | 49.3 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
29 | MOAT-1 (IN-1K pretraining, single-scale) | 49 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
30 | QueryInst (single scale) | 48.9 | No | Instances as Queries | 2021-05-05 | Code
31 | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | 48.9 | Yes | Simple Copy-Paste is a Strong Data Augmentation ... | 2020-12-13 | Code
32 | InternImage-XL | 48.8 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
33 | CenterNet2 (Swin-L w/ X-Paste + Copy-Paste) | 48.8 | No | X-Paste: Revisiting Scalable Copy-Paste for Inst... | 2022-12-07 | Code
34 | Hiera-L | 48.6 | No | Hiera: A Hierarchical Vision Transformer without... | 2023-06-01 | Code
35 | InternImage-L | 48.5 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
36 | MViTv2-H (Cascade Mask R-CNN, single-scale, IN21k pre-train) | 48.5 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code
37 | GLEE-Lite | 48.4 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code
38 | MOAT-0 (IN-1K pretraining, single-scale) | 47.4 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
39 | MViTv2-L (Cascade Mask R-CNN, single-scale) | 47.1 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code
40 | MPViT-B (Cascade Mask R-CNN, multi-scale, IN1k pre-train) | 47 | No | MPViT: Multi-Path Vision Transformer for Dense P... | 2021-12-21 | Code
41 | tiny-MOAT-3 (IN-1K pretraining, single-scale) | 47 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
42 | Cascade Eff-B7 NAS-FPN (1280) | 46.8 | No | Simple Copy-Paste is a Strong Data Augmentation ... | 2020-12-13 | Code
43 | ResNeSt-200 (multi-scale) | 46.25 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code
44 | MViT-L (Mask R-CNN, single-scale) | 46.2 | No | MViTv2: Improved Multiscale Vision Transformers ... | 2021-12-02 | Code
45 | RetinaNet (SpineNet-190, 1536x1536) | 46.1 | No | SpineNet: Learning Scale-Permuted Backbone for R... | 2019-12-10 | Code
46 | MPViT-B (Cascade R-CNN, single-scale, IN-1K pre-train) | 45.8 | No | MPViT: Multi-Path Vision Transformer for Dense P... | 2021-12-21 | Code
47 | Mask R-CNN (ViL Base, multi-scale, 3x lr) | 45.7 | No | Multi-Scale Vision Longformer: A New Vision Tran... | 2021-03-29 | Code
48 | Mask R-CNN (ViL Base, 1x lr) | 45.1 | No | Multi-Scale Vision Longformer: A New Vision Tran... | 2021-03-29 | Code
49 | tiny-MOAT-2 (IN-1K pretraining, single-scale) | 45 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
50 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 44.7 | No | Global Context Networks | 2020-12-24 | Code
51 | tiny-MOAT-1 (IN-1K pretraining, single-scale) | 44.6 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
52 | InternImage-S | 44.5 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
53 | ResNeSt-200-DCN (single-scale) | 44.5 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code
54 | ELSA-S (Cascade Mask RCNN) | 44.4 | No | ELSA: Enhanced Local Self-Attention for Vision T... | 2021-12-23 | Code
55 | BoTNet 200 (Mask R-CNN, single scale, 72 epochs) | 44.4 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code
56 | DaViT-T (Mask R-CNN, 36 epochs) | 44.3 | No | DaViT: Dual Attention Vision Transformers | 2022-04-07 | Code
57 | ResNeSt-200 (single-scale) | 44.21 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code
58 | InternImage-T | 43.7 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
59 | BoTNet 152 (Mask R-CNN, single scale, 72 epochs) | 43.7 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code
60 | XCiT-M24/8 | 43.7 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code
61 | tiny-MOAT-0 (IN-1K pretraining, single-scale) | 43.3 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
62 | ELSA-S (Mask RCNN) | 43 | No | ELSA: Enhanced Local Self-Attention for Vision T... | 2021-12-23 | Code
63 | XCiT-S24/8 | 43 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code
64 | CenterMask-VoVNetV2-99 (multi-scale) | 42.5 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code
65 | ResNeSt-101 (single-scale) | 41.56 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code
66 | SIW | 41.4 | No | Scaling up Multi-domain Semantic Segmentation wi... | 2022-02-04 | -
67 | Res2Net-101+HTC | 41.3 | No | Res2Net: A New Multi-scale Backbone Architecture | 2019-04-02 | Code
68 | HTC (HRNetV2p-W48) | 41 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code
69 | HTC (HRNetV2p-W48) | 41 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code
70 | GCNet (ResNeXt-101 + DCN + cascade + GC r16) | 40.9 | No | GCNet: Non-local Networks Meet Squeeze-Excitatio... | 2019-04-25 | Code
71 | BoTNet 50 (72 epochs) | 40.7 | No | Bottleneck Transformers for Visual Recognition | 2021-01-27 | Code
72 | R3-CNN (ResNet-50-FPN, DCN) | 40.4 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code
73 | Mask R-CNN (ResNeXt-152, +1 NL) | 40.3 | No | Non-local Neural Networks | 2017-11-21 | Code
74 | Mask R-CNN-FPN (AOGNet-40M) | 40.2 | No | Attentive Normalization | 2019-08-04 | Code
75 | R3-CNN (ResNet-50-FPN, GC-Net) | 40.2 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code
76 | CenterMask-VoVNetV2-99-3x | 40.2 | No | CenterMask : Real-Time Anchor-Free Instance Segm... | 2019-11-15 | Code
77 | R3-CNN (ResNet-50-FPN, GRoIE) | 39.1 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code
78 | Mask Scoring R-CNN (ResNet-101-FPN-DCN) | 39.1 | No | Mask Scoring R-CNN | 2019-03-01 | Code
79 | Mask R-CNN-FPN (ResNeXt-101, GN+WS) | 38.34 | No | Micro-Batch Training with Batch-Channel Normaliz... | 2019-03-25 | Code
80 | R3-CNN (ResNet-50-FPN) | 38.2 | No | Recursively Refined R-CNN: Instance Segmentation... | 2021-04-03 | Code
81 | HTC (ResNet-50) | 38.2 | No | Hybrid Task Cascade for Instance Segmentation | 2019-01-22 | Code
82 | Mask Scoring R-CNN (ResNet-101 FPN) | 38.2 | No | Mask Scoring R-CNN | 2019-03-01 | Code
83 | PANet (ResNet-50) | 37.8 | No | Path Aggregation Network for Instance Segmentation | 2018-03-05 | Code
84 | GCNet (ResNet-50-FPN, GRoIE) | 37.2 | No | A novel Region of Interest Extraction Layer for ... | 2020-04-28 | Code
85 | Mask R-CNN (FPN, X-volution, SA) | 37.2 | No | X-volution: On the unification of convolution an... | 2021-06-04 | -
86 | Mask R-CNN (ResNet-101, +1 NL) | 37.1 | No | Non-local Neural Networks | 2017-11-21 | Code
87 | Mask Scoring R-CNN (ResNet-50 FPN) | 36 | No | Mask Scoring R-CNN | 2019-03-01 | Code
88 | Mask R-CNN (ResNet-50-FPN, GRoIE) | 35.8 | No | A novel Region of Interest Extraction Layer for ... | 2020-04-28 | Code
89 | Faster R-CNN (Res2Net-50) | 35.6 | No | Res2Net: A New Multi-scale Backbone Architecture | 2019-04-02 | Code
90 | Mask R-CNN (ResNet-50, +1 NL) | 35.5 | No | Non-local Neural Networks | 2017-11-21 | Code
91 | Mask R-CNN (ResNet-50, ACNet) | 35.2 | No | Adaptively Connected Neural Networks | 2019-04-07 | Code
92 | YOLACT-550 (ResNet-50) | 29.9 | No | YOLACT: Real-time Instance Segmentation | 2019-04-04 | Code