Instance Segmentation on COCO minival

Metric: mask AP (higher is better)
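The ranking metric is COCO-style mask AP. It is average precision computed from mask IoU (not box IoU) and averaged over the ten IoU thresholds 0.50, 0.55, ..., 0.95. The sketch below illustrates that computation for a single image and a single category. It assumes binary masks given as nested 0/1 lists, and the function names are hypothetical. It omits parts of the official evaluation: crowd regions, area ranges, the 100-detection cap, pooling detections across images, and averaging over categories. So it will not reproduce leaderboard numbers. For those, pycocotools' `COCOeval` with `iouType='segm'` is the usual tool. The sketch returns a fraction in [0, 1], while the leaderboard lists the value multiplied by 100.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Hypothetical mask representation: an H x W grid of 0/1 values.
Mask = Sequence[Sequence[int]]

# COCO-style thresholds: IoU 0.50:0.05:0.95 and 101 recall points 0:0.01:1.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]
RECALL_THRESHOLDS = [i / 100 for i in range(101)]


def mask_iou(a: Mask, b: Mask) -> float:
    """Pixel-level intersection-over-union of two same-sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += int(bool(pa) and bool(pb))
            union += int(bool(pa) or bool(pb))
    return inter / union if union else 0.0


def ap_at_iou(preds: List[Tuple[float, Mask]], gts: List[Mask], thr: float) -> float:
    """101-point interpolated AP at one IoU threshold (one image, one class)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_cum = fp_cum = 0
    recall: List[float] = []
    precision: List[float] = []
    # Visit predictions from most to least confident (score is element 0).
    for _, pred in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily match to the unmatched ground truth with the highest IoU >= thr.
        best_iou, best_j = thr, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            iou = mask_iou(pred, gt)
            if iou >= best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0:
            matched[best_j] = True
            tp_cum += 1
        else:
            fp_cum += 1
        recall.append(tp_cum / len(gts))
        precision.append(tp_cum / (tp_cum + fp_cum))
    # Make precision non-increasing as recall grows (the precision envelope).
    for i in range(len(precision) - 1, 0, -1):
        precision[i - 1] = max(precision[i - 1], precision[i])
    # Sample precision at each recall threshold; 0 if that recall is never reached.
    total = 0.0
    for r in RECALL_THRESHOLDS:
        k = bisect_left(recall, r)
        total += precision[k] if k < len(precision) else 0.0
    return total / len(RECALL_THRESHOLDS)


def mask_ap(preds: List[Tuple[float, Mask]], gts: List[Mask]) -> float:
    """Mean AP over the ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    return sum(ap_at_iou(preds, gts, t) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```

Here is a worked example. With `gt = [[1, 1, 1]]`, a single prediction `[[1, 1, 0]]` has IoU 2/3. It counts as a match only at the thresholds 0.50 to 0.65, so `mask_ap([(0.9, [[1, 1, 0]])], [gt])` returns 0.4. Averaging over strict thresholds up to 0.95 is why mask AP rewards precise mask boundaries and not just correct localization.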

Results

#	Model	mask AP	Extra Data	Paper	Date	Code
1	Co-DETR	56.6	Yes	DETRs with Collaborative Hybrid Assignments Trai...	2022-11-22	Code
2	ViT-CoMer-L (Mask RCNN, DINOv2)	55.9	No	-	-	Code
3	InternImage-H	55.4	Yes	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
4	EVA	55	Yes	EVA: Exploring the Limits of Masked Visual Repre...	2022-11-14	Code
5	Mask Frozen-DETR	54.9	Yes	Mask Frozen-DETR: High Quality Instance Segmenta...	2023-08-07	-
6	Mask DINO (SwinL, multi-scale)	54.5	Yes	Mask DINO: Towards A Unified Transformer-based F...	2022-06-06	Code
7	ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale)	54.2	Yes	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
8	GLEE-Pro	54.2	Yes	General Object Foundation Model for Images and V...	2023-12-14	Code
9	SwinV2-G (HTC++)	53.7	Yes	Swin Transformer V2: Scaling Up Capacity and Res...	2021-11-18	Code
10	ViTDet, ViT-H Cascade (multiscale)	53.1	No	Exploring Plain Vision Transformer Backbones for...	2022-03-30	Code
11	GLEE-Plus	53	Yes	General Object Foundation Model for Images and V...	2023-12-14	Code
12	Mask DINO (SwinL)	52.6	No	Mask DINO: Towards A Unified Transformer-based F...	2022-06-06	Code
13	Soft Teacher + Swin-L(HTC++, multi-scale)	52.5	Yes	End-to-End Semi-Supervised Object Detection with...	2021-06-16	Code
14	ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale)	52.5	No	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
15	ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale)	52.2	No	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
16	ViTDet, ViT-H Cascade	52	No	Exploring Plain Vision Transformer Backbones for...	2022-03-30	Code
17	Soft Teacher + Swin-L(HTC++, single-scale)	51.9	Yes	End-to-End Semi-Supervised Object Detection with...	2021-06-16	Code
18	CBNetV2 (Dual-Swin-L HTC, multi-scale)	51.8	No	CBNet: A Composite Backbone Network Architecture...	2021-07-01	Code
19	Frozen Backbone, SwinV2-G-ext22K (HTC)	51.6	No	Could Giant Pretrained Image Models Extract Univ...	2022-11-03	-
20	CBNetV2 (Dual-Swin-L HTC, multi-scale)	51	No	CBNet: A Composite Backbone Network Architecture...	2021-07-01	Code
21	Focal-L (HTC++, multi-scale)	50.9	No	Focal Self-attention for Local-Global Interactio...	2021-07-01	Code
22	DiNAT-L (single-scale, Mask2Former)	50.8	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
23	MViTv2-L (Cascade Mask R-CNN, multi-scale, IN21k pre-train)	50.5	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
24	Swin-L (HTC++, multi scale)	50.4	No	Swin Transformer: Hierarchical Vision Transforme...	2021-03-25	Code
25	MOAT-3 (IN-22K pretraining, single-scale)	50.3	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
26	Mask2Former (Swin-L)	50.1	No	Masked-attention Mask Transformer for Universal ...	2021-12-02	Code
27	Swin-L (HTC++, single scale)	49.5	No	Swin Transformer: Hierarchical Vision Transforme...	2021-03-25	Code
28	MOAT-2 (IN-22K pretraining, single-scale)	49.3	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
29	MOAT-1 (IN-1K pretraining, single-scale)	49	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
30	QueryInst (single scale)	48.9	No	Instances as Queries	2021-05-05	Code
31	Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale)	48.9	Yes	Simple Copy-Paste is a Strong Data Augmentation ...	2020-12-13	Code
32	InternImage-XL	48.8	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
33	CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)	48.8	No	X-Paste: Revisiting Scalable Copy-Paste for Inst...	2022-12-07	Code
34	Hiera-L	48.6	No	Hiera: A Hierarchical Vision Transformer without...	2023-06-01	Code
35	InternImage-L	48.5	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
36	MViTv2-H (Cascade Mask R-CNN, single-scale, IN21k pre-train)	48.5	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
37	GLEE-Lite	48.4	Yes	General Object Foundation Model for Images and V...	2023-12-14	Code
38	MOAT-0 (IN-1K pretraining, single-scale)	47.4	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
39	MViTv2-L (Cascade Mask R-CNN, single-scale)	47.1	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
40	MPViT-B (Cascade Mask R-CNN, multi-scale, IN1k pre-train)	47	No	MPViT: Multi-Path Vision Transformer for Dense P...	2021-12-21	Code
41	tiny-MOAT-3 (IN-1K pretraining, single-scale)	47	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
42	Cascade Eff-B7 NAS-FPN (1280)	46.8	No	Simple Copy-Paste is a Strong Data Augmentation ...	2020-12-13	Code
43	ResNeSt-200 (multi-scale)	46.25	No	ResNeSt: Split-Attention Networks	2020-04-19	Code
44	MViT-L (Mask R-CNN, single-scale)	46.2	No	MViTv2: Improved Multiscale Vision Transformers ...	2021-12-02	Code
45	RetinaNet (SpineNet-190, 1536x1536)	46.1	No	SpineNet: Learning Scale-Permuted Backbone for R...	2019-12-10	Code
46	MPViT-B (Cascade R-CNN, single-scale, IN-1K pre-train)	45.8	No	MPViT: Multi-Path Vision Transformer for Dense P...	2021-12-21	Code
47	Mask R-CNN (ViL Base, multi-scale, 3x lr)	45.7	No	Multi-Scale Vision Longformer: A New Vision Tran...	2021-03-29	Code
48	Mask R-CNN (ViL Base, 1x lr)	45.1	No	Multi-Scale Vision Longformer: A New Vision Tran...	2021-03-29	Code
49	tiny-MOAT-2 (IN-1K pretraining, single-scale)	45	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
50	GCNet (ResNeXt-101 + DCN + cascade + GC r4)	44.7	No	Global Context Networks	2020-12-24	Code
51	tiny-MOAT-1 (IN-1K pretraining, single-scale)	44.6	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
52	InternImage-S	44.5	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
53	ResNeSt-200-DCN (single-scale)	44.5	No	ResNeSt: Split-Attention Networks	2020-04-19	Code
54	ELSA-S (Cascade Mask RCNN)	44.4	No	ELSA: Enhanced Local Self-Attention for Vision T...	2021-12-23	Code
55	BoTNet 200 (Mask R-CNN, single scale, 72 epochs)	44.4	No	Bottleneck Transformers for Visual Recognition	2021-01-27	Code
56	DaViT-T (Mask R-CNN, 36 epochs)	44.3	No	DaViT: Dual Attention Vision Transformers	2022-04-07	Code
57	ResNeSt-200 (single-scale)	44.21	No	ResNeSt: Split-Attention Networks	2020-04-19	Code
58	InternImage-T	43.7	No	InternImage: Exploring Large-Scale Vision Founda...	2022-11-10	Code
59	BoTNet 152 (Mask R-CNN, single scale, 72 epochs)	43.7	No	Bottleneck Transformers for Visual Recognition	2021-01-27	Code
60	XCiT-M24/8	43.7	No	XCiT: Cross-Covariance Image Transformers	2021-06-17	Code
61	tiny-MOAT-0 (IN-1K pretraining, single-scale)	43.3	No	MOAT: Alternating Mobile Convolution and Attenti...	2022-10-04	Code
62	ELSA-S (Mask RCNN)	43	No	ELSA: Enhanced Local Self-Attention for Vision T...	2021-12-23	Code
63	XCiT-S24/8	43	No	XCiT: Cross-Covariance Image Transformers	2021-06-17	Code
64	CenterMask-VoVNetV2-99 (multi-scale)	42.5	No	CenterMask : Real-Time Anchor-Free Instance Segm...	2019-11-15	Code
65	ResNeSt-101 (single-scale)	41.56	No	ResNeSt: Split-Attention Networks	2020-04-19	Code
66	SIW	41.4	No	Scaling up Multi-domain Semantic Segmentation wi...	2022-02-04	-
67	Res2Net-101+HTC	41.3	No	Res2Net: A New Multi-scale Backbone Architecture	2019-04-02	Code
68	HTC (HRNetV2p-W48)	41	No	Deep High-Resolution Representation Learning for...	2019-08-20	Code
69	HTC (HRNetV2p-W48)	41	No	Deep High-Resolution Representation Learning for...	2019-08-20	Code
70	GCNet (ResNeXt-101 + DCN + cascade + GC r16)	40.9	No	GCNet: Non-local Networks Meet Squeeze-Excitatio...	2019-04-25	Code
71	BoTNet 50 (72 epochs)	40.7	No	Bottleneck Transformers for Visual Recognition	2021-01-27	Code
72	R3-CNN (ResNet-50-FPN, DCN)	40.4	No	Recursively Refined R-CNN: Instance Segmentation...	2021-04-03	Code
73	Mask R-CNN (ResNeXt-152, +1 NL)	40.3	No	Non-local Neural Networks	2017-11-21	Code
74	Mask R-CNN-FPN (AOGNet-40M)	40.2	No	Attentive Normalization	2019-08-04	Code
75	R3-CNN (ResNet-50-FPN, GC-Net)	40.2	No	Recursively Refined R-CNN: Instance Segmentation...	2021-04-03	Code
76	CenterMask-VoVNetV2-99-3x	40.2	No	CenterMask : Real-Time Anchor-Free Instance Segm...	2019-11-15	Code
77	R3-CNN (ResNet-50-FPN, GRoIE)	39.1	No	Recursively Refined R-CNN: Instance Segmentation...	2021-04-03	Code
78	Mask Scoring R-CNN (ResNet-101-FPN-DCN)	39.1	No	Mask Scoring R-CNN	2019-03-01	Code
79	Mask R-CNN-FPN (ResNeXt-101, GN+WS)	38.34	No	Micro-Batch Training with Batch-Channel Normaliz...	2019-03-25	Code
80	R3-CNN (ResNet-50-FPN)	38.2	No	Recursively Refined R-CNN: Instance Segmentation...	2021-04-03	Code
81	HTC (ResNet-50)	38.2	No	Hybrid Task Cascade for Instance Segmentation	2019-01-22	Code
82	Mask Scoring R-CNN (ResNet-101 FPN)	38.2	No	Mask Scoring R-CNN	2019-03-01	Code
83	PANet (ResNet-50)	37.8	No	Path Aggregation Network for Instance Segmentation	2018-03-05	Code
84	GCNet (ResNet-50-FPN, GRoIE)	37.2	No	A novel Region of Interest Extraction Layer for ...	2020-04-28	Code
85	Mask R-CNN (FPN, X-volution, SA)	37.2	No	X-volution: On the unification of convolution an...	2021-06-04	-
86	Mask R-CNN (ResNet-101, +1 NL)	37.1	No	Non-local Neural Networks	2017-11-21	Code
87	Mask Scoring R-CNN (ResNet-50 FPN)	36	No	Mask Scoring R-CNN	2019-03-01	Code
88	Mask R-CNN (ResNet-50-FPN, GRoIE)	35.8	No	A novel Region of Interest Extraction Layer for ...	2020-04-28	Code
89	Faster R-CNN (Res2Net-50)	35.6	No	Res2Net: A New Multi-scale Backbone Architecture	2019-04-02	Code
90	Mask R-CNN (ResNet-50, +1 NL)	35.5	No	Non-local Neural Networks	2017-11-21	Code
91	Mask R-CNN (ResNet-50, ACNet)	35.2	No	Adaptively Connected Neural Networks	2019-04-07	Code
92	YOLACT-550 (ResNet-50)	29.9	No	YOLACT: Real-time Instance Segmentation	2019-04-04	Code

#1 Co-DETR · SOTA · 56.6 mask AP · Extra Data · 2022-11-22 · DETRs with Collaborative Hybrid Assignments Training · Code
#2 ViT-CoMer-L (Mask RCNN, DINOv2) · 55.9 mask AP · No paper · Code
#3 InternImage-H · SOTA · 55.4 mask AP · Extra Data · 2022-11-10 · InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · Code
#4 EVA · 55 mask AP · Extra Data · 2022-11-14 · EVA: Exploring the Limits of Masked Visual Representation Learning at Scale · Code
#5 Mask Frozen-DETR · 54.9 mask AP · Extra Data · 2023-08-07 · Mask Frozen-DETR: High Quality Instance Segmentation with One GPU
#6 Mask DINO (SwinL, multi-scale) · SOTA · 54.5 mask AP · Extra Data · 2022-06-06 · Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation · Code
#7 ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale) · SOTA · 54.2 mask AP · Extra Data · 2022-05-17 · Vision Transformer Adapter for Dense Predictions · Code
#8 GLEE-Pro · 54.2 mask AP · Extra Data · 2023-12-14 · General Object Foundation Model for Images and Videos at Scale · Code
#9 SwinV2-G (HTC++) · SOTA · 53.7 mask AP · Extra Data · 2021-11-18 · Swin Transformer V2: Scaling Up Capacity and Resolution · Code
#10 ViTDet, ViT-H Cascade (multiscale) · 53.1 mask AP · 2022-03-30 · Exploring Plain Vision Transformer Backbones for Object Detection · Code
#11 GLEE-Plus · 53 mask AP · Extra Data · 2023-12-14 · General Object Foundation Model for Images and Videos at Scale · Code
#12 Mask DINO (SwinL) · 52.6 mask AP · 2022-06-06 · Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation · Code
#13 Soft Teacher + Swin-L(HTC++, multi-scale) · SOTA · 52.5 mask AP · Extra Data · 2021-06-16 · End-to-End Semi-Supervised Object Detection with Soft Teacher · Code
#14 ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) · 52.5 mask AP · 2022-05-17 · Vision Transformer Adapter for Dense Predictions · Code
#15 ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) · 52.2 mask AP · 2022-05-17 · Vision Transformer Adapter for Dense Predictions · Code
#16 ViTDet, ViT-H Cascade · 52 mask AP · 2022-03-30 · Exploring Plain Vision Transformer Backbones for Object Detection · Code
#17 Soft Teacher + Swin-L(HTC++, single-scale) · 51.9 mask AP · Extra Data · 2021-06-16 · End-to-End Semi-Supervised Object Detection with Soft Teacher · Code
#18 CBNetV2 (Dual-Swin-L HTC, multi-scale) · 51.8 mask AP · 2021-07-01 · CBNet: A Composite Backbone Network Architecture for Object Detection · Code
#19 Frozen Backbone, SwinV2-G-ext22K (HTC) · 51.6 mask AP · 2022-11-03 · Could Giant Pretrained Image Models Extract Universal Representations?
#20 CBNetV2 (Dual-Swin-L HTC, multi-scale) · 51 mask AP · 2021-07-01 · CBNet: A Composite Backbone Network Architecture for Object Detection · Code
#21 Focal-L (HTC++, multi-scale) · 50.9 mask AP · 2021-07-01 · Focal Self-attention for Local-Global Interactions in Vision Transformers · Code
#22 DiNAT-L (single-scale, Mask2Former) · 50.8 mask AP · 2022-09-29 · Dilated Neighborhood Attention Transformer · Code
#23 MViTv2-L (Cascade Mask R-CNN, multi-scale, IN21k pre-train) · 50.5 mask AP · 2021-12-02 · MViTv2: Improved Multiscale Vision Transformers for Classification and Detection · Code
#24 Swin-L (HTC++, multi scale) · SOTA · 50.4 mask AP · 2021-03-25 · Swin Transformer: Hierarchical Vision Transformer using Shifted Windows · Code
#25 MOAT-3 (IN-22K pretraining, single-scale) · 50.3 mask AP · 2022-10-04 · MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models · Code
#26 Mask2Former (Swin-L) · 50.1 mask AP · 2021-12-02 · Masked-attention Mask Transformer for Universal Image Segmentation · Code
#27 Swin-L (HTC++, single scale) · 49.5 mask AP · 2021-03-25 · Swin Transformer: Hierarchical Vision Transformer using Shifted Windows · Code
#28 MOAT-2 (IN-22K pretraining, single-scale) · 49.3 mask AP · 2022-10-04 · MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models · Code
#29 MOAT-1 (IN-1K pretraining, single-scale) · 49 mask AP · 2022-10-04 · MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models · Code
#30 QueryInst (single scale) · 48.9 mask AP · 2021-05-05 · Instances as Queries · Code
#31 Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) · SOTA · 48.9 mask AP · Extra Data · 2020-12-13 · Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation · Code
#32 InternImage-XL · 48.8 mask AP · 2022-11-10 · InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · Code
#33 CenterNet2 (Swin-L w/ X-Paste + Copy-Paste) · 48.8 mask AP · 2022-12-07 · X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion · Code
#34 Hiera-L · 48.6 mask AP · 2023-06-01 · Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles · Code
#35 InternImage-L · 48.5 mask AP · 2022-11-10 · InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · Code
#36 MViTv2-H (Cascade Mask R-CNN, single-scale, IN21k pre-train) · 48.5 mask AP · 2021-12-02 · MViTv2: Improved Multiscale Vision Transformers for Classification and Detection · Code
#37 GLEE-Lite · 48.4 mask AP · Extra Data · 2023-12-14 · General Object Foundation Model for Images and Videos at Scale · Code
#38 MOAT-0 (IN-1K pretraining, single-scale) · 47.4 mask AP · 2022-10-04 · MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models · Code
#39 MViTv2-L (Cascade Mask R-CNN, single-scale) · 47.1 mask AP · 2021-12-02 · MViTv2: Improved Multiscale Vision Transformers for Classification and Detection · Code
#40 MPViT-B (Cascade Mask R-CNN, multi-scale, IN1k pre-train) · 47 mask AP · 2021-12-21 · MPViT: Multi-Path Vision Transformer for Dense Prediction · Code
#41 tiny-MOAT-3 (IN-1K pretraining, single-scale) · 47 mask AP · 2022-10-04 · MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models · Code
#42 Cascade Eff-B7 NAS-FPN (1280) · 46.8 mask AP · 2020-12-13 · Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation · Code
#43 ResNeSt-200 (multi-scale) · SOTA · 46.25 mask AP · 2020-04-19 · ResNeSt: Split-Attention Networks · Code
#44 MViT-L (Mask R-CNN, single-scale) · 46.2 mask AP · 2021-12-02 · MViTv2: Improved Multiscale Vision Transformers for Classification and Detection · Code
#45 RetinaNet (SpineNet-190, 1536x1536) · SOTA · 46.1 mask AP · 2019-12-10 · SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization · Code
#46 MPViT-B (Cascade R-CNN, single-scale, IN-1K pre-train) · 45.8 mask AP · 2021-12-21 · MPViT: Multi-Path Vision Transformer for Dense Prediction · Code
#47 Mask R-CNN (ViL Base, multi-scale, 3x lr) · 45.7 mask AP · 2021-03-29 · Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding · Code
#48 Mask R-CNN (ViL Base, 1x lr) · 45.1 mask AP · 2021-03-29 · Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding · Code
#49 tiny-MOAT-2 (IN-1K pretraining, single-scale) · 45 mask AP · 2022-10-04 · MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models · Code
#50 GCNet (ResNeXt-101 + DCN + cascade + GC r4) · 44.7 mask AP · 2020-12-24 · Global Context Networks · Code
#51 tiny-MOAT-1 (IN-1K pretraining, single-scale) · 44.6 mask AP · 2022-10-04 · MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models · Code
#52 InternImage-S · 44.5 mask AP · 2022-11-10 · InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · Code
#53 ResNeSt-200-DCN (single-scale) · 44.5 mask AP · 2020-04-19 · ResNeSt: Split-Attention Networks · Code
#54 ELSA-S (Cascade Mask RCNN) · 44.4 mask AP · 2021-12-23 · ELSA: Enhanced Local Self-Attention for Vision Transformer · Code
#55 BoTNet 200 (Mask R-CNN, single scale, 72 epochs) · 44.4 mask AP · 2021-01-27 · Bottleneck Transformers for Visual Recognition · Code
#56 DaViT-T (Mask R-CNN, 36 epochs) · 44.3 mask AP · 2022-04-07 · DaViT: Dual Attention Vision Transformers · Code
#57 ResNeSt-200 (single-scale) · 44.21 mask AP · 2020-04-19 · ResNeSt: Split-Attention Networks · Code
#58 InternImage-T · 43.7 mask AP · 2022-11-10 · InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions · Code
#59 BoTNet 152 (Mask R-CNN, single scale, 72 epochs) · 43.7 mask AP · 2021-01-27 · Bottleneck Transformers for Visual Recognition · Code
#60 XCiT-M24/8 · 43.7 mask AP · 2021-06-17 · XCiT: Cross-Covariance Image Transformers · Code
#61 tiny-MOAT-0 (IN-1K pretraining, single-scale) · 43.3 mask AP · 2022-10-04 · MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models · Code
#62 ELSA-S (Mask RCNN) · 43 mask AP · 2021-12-23 · ELSA: Enhanced Local Self-Attention for Vision Transformer · Code
#63 XCiT-S24/8 · 43 mask AP · 2021-06-17 · XCiT: Cross-Covariance Image Transformers · Code
#64 CenterMask-VoVNetV2-99 (multi-scale) · SOTA · 42.5 mask AP · 2019-11-15 · CenterMask : Real-Time Anchor-Free Instance Segmentation · Code
#65 ResNeSt-101 (single-scale) · 41.56 mask AP · 2020-04-19 · ResNeSt: Split-Attention Networks · Code
#66 SIW · 41.4 mask AP · 2022-02-04 · Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings
#67 Res2Net-101+HTC · SOTA · 41.3 mask AP · 2019-04-02 · Res2Net: A New Multi-scale Backbone Architecture · Code
#68 HTC (HRNetV2p-W48) · 41 mask AP · 2019-08-20 · Deep High-Resolution Representation Learning for Visual Recognition · Code
#69 HTC (HRNetV2p-W48) · 41 mask AP · 2019-08-20 · Deep High-Resolution Representation Learning for Visual Recognition · Code
#70 GCNet (ResNeXt-101 + DCN + cascade + GC r16) · 40.9 mask AP · 2019-04-25 · GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond · Code
#71 BoTNet 50 (72 epochs) · 40.7 mask AP · 2021-01-27 · Bottleneck Transformers for Visual Recognition · Code
#72 R3-CNN (ResNet-50-FPN, DCN) · 40.4 mask AP · 2021-04-03 · Recursively Refined R-CNN: Instance Segmentation with Self-RoI Rebalancing · Code
#73 Mask R-CNN (ResNeXt-152, +1 NL) · SOTA · 40.3 mask AP · 2017-11-21 · Non-local Neural Networks · Code
#74 Mask R-CNN-FPN (AOGNet-40M) · 40.2 mask AP · 2019-08-04 · Attentive Normalization · Code
#75 R3-CNN (ResNet-50-FPN, GC-Net) · 40.2 mask AP · 2021-04-03 · Recursively Refined R-CNN: Instance Segmentation with Self-RoI Rebalancing · Code
#76 CenterMask-VoVNetV2-99-3x · 40.2 mask AP · 2019-11-15 · CenterMask : Real-Time Anchor-Free Instance Segmentation · Code
#77 R3-CNN (ResNet-50-FPN, GRoIE) · 39.1 mask AP · 2021-04-03 · Recursively Refined R-CNN: Instance Segmentation with Self-RoI Rebalancing · Code
#78 Mask Scoring R-CNN (ResNet-101-FPN-DCN) · 39.1 mask AP · 2019-03-01 · Mask Scoring R-CNN · Code
#79 Mask R-CNN-FPN (ResNeXt-101, GN+WS) · 38.34 mask AP · 2019-03-25 · Micro-Batch Training with Batch-Channel Normalization and Weight Standardization · Code
#80 R3-CNN (ResNet-50-FPN) · 38.2 mask AP · 2021-04-03 · Recursively Refined R-CNN: Instance Segmentation with Self-RoI Rebalancing · Code
#81 HTC (ResNet-50) · 38.2 mask AP · 2019-01-22 · Hybrid Task Cascade for Instance Segmentation · Code
#82 Mask Scoring R-CNN (ResNet-101 FPN) · 38.2 mask AP · 2019-03-01 · Mask Scoring R-CNN · Code
#83 PANet (ResNet-50) · 37.8 mask AP · 2018-03-05 · Path Aggregation Network for Instance Segmentation · Code
#84 GCNet (ResNet-50-FPN, GRoIE) · 37.2 mask AP · 2020-04-28 · A novel Region of Interest Extraction Layer for Instance Segmentation · Code
#85 Mask R-CNN (FPN, X-volution, SA) · 37.2 mask AP · 2021-06-04 · X-volution: On the unification of convolution and self-attention
#86 Mask R-CNN (ResNet-101, +1 NL) · 37.1 mask AP · 2017-11-21 · Non-local Neural Networks · Code
#87 Mask Scoring R-CNN (ResNet-50 FPN) · 36 mask AP · 2019-03-01 · Mask Scoring R-CNN · Code
#88 Mask R-CNN (ResNet-50-FPN, GRoIE) · 35.8 mask AP · 2020-04-28 · A novel Region of Interest Extraction Layer for Instance Segmentation · Code
#89 Faster R-CNN (Res2Net-50) · 35.6 mask AP · 2019-04-02 · Res2Net: A New Multi-scale Backbone Architecture · Code
#90 Mask R-CNN (ResNet-50, +1 NL) · 35.5 mask AP · 2017-11-21 · Non-local Neural Networks · Code
#91 Mask R-CNN (ResNet-50, ACNet) · 35.2 mask AP · 2019-04-07 · Adaptively Connected Neural Networks · Code
#92 YOLACT-550 (ResNet-50) · 29.9 mask AP · 2019-04-04 · YOLACT: Real-time Instance Segmentation · Code