Instance Segmentation on ADE20K val
Metric: AP (higher is better)
Results
| # | Model | AP | Extra Data | Paper | Date | SOTA badge |
|---|-------|----|------------|-------|------|------------|
| 1 | OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained) | 44.2 | No | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | SOTA |
| 2 | OpenSeeD | 42.6 | Yes | A Simple Framework for Open-Vocabulary Segmentation and Detection | 2023-03-14 | |
| 3 | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280, COCO_pretrain) | 40.7 | Yes | The Missing Point in Vision Transformers for Universal Image Segmentation | 2025-05-26 | |
| 4 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain) | 40.2 | Yes | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | |
| 5 | X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 38.7 | Yes | Generalized Decoding for Pixel, Image, and Language | 2022-12-21 | |
| 6 | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280) | 37.8 | No | The Missing Point in Vision Transformers for Universal Image Segmentation | 2025-05-26 | |
| 7 | OneFormer (DiNAT-L, single-scale) | 36 | No | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | |
| 8 | OneFormer (Swin-L, single-scale) | 35.9 | No | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | |
| 9 | X-Decoder (L) | 35.8 | Yes | Generalized Decoding for Pixel, Image, and Language | 2022-12-21 | |
| 10 | DiNAT-L (Mask2Former, single-scale) | 35.4 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | SOTA |
| 11 | Mask2Former (Swin-L, single-scale) | 34.9 | No | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 | SOTA |
| 12 | Mask2Former (Swin-L + FAPN) | 33.4 | No | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 | |
| 13 | Mask2Former (ResNet50) | 26.4 | No | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 | |