Semantic Segmentation on ADE20K val
Metric: PQ (Panoptic Quality; higher is better)
Results
| # | Model | PQ | Extra Data | SOTA | Paper | Date |
|---|-------|----|------------|------|-------|------|
| 1 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896) | 54.5 | No | SOTA | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 |
| 2 | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280, COCO_pretrain) | 54 | Yes | | The Missing Point in Vision Transformers for Universal Image Segmentation | 2025-05-26 |
| 3 | OpenSeed(SwinL, single scale, 1280x1280) | 53.7 | Yes | | A Simple Framework for Open-Vocabulary Segmentation and Detection | 2023-03-14 |
| 4 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain) | 53.4 | Yes | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 |
| 5 | EoMT (DINOv2-g, single-scale, 1280x1280, COCO pre-trained) | 52.8 | Yes | | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 |
| 6 | X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 52.4 | Yes | | Generalized Decoding for Pixel, Image, and Language | 2022-12-21 |
| 7 | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280) | 51.9 | No | | The Missing Point in Vision Transformers for Universal Image Segmentation | 2025-05-26 |
| 8 | OneFormer (DiNAT-L, single-scale, 1280x1280) | 51.5 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 |
| 9 | OneFormer (Swin-L, single-scale, 1280x1280) | 51.4 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 |
| 10 | kMaX-DeepLab (ConvNeXt-L, single-scale, 1281x1281) | 50.9 | No | SOTA | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 |
| 11 | OneFormer (DiNAT-L, single-scale, 640x640) | 50.5 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 |
| 12 | OneFormer (ConvNeXt-XL, single-scale, 640x640) | 50.1 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 |
| 13 | OneFormer (ConvNeXt-L, single-scale, 640x640) | 50 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 |
| 14 | OneFormer (Swin-L, single-scale, 640x640) | 49.8 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 |
| 15 | X-Decoder (L) | 49.6 | Yes | | Generalized Decoding for Pixel, Image, and Language | 2022-12-21 |
| 16 | DiNAT-L (Mask2Former, 640x640) | 49.4 | No | | Dilated Neighborhood Attention Transformer | 2022-09-29 |
| 17 | kMaX-DeepLab (ConvNeXt-L, single-scale, 641x641) | 48.7 | No | | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 |
| 18 | Mask2Former (Swin-L) | 48.1 | No | SOTA | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 |
| 19 | Mask2Former (Swin-L + FAPN, 640x640) | 46.2 | No | | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 |
| 20 | kMaX-DeepLab (ResNet50, single-scale, 1281x1281) | 42.3 | No | | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 |
| 21 | kMaX-DeepLab (ResNet50, single-scale, 641x641) | 41.5 | No | | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 |
| 22 | Mask2Former (ResNet-50, 640x640) | 39.7 | No | | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 |
| 23 | Panoptic-DeepLab (SwideRNet) | 37.9 | No | | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 |
| 24 | MaskFormer (R101 + 6 Enc) | 35.7 | No | SOTA | Per-Pixel Classification is Not All You Need for Semantic Segmentation | 2021-07-13 |

Every entry links to code.