Panoptic Segmentation on COCO minival
Metric: PQ (Panoptic Quality; higher is better)
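For readers unfamiliar with the metric, the following is a minimal sketch of how PQ is commonly computed for a single class: the sum of IoUs over matched segments divided by |TP| + 0.5·|FP| + 0.5·|FN|. The function name `panoptic_quality` and its arguments are hypothetical and used only for illustration. The sketch assumes the caller has already matched predicted and ground-truth segments at IoU > 0.5, and it does not reproduce the official evaluation code, which also averages PQ over all classes.

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """Per-class PQ from true-positive IoUs and unmatched segment counts.

    tp_ious: IoU of each matched (prediction, ground truth) pair; assumed
             to have been matched upstream with IoU > 0.5.
    num_fp:  predicted segments with no match (false positives).
    num_fn:  ground-truth segments with no match (false negatives).
    """
    denom = len(tp_ious) + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # No segments of this class at all; return 0.0 by convention here.
        return 0.0
    return sum(tp_ious) / denom


# Illustrative numbers only, not taken from any leaderboard entry.
pq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(f"PQ = {100 * pq:.1f}")  # prints "PQ = 60.0"
```

Leaderboard values such as 61.2 are this quantity averaged over classes and expressed on a 0 to 100 scale.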
Results
| # | Model | PQ | Extra Data | SOTA | Paper | Date | Code |
|---|-------|----|------------|------|-------|------|------|
| 1 | HyperSeg (Swin-B) | 61.2 | Yes | SOTA | HyperSeg: Towards Universal Visual Segmentation with Large Language Model | 2024-11-26 | Code |
| 2 | OneFormer (InternImage-H, single-scale) | 60 | No | SOTA | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | Code |
| 3 | OpenSeeD (SwinL, single-scale) | 59.5 | Yes | | A Simple Framework for Open-Vocabulary Segmentation and Detection | 2023-03-14 | Code |
| 4 | UMG-CLIP-E/14 | 59.5 | Yes | | UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | 2024-01-12 | Code |
| 5 | Mask DINO (SwinL, single-scale) | 59.4 | Yes | SOTA | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | 2022-06-06 | Code |
| 6 | EoMT (DINOv2-g, single-scale, 1280x1280) | 59.2 | No | | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 | Code |
| 7 | UMG-CLIP-L/14 | 58.9 | Yes | | UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | 2024-01-12 | Code |
| 8 | DiNAT-L (single-scale, Mask2Former) | 58.5 | No | | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 9 | ViT-Adapter-L (single-scale, BEiTv2 pretrain, Mask2Former) | 58.4 | No | SOTA | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 10 | Visual Attention Network (VAN-B6 + Mask2Former) | 58.2 | No | SOTA | Visual Attention Network | 2022-02-20 | Code |
| 11 | kMaX-DeepLab (single-scale, pseudo-labels) | 58.1 | Yes | | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 12 | HIPIE (ViT-H, single-scale) | 58.1 | Yes | | Hierarchical Open-vocabulary Universal Image Segmentation | 2023-07-03 | Code |
| 13 | kMaX-DeepLab (single-scale, drop query with 256 queries) | 58 | No | | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 14 | OneFormer (DiNAT-L, single-scale) | 58 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | Code |
| 15 | kMaX-DeepLab (single-scale) | 57.9 | No | | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 16 | OneFormer (Swin-L, single-scale) | 57.9 | No | | OneFormer: One Transformer to Rule Universal Image Segmentation | 2022-11-10 | Code |
| 17 | FocalNet-L (Mask2Former (200 queries)) | 57.9 | No | | Focal Modulation Networks | 2022-03-22 | Code |
| 18 | Mask2Former (single-scale) | 57.8 | No | SOTA | Masked-attention Mask Transformer for Universal Image Segmentation | 2021-12-02 | Code |
| 19 | Panoptic SegFormer (single-scale) | 55.8 | No | SOTA | Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers | 2021-09-08 | Code |
| 20 | CMT-DeepLab (single-scale) | 55.3 | No | | CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation | 2022-06-17 | Code |
| 21 | MaskFormer (single-scale) | 52.7 | No | SOTA | Per-Pixel Classification is Not All You Need for Semantic Segmentation | 2021-07-13 | Code |
| 22 | MaX-DeepLab-L (single-scale) | 51.1 | No | SOTA | MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers | 2020-12-01 | Code |
| 23 | Panoptic SegFormer (ResNet-101) | 50.6 | No | | Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers | 2021-09-08 | Code |
| 24 | PanopticFPN+ResNeSt (single-scale) | 47.9 | No | SOTA | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 25 | DETR-R101 (ResNet-101) | 45.1 | No | | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 26 | Panoptic FCN* (ResNet-50-FPN) | 44.3 | No | | Fully Convolutional Networks for Panoptic Segmentation | 2020-12-01 | Code |
| 27 | PanopticFPN++ | 44.1 | No | | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 28 | Axial-DeepLab-L (multi-scale) | 43.9 | No | SOTA | Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | 2020-03-17 | Code |
| 29 | Axial-DeepLab-L (single-scale) | 43.4 | No | | Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | 2020-03-17 | Code |