Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Panoptic Segmentation on COCO minival

Metric: PQ (higher is better)
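PQ (Panoptic Quality, Kirillov et al., 2019) averages the IoU of matched predicted/ground-truth segments and penalizes unmatched ones: PQ = Σ IoU over true positives ÷ (|TP| + ½|FP| + ½|FN|), where a pair counts as a match only if its IoU exceeds 0.5. A minimal sketch of that formula (function name and inputs are illustrative, not part of this page or the official evaluation toolkit):

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """Panoptic Quality from already-matched segments.

    tp_ious: IoU values of matched (predicted, ground-truth) segment
             pairs; a pair is a true positive only if its IoU > 0.5.
    num_fp:  predicted segments with no match (false positives).
    num_fn:  ground-truth segments with no match (false negatives).
    """
    tp = len(tp_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # image with no segments at all
    return sum(tp_ious) / denom

# Two matches (IoU 0.8 and 0.6), one FP, one FN:
# PQ = (0.8 + 0.6) / (2 + 0.5 + 0.5) = 1.4 / 3 ≈ 0.467
```

The official numbers below are computed per category and averaged; this sketch shows only the per-set formula. Leaderboard values are reported as percentages (PQ × 100).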


Results

| # | Model | PQ | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | HyperSeg (Swin-B) | 61.2 | Yes | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 2 | OneFormer (InternImage-H, single-scale) | 60.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 3 | OpenSeeD (SwinL, single-scale) | 59.5 | Yes | A Simple Framework for Open-Vocabulary Segmentat... | 2023-03-14 | Code |
| 4 | UMG-CLIP-E/14 | 59.5 | Yes | UMG-CLIP: A Unified Multi-Granularity Vision Gen... | 2024-01-12 | Code |
| 5 | Mask DINO (SwinL, single-scale) | 59.4 | Yes | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code |
| 6 | EoMT (DINOv2-g, single-scale, 1280x1280) | 59.2 | No | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 | Code |
| 7 | UMG-CLIP-L/14 | 58.9 | Yes | UMG-CLIP: A Unified Multi-Granularity Vision Gen... | 2024-01-12 | Code |
| 8 | DiNAT-L (single-scale, Mask2Former) | 58.5 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 9 | ViT-Adapter-L (single-scale, BEiTv2 pretrain, Mask2Former) | 58.4 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 10 | Visual Attention Network (VAN-B6 + Mask2Former) | 58.2 | No | Visual Attention Network | 2022-02-20 | Code |
| 11 | kMaX-DeepLab (single-scale, pseudo-labels) | 58.1 | Yes | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 12 | HIPIE (ViT-H, single-scale) | 58.1 | Yes | Hierarchical Open-vocabulary Universal Image Seg... | 2023-07-03 | Code |
| 13 | kMaX-DeepLab (single-scale, drop query with 256 queries) | 58.0 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 14 | OneFormer (DiNAT-L, single-scale) | 58.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 15 | kMaX-DeepLab (single-scale) | 57.9 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 16 | OneFormer (Swin-L, single-scale) | 57.9 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 17 | FocalNet-L (Mask2Former, 200 queries) | 57.9 | No | Focal Modulation Networks | 2022-03-22 | Code |
| 18 | Mask2Former (single-scale) | 57.8 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 19 | Panoptic SegFormer (single-scale) | 55.8 | No | Panoptic SegFormer: Delving Deeper into Panoptic... | 2021-09-08 | Code |
| 20 | CMT-DeepLab (single-scale) | 55.3 | No | CMT-DeepLab: Clustering Mask Transformers for Pa... | 2022-06-17 | Code |
| 21 | MaskFormer (single-scale) | 52.7 | No | Per-Pixel Classification is Not All You Need for... | 2021-07-13 | Code |
| 22 | MaX-DeepLab-L (single-scale) | 51.1 | No | MaX-DeepLab: End-to-End Panoptic Segmentation wi... | 2020-12-01 | Code |
| 23 | Panoptic SegFormer (ResNet-101) | 50.6 | No | Panoptic SegFormer: Delving Deeper into Panoptic... | 2021-09-08 | Code |
| 24 | PanopticFPN + ResNeSt (single-scale) | 47.9 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 25 | DETR-R101 (ResNet-101) | 45.1 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 26 | Panoptic FCN* (ResNet-50-FPN) | 44.3 | No | Fully Convolutional Networks for Panoptic Segmen... | 2020-12-01 | Code |
| 27 | PanopticFPN++ | 44.1 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 28 | Axial-DeepLab-L (multi-scale) | 43.9 | No | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |
| 29 | Axial-DeepLab-L (single-scale) | 43.4 | No | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |