Metric: Panoptic Quality (PQ); higher is better
| # | Model | PQ (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ViT-P (OneFormer, InternImage-H) | 70.8 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 2 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained) | 70.1 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 3 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, multi-scale) | 69.6 | Yes | Scaling Wide Residual Networks for Panoptic Segm... | 2020-11-23 | - |
| 4 | OneFormer (ConvNeXt-L, single-scale) | 68.51 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 5 | Axial-DeepLab-XL (Mapillary Vistas, multi-scale) | 68.5 | Yes | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |
| 6 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, single-scale) | 68.5 | Yes | Scaling Wide Residual Networks for Panoptic Segm... | 2020-11-23 | - |
| 7 | OneFormer (ConvNeXt-XL, single-scale) | 68.4 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 8 | kMaX-DeepLab (single-scale) | 68.4 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 9 | AFF-Base (single-scale, point-based Mask2Former) | 67.7 | No | AutoFocusFormer: Image Segmentation off the Grid | 2023-04-24 | Code |
| 10 | OneFormer (DiNAT-L, single-scale) | 67.6 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 11 | EfficientPS | 67.5 | Yes | EfficientPS: Efficient Panoptic Segmentation | 2020-04-05 | Code |
| 12 | DiNAT-L (Mask2Former) | 67.2 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 13 | OneFormer (Swin-L, single-scale) | 67.2 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 14 | AFF-Small (single-scale, point-based Mask2Former) | 66.9 | No | AutoFocusFormer: Image Segmentation off the Grid | 2023-04-24 | Code |
| 15 | Mask2Former (Swin-L) | 66.6 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 16 | EfficientPS (Cityscapes-fine) | 64.9 | No | EfficientPS: Efficient Panoptic Segmentation | 2020-04-05 | Code |
| 17 | CMT-DeepLab (MaX-S, single-scale, IN-1K) | 64.6 | No | CMT-DeepLab: Clustering Mask Transformers for Pa... | 2022-06-17 | Code |
| 18 | Panoptic-DeepLab (X71) | 64.1 | Yes | Panoptic-DeepLab: A Simple, Strong, and Fast Bas... | 2019-11-22 | Code |
| 19 | Mask2Former + Intra-Batch Supervision (ResNet-50) | 62.4 | No | Intra-Batch Supervision for Panoptic Segmentatio... | 2023-04-17 | Code |
| 20 | COPS (ResNet-50) | 62.1 | No | Combinatorial Optimization for Panoptic Segmenta... | 2021-06-06 | Code |
| 21 | AdaptIS (ResNeXt-101) | 62.0 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 22 | UPSNet (ResNet-101, multiscale) | 61.8 | Yes | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 23 | Panoptic FCN* (ResNet-FPN) | 61.4 | No | Fully Convolutional Networks for Panoptic Segmen... | 2020-12-01 | Code |
| 24 | MRCNN + PSPNet (ResNet-101) | 61.2 | Yes | Panoptic Segmentation | 2018-01-03 | Code |
| 25 | AdaptIS (ResNet-101) | 60.6 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 26 | UPSNet (ResNet-101) | 60.5 | Yes | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 27 | TASCNet (ResNet-50, multi-scale) | 60.4 | Yes | Learning to Fuse Things and Stuff | 2018-12-04 | - |
| 28 | UPSNet (ResNet-50) | 59.3 | No | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 29 | TASCNet (ResNet-50) | 59.2 | Yes | Learning to Fuse Things and Stuff | 2018-12-04 | - |
| 30 | AUNet (ResNet-101-FPN) | 59.0 | No | Attention-guided Unified Network for Panoptic Se... | 2018-12-10 | - |
| 31 | AdaptIS (ResNet-50) | 59.0 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 32 | Panoptic FPN (ResNet-101) | 58.1 | No | Panoptic Feature Pyramid Networks | 2019-01-08 | Code |
| 33 | DeeperLab (Xception-71) | 56.5 | No | DeeperLab: Single-Shot Image Parser | 2019-02-13 | - |
| 34 | Dynamically Instantiated Network (ResNet-101) | 53.8 | No | Weakly- and Semi-Supervised Panoptic Segmentation | 2018-08-10 | Code |
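PQ, the metric used to rank the entries above, was introduced in the Panoptic Segmentation paper (rank 24). A predicted segment and a ground-truth segment of the same class count as a true positive (TP) when their IoU exceeds 0.5, which guarantees the match is unique. Unmatched predictions are false positives (FP) and unmatched ground-truth segments are false negatives (FN). Per-class PQ is the sum of IoUs over matched pairs divided by |TP| + 0.5|FP| + 0.5|FN|, and the reported PQ is the mean over classes, scaled by 100. The sketch below is a minimal illustration under stated assumptions: segments are hypothetical `(class_id, set_of_pixel_ids)` pairs for a single image, void pixels and crowd regions (which official evaluation scripts treat specially) are ignored, and the function names are illustrative rather than taken from any evaluation toolkit.

```python
from collections import defaultdict


def iou(a: set, b: set) -> float:
    """Intersection-over-union of two pixel-id sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(gt, pred):
    """Simplified single-image PQ (hypothetical helper, not an official API).

    gt, pred: lists of (class_id, set_of_pixel_ids).
    Returns (mean PQ over classes, dict of per-class PQ), both in [0, 1].
    Assumption: no void pixels or crowd regions are handled.
    """
    tp_iou = defaultdict(float)  # sum of IoU over matched pairs, per class
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    matched_pred = set()

    for g_cls, g_pix in gt:
        matched = False
        for p_idx, (p_cls, p_pix) in enumerate(pred):
            if p_cls != g_cls or p_idx in matched_pred:
                continue
            v = iou(g_pix, p_pix)
            if v > 0.5:  # IoU > 0.5 means at most one candidate can match
                tp[g_cls] += 1
                tp_iou[g_cls] += v
                matched_pred.add(p_idx)
                matched = True
                break
        if not matched:
            fn[g_cls] += 1

    # Every prediction left unmatched is a false positive for its class.
    for p_idx, (p_cls, _) in enumerate(pred):
        if p_idx not in matched_pred:
            fp[p_cls] += 1

    classes = set(tp) | set(fp) | set(fn)
    per_class = {}
    for c in classes:
        denom = tp[c] + 0.5 * fp[c] + 0.5 * fn[c]
        per_class[c] = tp_iou[c] / denom if denom else 0.0
    pq = sum(per_class.values()) / len(per_class) if per_class else 0.0
    return pq, per_class


gt = [(1, {0, 1, 2, 3}), (2, {4, 5, 6, 7})]
pred = [(1, {0, 1, 2}), (2, {8, 9})]
pq, per_class = panoptic_quality(gt, pred)
print(round(pq * 100, 1), per_class)  # 37.5 {1: 0.75, 2: 0.0}
```

In this toy example, class 1 has one match with IoU 0.75, while class 2 yields one FN and one FP, so the mean PQ is 37.5 on the table's percentage scale. Full evaluations accumulate TP, FP, FN, and matched IoU per class across the entire dataset before dividing, rather than averaging per-image scores.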