Metric: mIoU (higher is better)
| # | Model | mIoU | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | EfficientPS (Cityscapes-fine) | 90.3 | No | EfficientPS: Efficient Panoptic Segmentation | 2020-04-05 | Code |
| 2 | ViT-P (OneFormer, InternImage-H) | 85.4 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 3 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, multi-scale) | 85.3 | Yes | Scaling Wide Residual Networks for Panoptic Segm... | 2020-11-23 | - |
| 4 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained) | 84.6 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 5 | Axial-DeepLab-XL (Mapillary Vistas, multi-scale) | 84.6 | Yes | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |
| 6 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, single-scale) | 84.6 | Yes | Scaling Wide Residual Networks for Panoptic Segm... | 2020-11-23 | - |
| 7 | OneFormer (ConvNeXt-XL, single-scale) | 83.6 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 8 | kMaX-DeepLab (single-scale) | 83.5 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 9 | DiNAT-L (Mask2Former) | 83.4 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 10 | OneFormer (DiNAT-L, single-scale) | 83.1 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 11 | OneFormer (ConvNeXt-L, single-scale) | 83 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 12 | AFF-Base (single-scale, point-based Mask2Former) | 83 | No | AutoFocusFormer: Image Segmentation off the Grid | 2023-04-24 | Code |
| 13 | OneFormer (Swin-L, single-scale) | 83 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 14 | Mask2Former (Swin-L) | 82.9 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 15 | AFF-Small (single-scale, point-based Mask2Former) | 82.2 | No | AutoFocusFormer: Image Segmentation off the Grid | 2023-04-24 | Code |
| 16 | EfficientPS | 82.1 | Yes | EfficientPS: Efficient Panoptic Segmentation | 2020-04-05 | Code |
| 17 | Panoptic-DeepLab (X71) | 81.5 | Yes | Panoptic-DeepLab: A Simple, Strong, and Fast Bas... | 2019-11-22 | Code |
| 18 | CMT-DeepLab (MaX-S, single-scale, IN-1K) | 81.4 | No | CMT-DeepLab: Clustering Mask Transformers for Pa... | 2022-06-17 | Code |
| 19 | Dynamically Instantiated Network (ResNet-101) | 79.8 | No | Weakly- and Semi-Supervised Panoptic Segmentation | 2018-08-10 | Code |
| 20 | COPS (ResNet-50) | 79.3 | No | Combinatorial Optimization for Panoptic Segmenta... | 2021-06-06 | Code |
| 21 | AdaptIS (ResNeXt-101) | 79.2 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 22 | UPSNet (ResNet-101, multi-scale) | 79.2 | Yes | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 23 | TASCNet (ResNet-50, multi-scale) | 78 | Yes | Learning to Fuse Things and Stuff | 2018-12-04 | - |
| 24 | UPSNet (ResNet-101) | 77.8 | Yes | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 25 | TASCNet (ResNet-50) | 77.8 | Yes | Learning to Fuse Things and Stuff | 2018-12-04 | - |
| 26 | AdaptIS (ResNet-101) | 77.2 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 27 | Panoptic FPN (ResNet-101) | 75.7 | No | Panoptic Feature Pyramid Networks | 2019-01-08 | Code |
| 28 | AUNet (ResNet-101-FPN) | 75.6 | No | Attention-guided Unified Network for Panoptic Se... | 2018-12-10 | - |
| 29 | AdaptIS (ResNet-50) | 75.3 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 30 | UPSNet (ResNet-50) | 75.2 | No | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
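
The table reports mIoU without saying how it is computed. As a rough illustration only, the sketch below follows the common definition: per-class intersection over union, averaged over the classes that occur, and scaled to a percentage. The function name `mean_iou`, the flat label lists, and the `ignore_label=255` default are assumptions made for this example. The evaluation protocol behind each leaderboard entry may differ, for instance in how statistics are accumulated across images.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_label: int = 255,  # assumed "void" label; not stated in the table
) -> float:
    """Return mIoU in percent over classes that appear in pred or target.

    Assumes every prediction at a non-ignored pixel is in [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # pixels without a ground-truth class are not scored
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(score, 2))
```

Because the mean is taken over classes rather than pixels, rare classes weigh as much as frequent ones, so small differences between entries can come from a few hard classes.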