| Rank | Model | Score | Extra Training Data | Paper | Date | Code |
|------|-------|-------|---------------------|-------|------|------|
| 1 | HyperSeg (Swin-B) | 61.2 | Yes | HyperSeg: Towards Universal Visual Segmentation ... | 2024-11-26 | Code |
| 2 | OneFormer (InternImage-H, single-scale) | 60.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 3 | OpenSeeD (Swin-L, single-scale) | 59.5 | Yes | A Simple Framework for Open-Vocabulary Segmentat... | 2023-03-14 | Code |
| 4 | UMG-CLIP-E/14 | 59.5 | Yes | UMG-CLIP: A Unified Multi-Granularity Vision Gen... | 2024-01-12 | Code |
| 5 | Mask DINO (Swin-L, single-scale) | 59.4 | Yes | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code |
| 6 | EoMT (DINOv2-g, single-scale, 1280x1280) | 59.2 | No | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 | Code |
| 7 | UMG-CLIP-L/14 | 58.9 | Yes | UMG-CLIP: A Unified Multi-Granularity Vision Gen... | 2024-01-12 | Code |
| 8 | DiNAT-L (single-scale, Mask2Former) | 58.5 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 9 | ViT-Adapter-L (single-scale, BEiTv2 pretrain, Mask2Former) | 58.4 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 10 | Visual Attention Network (VAN-B6 + Mask2Former) | 58.2 | No | Visual Attention Network | 2022-02-20 | Code |
| 11 | kMaX-DeepLab (single-scale, pseudo-labels) | 58.1 | Yes | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 12 | HIPIE (ViT-H, single-scale) | 58.1 | Yes | Hierarchical Open-vocabulary Universal Image Seg... | 2023-07-03 | Code |
| 13 | kMaX-DeepLab (single-scale, drop query with 256 queries) | 58.0 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 14 | OneFormer (DiNAT-L, single-scale) | 58.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 15 | kMaX-DeepLab (single-scale) | 57.9 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 16 | OneFormer (Swin-L, single-scale) | 57.9 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 17 | FocalNet-L (Mask2Former (200 queries)) | 57.9 | No | Focal Modulation Networks | 2022-03-22 | Code |
| 18 | Mask2Former (single-scale) | 57.8 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 19 | Panoptic SegFormer (single-scale) | 55.8 | No | Panoptic SegFormer: Delving Deeper into Panoptic... | 2021-09-08 | Code |
| 20 | CMT-DeepLab (single-scale) | 55.3 | No | CMT-DeepLab: Clustering Mask Transformers for Pa... | 2022-06-17 | Code |
| 21 | MaskFormer (single-scale) | 52.7 | No | Per-Pixel Classification is Not All You Need for... | 2021-07-13 | Code |
| 22 | MaX-DeepLab-L (single-scale) | 51.1 | No | MaX-DeepLab: End-to-End Panoptic Segmentation wi... | 2020-12-01 | Code |
| 23 | Panoptic SegFormer (ResNet-101) | 50.6 | No | Panoptic SegFormer: Delving Deeper into Panoptic... | 2021-09-08 | Code |
| 24 | PanopticFPN + ResNeSt (single-scale) | 47.9 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 25 | DETR-R101 (ResNet-101) | 45.1 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 26 | Panoptic FCN* (ResNet-50-FPN) | 44.3 | No | Fully Convolutional Networks for Panoptic Segmen... | 2020-12-01 | Code |
| 27 | PanopticFPN++ | 44.1 | No | End-to-End Object Detection with Transformers | 2020-05-26 | Code |
| 28 | Axial-DeepLab-L (multi-scale) | 43.9 | No | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |
| 29 | Axial-DeepLab-L (single-scale) | 43.4 | No | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |