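| Rank | Model | mIoU | Extra Training Data | Paper | Date | Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | EfficientPS (Cityscapes-fine) | 90.3 | No | EfficientPS: Efficient Panoptic Segmentation | 2020-04-05 | Code |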
| 2 | ViT-P (InternImage-H) | 87.4 | Yes | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 3 | SERNet-Former | 87.35 | No | SERNet-Former: Semantic Segmentation by Efficien... | 2024-01-28 | Code |
| 4 | MetaPrompt-SD | 87.1 | Yes | Harnessing Diffusion Models for Visual Perceptio... | 2023-12-22 | Code |
| 5 | InternImage-H | 87 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 6 | HRNetV2-OCR+PSA | 86.93 | Yes | Polarized Self-Attention: Towards High-quality P... | 2021-07-02 | Code |
| 7 | InternImage-XL | 86.4 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 8 | HRNet-OCR | 86.3 | Yes | Hierarchical Multi-Scale Attention for Semantic ... | 2020-05-21 | Code |
| 9 | Depth Anything | 86.2 | No | Depth Anything: Unleashing the Power of Large-Sc... | 2024-01-19 | Code |
| 10 | OneFormer (ConvNeXt-XL, Mapillary, multi-scale) | 85.8 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 11 | ViT-Adapter-L | 85.8 | Yes | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 12 | ViT-P (OneFormer, InternImage-H) | 85.4 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 13 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, multi-scale) | 85.3 | Yes | Scaling Wide Residual Networks for Panoptic Segm... | 2020-11-23 | - |
| 14 | SeMask (SeMask Swin-L Mask2Former) | 84.98 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 15 | Sequential Ensemble (MiT-B5 + HRNet) | 84.8 | No | Sequential Ensembling for Semantic Segmentation | 2022-10-08 | - |
| 16 | Soft Labels (HRNet) | 84.8 | No | Soft labelling for semantic segmentation: Bringi... | 2023-02-27 | Code |
| 17 | OneFormer (ConvNeXt-XL, multi-scale) | 84.6 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 18 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained) | 84.6 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 19 | Axial-DeepLab-XL (Mapillary Vistas, multi-scale) | 84.6 | Yes | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |
| 20 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, single-scale) | 84.6 | Yes | Scaling Wide Residual Networks for Panoptic Segm... | 2020-11-23 | - |
| 21 | DiNAT-L (Mask2Former) | 84.5 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 22 | OneFormer (Swin-L, multi-scale) | 84.4 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 23 | VPNeXt | 84.4 | No | VPNeXt -- Rethinking Dense Decoding for Plain Vi... | 2025-02-23 | - |
| 24 | VOLO-D4 (multi-scale, ImageNet-1K pretrained) | 84.3 | No | VOLO: Vision Outlooker for Visual Recognition | 2021-06-24 | Code |
| 25 | Mask2Former (Swin-L) | 84.3 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 26 | EoMT (DINOv2-L, single-scale, 1024x1024) | 84.2 | No | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 | Code |
| 27 | SegFormer (MiT-B5, Mapillary) | 84 | Yes | SegFormer: Simple and Efficient Design for Seman... | 2021-05-31 | Code |
| 28 | DDP (ConvNeXt-L, step-3) | 83.9 | No | DDP: Diffusion Model for Dense Visual Prediction | 2023-03-30 | Code |
| 29 | HRNetV2 + OCR + RMI (PaddleClas pretrained) | 83.6 | No | Segmentation Transformer: Object-Contextual Repr... | 2019-09-24 | Code |
| 30 | OneFormer (ConvNeXt-XL, single-scale) | 83.6 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 31 | SynBoost | 83.5 | No | Pixel-wise Anomaly Detection in Complex Driving ... | 2021-03-09 | Code |
| 32 | kMaX-DeepLab (single-scale) | 83.5 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 33 | HRNetV2 + OCR + CBL (ImageNet pretrained) | 83.4 | No | - | - | Code |
| 34 | DiNAT-L (Mask2Former) | 83.4 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 35 | EfficientViT-B3 (r1184x2368) | 83.2 | No | EfficientViT: Multi-Scale Linear Attention for H... | 2022-05-29 | Code |
| 36 | OneFormer (DiNAT-L, single-scale) | 83.1 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 37 | OneFormer (ConvNeXt-L, single-scale) | 83 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 38 | AFF-Base (single-scale, point-based Mask2Former) | 83 | No | AutoFocusFormer: Image Segmentation off the Grid | 2023-04-24 | Code |
| 39 | OneFormer (Swin-L, single-scale) | 83 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 40 | Mask2Former (Swin-L) | 82.9 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 41 | FAN-L-Hybrid+STL | 82.8 | No | Fully Attentional Networks with Self-emerging To... | 2024-01-08 | Code |
| 42 | ResNeSt-200 | 82.7 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 43 | WaveMix | 82.7 | No | WaveMix: A Resource-efficient Neural Network for... | 2022-05-28 | Code |
| 44 | CMX (B4) | 82.6 | No | CMX: Cross-Modal Fusion for RGB-X Semantic Segme... | 2022-03-09 | Code |
| 45 | WaveMix-256/16 (Level-4) | 82.6 | No | WaveMix: A Resource-efficient Neural Network for... | 2022-05-28 | Code |
| 46 | FAN-L-Hybrid | 82.3 | No | Understanding The Robustness in Vision Transform... | 2022-04-26 | Code |
| 47 | AFF-Small (single-scale, point-based Mask2Former) | 82.2 | No | AutoFocusFormer: Image Segmentation off the Grid | 2023-04-24 | Code |
| 48 | SETR-PUP (80k, multi-scale) | 82.15 | No | Rethinking Semantic Segmentation from a Sequence... | 2020-12-31 | Code |
| 49 | EfficientPS | 82.1 | Yes | EfficientPS: Efficient Panoptic Segmentation | 2020-04-05 | Code |
| 50 | DSNet-Base (single-scale) | 82 | No | DSNet: A Novel Way to Use Atrous Convolutions in... | 2024-06-06 | Code |
| 51 | CMX (B2) | 81.6 | No | CMX: Cross-Modal Fusion for RGB-X Semantic Segme... | 2022-03-09 | Code |
| 52 | Soft Labels (DeepLab) | 81.5 | No | - | - | - |
| 53 | Panoptic-DeepLab (X71) | 81.5 | Yes | Panoptic-DeepLab: A Simple, Strong, and Fast Bas... | 2019-11-22 | Code |
| 54 | CMT-DeepLab (MaX-S, single-scale, IN-1K) | 81.4 | No | CMT-DeepLab: Clustering Mask Transformers for Pa... | 2022-06-17 | Code |
| 55 | HRNetV2 (HRNetV2-W48) | 81.1 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 56 | DEPICT-SA (ViT-L multi-scale) | 81 | No | Rethinking Decoders for Transformer-based Semant... | 2024-11-05 | Code |
| 57 | OCR (ResNet-101-FCN) | 80.6 | No | Segmentation Transformer: Object-Contextual Repr... | 2019-09-24 | Code |
| 58 | DSNet (single-scale) | 80.4 | No | DSNet: A Novel Way to Use Atrous Convolutions in... | 2024-06-06 | Code |
| 59 | SeMask (SeMask Swin-L FPN) | 80.39 | Yes | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 60 | SML | 80.33 | No | Standardized Max Logits: A Simple yet Effective ... | 2021-07-23 | Code |
| 61 | HRNetV2 (HRNetV2-W40) | 80.2 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 62 | Dynamically Instantiated Network (ResNet-101) | 79.8 | No | Weakly- and Semi-Supervised Panoptic Segmentation | 2018-08-10 | Code |
| 63 | PSPNet (Dilated-ResNet-101) | 79.7 | No | Pyramid Scene Parsing Network | 2016-12-04 | Code |
| 64 | DeepLabv3+ (Dilated-Xception-71) | 79.6 | No | Encoder-Decoder with Atrous Separable Convolutio... | 2018-02-07 | Code |
| 65 | DDRNet23 | 79.4 | No | Deep Dual-resolution Networks for Real-time and ... | 2021-01-15 | Code |
| 66 | COPS (ResNet-50) | 79.3 | No | Combinatorial Optimization for Panoptic Segmenta... | 2021-06-06 | Code |
| 67 | AdaptIS (ResNeXt-101) | 79.2 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 68 | UPSNet (ResNet-101, multi-scale) | 79.2 | Yes | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 69 | DEPICT-SA (ViT-L single-scale) | 78.8 | No | Rethinking Decoders for Transformer-based Semant... | 2024-11-05 | Code |
| 70 | SemanticFPN P2-P5 + PointRend | 78.6 | No | PointRend: Image Segmentation as Rendering | 2019-12-17 | Code |
| 71 | StreamDEQ (8 iterations) | 78.2 | No | Representation Recycling for Streaming Video Ana... | 2022-04-28 | Code |
| 72 | PP-LiteSeg-B2 | 78.2 | No | PP-LiteSeg: A Superior Real-Time Semantic Segmen... | 2022-04-06 | Code |
| 73 | TASCNet (ResNet-50, multi-scale) | 78 | Yes | Learning to Fuse Things and Stuff | 2018-12-04 | - |
| 74 | HALO | 77.8 | No | Hyperbolic Active Learning for Semantic Segmenta... | 2023-06-19 | Code |
| 75 | UPSNet (ResNet-101) | 77.8 | Yes | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 76 | TASCNet (ResNet-50) | 77.8 | Yes | Learning to Fuse Things and Stuff | 2018-12-04 | - |
| 77 | DDRNet23-slim | 77.4 | No | Deep Dual-resolution Networks for Real-time and ... | 2021-01-15 | Code |
| 78 | AdaptIS (ResNet-101) | 77.2 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 79 | EEEA-Net-C2 | 76.8 | No | EEEA-Net: An Early Exit Evolutionary Neural Arch... | 2021-08-13 | Code |
| 80 | WaveMixLite-256/16 | 76.79 | No | - | - | Code |
| 81 | SwinMTL | 76.41 | No | SwinMTL: A Shared Architecture for Simultaneous ... | 2024-03-15 | Code |
| 82 | CSFNet-2 | 76.36 | No | CSFNet: A Cosine Similarity Fusion Network for R... | 2024-07-01 | Code |
| 84 | RepMLPNet-D256 | 76.27 | No | RepMLPNet: Hierarchical Vision MLP with Re-param... | 2021-12-21 | Code |
| 85 | PP-LiteSeg-T2 | 76 | No | PP-LiteSeg: A Superior Real-Time Semantic Segmen... | 2022-04-06 | Code |
| 86 | Dilated-ResNet (Dilated-ResNet-101) | 75.7 | No | Deep Residual Learning for Image Recognition | 2015-12-10 | Code |
| 87 | Panoptic FPN (ResNet-101) | 75.7 | No | Panoptic Feature Pyramid Networks | 2019-01-08 | Code |
| 88 | AUNet (ResNet-101-FPN) | 75.6 | No | Attention-guided Unified Network for Panoptic Se... | 2018-12-10 | - |
| 89 | UNet++ (ResNet-101) | 75.5 | No | UNet++: A Nested U-Net Architecture for Medical ... | 2018-07-18 | Code |
| 90 | AdaptIS (ResNet-50) | 75.3 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 91 | PP-LiteSeg-B1 | 75.3 | No | PP-LiteSeg: A Superior Real-Time Semantic Segmen... | 2022-04-06 | Code |
| 92 | ReLICv2 | 75.2 | No | Pushing the limits of self-supervised ResNets: C... | 2022-01-13 | Code |
| 93 | UPSNet (ResNet-50) | 75.2 | No | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 94 | CSFNet-1 | 74.73 | No | CSFNet: A Cosine Similarity Fusion Network for R... | 2024-07-01 | Code |
| 96 | BYOL | 74.6 | Yes | Pushing the limits of self-supervised ResNets: C... | 2022-01-13 | Code |
| 97 | FasterSeg | 73.1 | No | FasterSeg: Searching for Faster Real-time Semant... | 2019-12-23 | Code |
| 98 | PP-LiteSeg-T1 | 73.1 | No | PP-LiteSeg: A Superior Real-Time Semantic Segmen... | 2022-04-06 | Code |
| 99 | StreamDEQ (4 iterations) | 71.5 | No | Representation Recycling for Streaming Video Ana... | 2022-04-28 | Code |
| 100 | Fast-SCNN + Coarse + ImageNet | 69.19 | No | Fast-SCNN: Fast Semantic Segmentation Network | 2019-02-12 | Code |
| 101 | DiCENet | 63.4 | No | DiCENet: Dimension-wise Convolutions for Efficie... | 2019-06-08 | Code |
| 102 | DCT-EDANet | 61.6 | No | Exploring Semantic Segmentation on the DCT Repre... | 2019-07-23 | - |
| 103 | StreamDEQ (2 iterations) | 57.9 | No | Representation Recycling for Streaming Video Ana... | 2022-04-28 | Code |
| 104 | CARB | 52.1 | No | Weakly Supervised Semantic Segmentation for Driv... | 2023-12-21 | Code |
| 105 | CorrCLIP | 51.1 | No | CorrCLIP: Reconstructing Correlations in CLIP wi... | 2024-11-15 | Code |
| 106 | Trident | 47.6 | No | Harnessing Vision Foundation Models for High-Per... | 2024-11-14 | Code |
| 107 | StreamDEQ (1 iteration) | 45.5 | No | Representation Recycling for Streaming Video Ana... | 2022-04-28 | Code |
| 108 | MRFP+ (ResNet-50) | 42.4 | No | MRFP: Learning Generalizable Semantic Segmentati... | 2023-11-30 | Code |
| 109 | ProxyCLIP | 42 | No | ProxyCLIP: Proxy Attention Improves CLIP for Ope... | 2024-08-09 | Code |
| 110 | COSMOS ViT-B/16 | 34.7 | No | COSMOS: Cross-Modality Self-Distillation for Vis... | 2024-12-02 | Code |
| 111 | ResNet-50 | 34.66 | No | MRFP: Learning Generalizable Semantic Segmentati... | 2023-11-30 | Code |
| 112 | TTD (MaskCLIP) | 32 | No | TTD: Text-Tag Self-Distillation Enhancing Image-... | 2024-03-30 | Code |
| 113 | TagAlign | 27.5 | No | TagAlign: Improving Vision-Language Alignment wi... | 2023-12-21 | Code |
| 114 | TTD (TCL) | 27 | No | TTD: Text-Tag Self-Distillation Enhancing Image-... | 2024-03-30 | Code |
| 115 | ReCo+ | 24.2 | No | ReCo: Retrieve and Co-segment for Zero-shot Tran... | 2022-06-14 | Code |
| 116 | TCL | 24 | No | Learning to Generate Text-grounded Mask for Open... | 2022-12-01 | Code |
| 117 | Segmenter ViT-S/16 | 21.8 | No | Drive&Segment: Unsupervised Semantic Segmentatio... | 2022-03-21 | Code |
| 118 | ReCo | 19.3 | No | ReCo: Retrieve and Co-segment for Zero-shot Tran... | 2022-06-14 | Code |
| 119 | CLIPpy ViT-B | 18.1 | No | Perceptual Grouping in Contrastive Vision-Langua... | 2022-10-18 | Code |
| 120 | MaskCLIP | 10 | No | Extract Free Dense Labels from CLIP | 2021-12-02 | Code |