Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

10-shot image generation on Cityscapes val

Metric: mIoU (higher is better)
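
For readers unfamiliar with the metric, mIoU (mean intersection-over-union) averages, over classes, the ratio of correctly labelled pixels to the union of predicted and ground-truth pixels for that class. The following is a minimal sketch, not the evaluation code behind this leaderboard. The function name `mean_iou`, the flat label lists, and the ignore label 255 are illustrative assumptions, and counts here come from a single toy example, whereas benchmark scripts typically accumulate them over the whole validation set.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in pred or target.

    Assumes every prediction is a class id in [0, num_classes) and that
    pixels whose ground truth equals ignore_index are excluded (assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabelled pixel: contributes to no class
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")  # prints "mIoU = 72.2"
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. The values in the table below appear to be on the same 0 to 100 scale.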

Results

| # | Model | mIoU | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | EfficientPS (Cityscapes-fine) | 90.3 | No | EfficientPS: Efficient Panoptic Segmentation | 2020-04-05 | Code |
| 2 | ViT-P (InternImage-H) | 87.4 | Yes | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 3 | SERNet-Former | 87.35 | No | SERNet-Former: Semantic Segmentation by Efficien... | 2024-01-28 | Code |
| 4 | MetaPrompt-SD | 87.1 | Yes | Harnessing Diffusion Models for Visual Perceptio... | 2023-12-22 | Code |
| 5 | InternImage-H | 87 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 6 | HRNetV2-OCR+PSA | 86.93 | Yes | Polarized Self-Attention: Towards High-quality P... | 2021-07-02 | Code |
| 7 | InternImage-XL | 86.4 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 8 | HRNet-OCR | 86.3 | Yes | Hierarchical Multi-Scale Attention for Semantic ... | 2020-05-21 | Code |
| 9 | Depth Anything | 86.2 | No | Depth Anything: Unleashing the Power of Large-Sc... | 2024-01-19 | Code |
| 10 | OneFormer (ConvNeXt-XL, Mapillary, multi-scale) | 85.8 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 11 | ViT-Adapter-L | 85.8 | Yes | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 12 | ViT-P (OneFormer, InternImage-H) | 85.4 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 13 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, multi-scale) | 85.3 | Yes | Scaling Wide Residual Networks for Panoptic Segm... | 2020-11-23 | - |
| 14 | SeMask (SeMask Swin-L Mask2Former) | 84.98 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 15 | Sequential Ensemble (MiT-B5 + HRNet) | 84.8 | No | Sequential Ensembling for Semantic Segmentation | 2022-10-08 | - |
| 16 | Soft Labells (HRnet) | 84.8 | No | Soft labelling for semantic segmentation: Bringi... | 2023-02-27 | Code |
| 17 | OneFormer (ConvNeXt-XL, multi-scale) | 84.6 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 18 | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained) | 84.6 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 19 | Axial-DeepLab-XL (Mapillary Vistas, multi-scale) | 84.6 | Yes | Axial-DeepLab: Stand-Alone Axial-Attention for P... | 2020-03-17 | Code |
| 20 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, single-scale) | 84.6 | Yes | Scaling Wide Residual Networks for Panoptic Segm... | 2020-11-23 | - |
| 21 | DiNAT-L (Mask2Former) | 84.5 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 22 | OneFormer (Swin-L, multi-scale) | 84.4 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 23 | VPNeXt | 84.4 | No | VPNeXt -- Rethinking Dense Decoding for Plain Vi... | 2025-02-23 | - |
| 24 | VOLO-D4 (MS, ImageNet1k pretrain) | 84.3 | No | VOLO: Vision Outlooker for Visual Recognition | 2021-06-24 | Code |
| 25 | Mask2Former (Swin-L) | 84.3 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 26 | EoMT (DINOv2-L, single-scale, 1024x1024) | 84.2 | No | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 | Code |
| 27 | SegFormer (MiT-B5, Mapillary) | 84 | Yes | SegFormer: Simple and Efficient Design for Seman... | 2021-05-31 | Code |
| 28 | DDP (ConvNeXt-L, step-3) | 83.9 | No | DDP: Diffusion Model for Dense Visual Prediction | 2023-03-30 | Code |
| 29 | HRNetV2 + OCR + RMI (PaddleClas pretrained) | 83.6 | No | Segmentation Transformer: Object-Contextual Repr... | 2019-09-24 | Code |
| 30 | OneFormer (ConvNeXt-XL, single-scale) | 83.6 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 31 | SynBoost | 83.5 | No | Pixel-wise Anomaly Detection in Complex Driving ... | 2021-03-09 | Code |
| 32 | kMaX-DeepLab (single-scale) | 83.5 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 33 | HRNetV2+OCR+CBL(ImageNet pretrained) | 83.4 | No | - | - | Code |
| 34 | DiNAT-L (Mask2Former) | 83.4 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 35 | EfficientViT-B3 (r1184x2368) | 83.2 | No | EfficientViT: Multi-Scale Linear Attention for H... | 2022-05-29 | Code |
| 36 | OneFormer (DiNAT-L, single-scale) | 83.1 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 37 | OneFormer (ConvNeXt-L, single-scale) | 83 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 38 | AFF-Base (single-scale, point-based Mask2Former) | 83 | No | AutoFocusFormer: Image Segmentation off the Grid | 2023-04-24 | Code |
| 39 | OneFormer (Swin-L, single-scale) | 83 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 40 | Mask2Former (Swin-L) | 82.9 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 41 | FAN-L-Hybrid+STL | 82.8 | No | Fully Attentional Networks with Self-emerging To... | 2024-01-08 | Code |
| 42 | ResNeSt-200 | 82.7 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 43 | WaveMix | 82.7 | No | WaveMix: A Resource-efficient Neural Network for... | 2022-05-28 | Code |
| 44 | CMX (B4) | 82.6 | No | CMX: Cross-Modal Fusion for RGB-X Semantic Segme... | 2022-03-09 | Code |
| 45 | WaveMix-256/16 (Level-4) | 82.6 | No | WaveMix: A Resource-efficient Neural Network for... | 2022-05-28 | Code |
| 46 | FAN-L-Hybrid | 82.3 | No | Understanding The Robustness in Vision Transform... | 2022-04-26 | Code |
| 47 | AFF-Small (single-scale, point-based Mask2Former) | 82.2 | No | AutoFocusFormer: Image Segmentation off the Grid | 2023-04-24 | Code |
| 48 | SETR-PUP (80k, MS) | 82.15 | No | Rethinking Semantic Segmentation from a Sequence... | 2020-12-31 | Code |
| 49 | EfficientPS | 82.1 | Yes | EfficientPS: Efficient Panoptic Segmentation | 2020-04-05 | Code |
| 50 | DSNet-Base(single-scale) | 82 | No | DSNet: A Novel Way to Use Atrous Convolutions in... | 2024-06-06 | Code |
| 51 | CMX (B2) | 81.6 | No | CMX: Cross-Modal Fusion for RGB-X Semantic Segme... | 2022-03-09 | Code |
| 52 | Soft Labells (Deeplab) | 81.5 | No | - | - | - |
| 53 | Panoptic-DeepLab (X71) | 81.5 | Yes | Panoptic-DeepLab: A Simple, Strong, and Fast Bas... | 2019-11-22 | Code |
| 54 | CMT-DeepLab (MaX-S, single-scale, IN-1K) | 81.4 | No | CMT-DeepLab: Clustering Mask Transformers for Pa... | 2022-06-17 | Code |
| 55 | HRNetV2 (HRNetV2-W48) | 81.1 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 56 | DEPICT-SA (ViT-L multi-scale) | 81 | No | Rethinking Decoders for Transformer-based Semant... | 2024-11-05 | Code |
| 57 | OCR (ResNet-101-FCN) | 80.6 | No | Segmentation Transformer: Object-Contextual Repr... | 2019-09-24 | Code |
| 58 | DSNet(single-scale) | 80.4 | No | DSNet: A Novel Way to Use Atrous Convolutions in... | 2024-06-06 | Code |
| 59 | SeMask (SeMask Swin-L FPN) | 80.39 | Yes | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 60 | SML | 80.33 | No | Standardized Max Logits: A Simple yet Effective ... | 2021-07-23 | Code |
| 61 | HRNetV2 (HRNetV2-W40) | 80.2 | No | Deep High-Resolution Representation Learning for... | 2019-08-20 | Code |
| 62 | Dynamically Instantiated Network (ResNet-101) | 79.8 | No | Weakly- and Semi-Supervised Panoptic Segmentation | 2018-08-10 | Code |
| 63 | PSPNet (Dilated-ResNet-101) | 79.7 | No | Pyramid Scene Parsing Network | 2016-12-04 | Code |
| 64 | DeepLabv3+ (Dilated-Xception-71) | 79.6 | No | Encoder-Decoder with Atrous Separable Convolutio... | 2018-02-07 | Code |
| 65 | DDRNet23 | 79.4 | No | Deep Dual-resolution Networks for Real-time and ... | 2021-01-15 | Code |
| 66 | COPS (ResNet-50) | 79.3 | No | Combinatorial Optimization for Panoptic Segmenta... | 2021-06-06 | Code |
| 67 | AdaptIS (ResNeXt-101) | 79.2 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 68 | UPSNet (ResNet-101, multiscale) | 79.2 | Yes | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 69 | DEPICT-SA (ViT-L single-scale) | 78.8 | No | Rethinking Decoders for Transformer-based Semant... | 2024-11-05 | Code |
| 70 | SemanticFPN P2-P5 + PointRend | 78.6 | No | PointRend: Image Segmentation as Rendering | 2019-12-17 | Code |
| 71 | StreamDEQ (8 iterations) | 78.2 | No | Representation Recycling for Streaming Video Ana... | 2022-04-28 | Code |
| 72 | PP-LiteSeg-B2 | 78.2 | No | PP-LiteSeg: A Superior Real-Time Semantic Segmen... | 2022-04-06 | Code |
| 73 | TASCNet (ResNet-50, multi-scale) | 78 | Yes | Learning to Fuse Things and Stuff | 2018-12-04 | - |
| 74 | HALO | 77.8 | No | Hyperbolic Active Learning for Semantic Segmenta... | 2023-06-19 | Code |
| 75 | UPSNet (ResNet-101) | 77.8 | Yes | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 76 | TASCNet (ResNet-50) | 77.8 | Yes | Learning to Fuse Things and Stuff | 2018-12-04 | - |
| 77 | DDRNet23-slim | 77.4 | No | Deep Dual-resolution Networks for Real-time and ... | 2021-01-15 | Code |
| 78 | AdaptIS (ResNet-101) | 77.2 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 79 | EEEA-Net-C2 (ours) | 76.8 | No | EEEA-Net: An Early Exit Evolutionary Neural Arch... | 2021-08-13 | Code |
| 80 | WaveMixLite-256/16 | 76.79 | No | - | - | Code |
| 81 | SwinMTL | 76.41 | No | SwinMTL: A Shared Architecture for Simultaneous ... | 2024-03-15 | Code |
| 82 | CSFNet-2 | 76.36 | No | CSFNet: A Cosine Similarity Fusion Network for R... | 2024-07-01 | Code |
| 83 | CSFNet-2 | 76.36 | No | CSFNet: A Cosine Similarity Fusion Network for R... | 2024-07-01 | Code |
| 84 | RepMLPNet-D256 | 76.27 | No | RepMLPNet: Hierarchical Vision MLP with Re-param... | 2021-12-21 | Code |
| 85 | PP-LiteSeg-T2 | 76 | No | PP-LiteSeg: A Superior Real-Time Semantic Segmen... | 2022-04-06 | Code |
| 86 | Dilated-ResNet (Dilated-ResNet-101) | 75.7 | No | Deep Residual Learning for Image Recognition | 2015-12-10 | Code |
| 87 | Panoptic FPN (ResNet-101) | 75.7 | No | Panoptic Feature Pyramid Networks | 2019-01-08 | Code |
| 88 | AUNet (ResNet-101-FPN) | 75.6 | No | Attention-guided Unified Network for Panoptic Se... | 2018-12-10 | - |
| 89 | UNet++ (ResNet-101) | 75.5 | No | UNet++: A Nested U-Net Architecture for Medical ... | 2018-07-18 | Code |
| 90 | AdaptIS (ResNet-50) | 75.3 | No | AdaptIS: Adaptive Instance Selection Network | 2019-09-17 | - |
| 91 | PP-LiteSeg-B1 | 75.3 | No | PP-LiteSeg: A Superior Real-Time Semantic Segmen... | 2022-04-06 | Code |
| 92 | ReLICv2 | 75.2 | No | Pushing the limits of self-supervised ResNets: C... | 2022-01-13 | Code |
| 93 | UPSNet (ResNet-50) | 75.2 | No | UPSNet: A Unified Panoptic Segmentation Network | 2019-01-12 | Code |
| 94 | CSFNet-1 | 74.73 | No | CSFNet: A Cosine Similarity Fusion Network for R... | 2024-07-01 | Code |
| 95 | CSFNet-1 | 74.73 | No | CSFNet: A Cosine Similarity Fusion Network for R... | 2024-07-01 | Code |
| 96 | BYOL | 74.6 | Yes | Pushing the limits of self-supervised ResNets: C... | 2022-01-13 | Code |
| 97 | FasterSeg | 73.1 | No | FasterSeg: Searching for Faster Real-time Semant... | 2019-12-23 | Code |
| 98 | PP-LiteSeg-T1 | 73.1 | No | PP-LiteSeg: A Superior Real-Time Semantic Segmen... | 2022-04-06 | Code |
| 99 | StreamDEQ (4 iterations) | 71.5 | No | Representation Recycling for Streaming Video Ana... | 2022-04-28 | Code |
| 100 | Fast-SCNN + Coarse + ImageNet | 69.19 | No | Fast-SCNN: Fast Semantic Segmentation Network | 2019-02-12 | Code |
| 101 | DiCENet | 63.4 | No | DiCENet: Dimension-wise Convolutions for Efficie... | 2019-06-08 | Code |
| 102 | DCT-EDANet | 61.6 | No | Exploring Semantic Segmentation on the DCT Repre... | 2019-07-23 | - |
| 103 | StreamDEQ (2 iterations) | 57.9 | No | Representation Recycling for Streaming Video Ana... | 2022-04-28 | Code |
| 104 | CARB | 52.1 | No | Weakly Supervised Semantic Segmentation for Driv... | 2023-12-21 | Code |
| 105 | CorrCLIP | 51.1 | No | CorrCLIP: Reconstructing Correlations in CLIP wi... | 2024-11-15 | Code |
| 106 | Trident | 47.6 | No | Harnessing Vision Foundation Models for High-Per... | 2024-11-14 | Code |
| 107 | StreamDEQ (1 iterations) | 45.5 | No | Representation Recycling for Streaming Video Ana... | 2022-04-28 | Code |
| 108 | MRFP+(Ours) Resnet50 | 42.4 | No | MRFP: Learning Generalizable Semantic Segmentati... | 2023-11-30 | Code |
| 109 | ProxyCLIP | 42 | No | ProxyCLIP: Proxy Attention Improves CLIP for Ope... | 2024-08-09 | Code |
| 110 | COSMOS ViT-B/16 | 34.7 | No | COSMOS: Cross-Modality Self-Distillation for Vis... | 2024-12-02 | Code |
| 111 | Resnet50 | 34.66 | No | MRFP: Learning Generalizable Semantic Segmentati... | 2023-11-30 | Code |
| 112 | TTD (MaskCLIP) | 32 | No | TTD: Text-Tag Self-Distillation Enhancing Image-... | 2024-03-30 | Code |
| 113 | TagAlign | 27.5 | No | TagAlign: Improving Vision-Language Alignment wi... | 2023-12-21 | Code |
| 114 | TTD (TCL) | 27 | No | TTD: Text-Tag Self-Distillation Enhancing Image-... | 2024-03-30 | Code |
| 115 | ReCo+ | 24.2 | No | ReCo: Retrieve and Co-segment for Zero-shot Tran... | 2022-06-14 | Code |
| 116 | TCL | 24 | No | Learning to Generate Text-grounded Mask for Open... | 2022-12-01 | Code |
| 117 | Segmenter ViT-S/16 | 21.8 | No | Drive&Segment: Unsupervised Semantic Segmentatio... | 2022-03-21 | Code |
| 118 | ReCo | 19.3 | No | ReCo: Retrieve and Co-segment for Zero-shot Tran... | 2022-06-14 | Code |
| 119 | CLIPpy ViT-B | 18.1 | No | Perceptual Grouping in Contrastive Vision-Langua... | 2022-10-18 | Code |
| 120 | MaskCLIP | 10 | No | Extract Free Dense Labels from CLIP | 2021-12-02 | Code |