Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

10-shot image generation on ADE20K

Metric: Validation mIoU (higher is better)
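
For readers unfamiliar with the metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from flattened per-pixel class labels. The function name `mean_iou`, the `ignore_index` parameter, and the toy labels are assumptions made for illustration; the page does not describe the evaluation protocol behind the scores listed here (for example single-scale versus multi-scale testing, or which labels are ignored), so this is not necessarily how any listed result was produced.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in either pred or target.

    pred and target are flat sequences of integer class ids of equal length.
    Pixels whose target label equals ignore_index are skipped (an assumption;
    benchmarks differ in how they treat unlabeled pixels).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # The pixel lies in both the predicted and the true region of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # The pixel lies in the predicted region of p and the true region of t.
            union[p] += 1
            union[t] += 1
    # Classes absent from both prediction and target are left out of the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes (hypothetical labels, not ADE20K data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(100 * score, 2))  # leaderboards usually report mIoU as a percentage
```

In the toy example the per-class IoU values are 1/3, 2/3 and 1/2, so the mean is 0.5. In practice the intersection and union counts are usually accumulated over the whole validation set before dividing, rather than averaged image by image.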

Results

| # | Model | Validation mIoU | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ViT-P (InternImage-H) | 63.6 | Yes | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 2 | ONE-PEACE | 63 | Yes | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Code |
| 3 | InternImage-H | 62.9 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 4 | M3I Pre-training (InternImage-H) | 62.9 | Yes | Towards All-in-one Pre-training via Maximizing M... | 2022-11-17 | Code |
| 5 | BEiT-3 | 62.8 | Yes | Image as a Foreign Language: BEiT Pretraining fo... | 2022-08-22 | Code |
| 6 | EVA | 62.3 | Yes | EVA: Exploring the Limits of Masked Visual Repre... | 2022-11-14 | Code |
| 7 | ViT-P (OneFormer, InternImage-H) | 61.6 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 8 | ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | 61.5 | Yes | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 9 | FD-SwinV2-G | 61.4 | No | Contrastive Learning Rivals Masked Image Modelin... | 2022-05-27 | Code |
| 10 | RevCol-H (Mask2Former) | 61 | Yes | Reversible Column Networks | 2022-12-22 | Code |
| 11 | MasK DINO (SwinL, multi-scale) | 60.8 | Yes | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code |
| 12 | ViT-Adapter-L (Mask2Former, BEiT pretrain) | 60.5 | Yes | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 13 | DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2former) | 60.2 | No | DINOv2: Learning Robust Visual Features without ... | 2023-04-14 | Code |
| 14 | ViT-P (OneFormer, DiNAT-L) | 59.9 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 15 | SwinV2-G(UperNet) | 59.9 | Yes | Swin Transformer V2: Scaling Up Capacity and Res... | 2021-11-18 | Code |
| 16 | PIIP-LH6B(UperNet) | 59.9 | No | Parameter-Inverted Image Pyramid Networks | 2024-06-06 | Code |
| 17 | SERNet-Former | 59.35 | No | SERNet-Former: Semantic Segmentation by Efficien... | 2024-01-28 | Code |
| 18 | FocalNet-L (Mask2Former) | 58.5 | Yes | Focal Modulation Networks | 2022-03-22 | Code |
| 19 | ViT-Adapter-L (UperNet, BEiT pretrain) | 58.4 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code |
| 20 | RSSeg-ViT-L (BEiT pretrain) | 58.4 | No | Representation Separation for Semantic Segmentat... | 2022-12-28 | - |
| 21 | EoMT (DINOv2-L, single-scale, 512x512) | 58.4 | No | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 | Code |
| 22 | SegViT-v2 (BEiT-v2-Large) | 58.2 | No | SegViTv2: Exploring Efficient and Continual Sema... | 2023-06-09 | Code |
| 23 | SeMask (SeMask Swin-L FaPN-Mask2Former) | 58.2 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 24 | SeMask (SeMask Swin-L MSFaPN-Mask2Former) | 58.2 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 25 | DiNAT-L (Mask2Former) | 58.1 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 26 | HorNet-L (Mask2Former) | 57.9 | No | HorNet: Efficient High-Order Spatial Interaction... | 2022-07-28 | Code |
| 27 | Mask2Former (SwinL-FaPN) | 57.7 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 28 | FASeg (SwinL) | 57.7 | No | Dynamic Focus-aware Positional Queries for Seman... | 2022-04-04 | Code |
| 29 | RR (BEiT-L) | 57.7 | No | Region Rebalance for Long-Tailed Semantic Segmen... | 2022-04-05 | Code |
| 30 | MOAT-4 (IN-22K pretraining, single-scale) | 57.6 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 31 | Frozen Backbone, SwinV2-G-ext22K (Mask2Former) | 57.6 | No | Could Giant Pretrained Image Models Extract Univ... | 2022-11-03 | - |
| 32 | SeMask (SeMask Swin-L Mask2Former) | 57.5 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 33 | Mask2Former (SwinL) | 57.3 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 34 | SenFormer (BEiT-L) | 57.1 | Yes | Efficient Self-Ensemble for Semantic Segmentation | 2021-11-26 | Code |
| 35 | BEiT-L (ViT+UperNet) | 57 | No | BEiT: BERT Pre-Training of Image Transformers | 2021-06-15 | Code |
| 36 | SeMask(SeMask Swin-L MSFaPN-Mask2Former, single-scale) | 57 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 37 | MetaPrompt-SD | 56.8 | No | Harnessing Diffusion Models for Visual Perceptio... | 2023-12-22 | Code |
| 38 | FaPN (MaskFormer, Swin-L, ImageNet-22k pretrain) | 56.7 | No | FaPN: Feature-aligned Pyramid Network for Dense ... | 2021-08-16 | Code |
| 39 | MOAT-3 (IN-22K pretraining, single-scale) | 56.5 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 40 | Mask2Former (Swin-L-FaPN) | 56.4 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 41 | SeMask (SeMask Swin-L MaskFormer) | 56.2 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 42 | dBOT ViT-L (CLIP) | 56.2 | No | Exploring Target Representations for Masked Auto... | 2022-09-08 | Code |
| 43 | Mask2Former+CBL(Swin-B) | 56.1 | No | - | - | Code |
| 44 | TADP | 55.9 | No | Text-image Alignment for Diffusion-based Percept... | 2023-09-29 | Code |
| 45 | CSWin-L (UperNet, ImageNet-22k pretrain) | 55.7 | No | CSWin Transformer: A General Vision Transformer ... | 2021-07-01 | Code |
| 46 | UniRepLKNet-XL | 55.6 | No | UniRepLKNet: A Universal Perception Large-Kernel... | 2023-11-27 | Code |
| 47 | Focal-L (UperNet, ImageNet-22k pretrain) | 55.4 | No | Focal Self-attention for Local-Global Interactio... | 2021-07-01 | Code |
| 48 | InternImage-XL | 55.3 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 49 | dBOT ViT-L | 55.2 | No | Exploring Target Representations for Masked Auto... | 2022-09-08 | Code |
| 50 | Mask2Former(Swin-B) | 55.1 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 51 | ConvNeXt V2-H (FCMAE) | 55 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 52 | UniRepLKNet-L++ | 55 | No | UniRepLKNet: A Universal Perception Large-Kernel... | 2023-11-27 | Code |
| 53 | DiNAT-Large (UperNet) | 54.9 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 54 | MaskFormer+CBL(Swin-B) | 54.9 | No | - | - | Code |
| 55 | TransNeXt-Base (IN-1K pretrain, Mask2Former, 512) | 54.7 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 56 | MOAT-2 (IN-22K pretraining, single-scale) | 54.7 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 57 | CAE (ViT-L, UperNet) | 54.7 | No | Context Autoencoder for Self-Supervised Represen... | 2022-02-07 | Code |
| 58 | VAN-B6 | 54.7 | No | Visual Attention Network | 2022-02-20 | Code |
| 59 | DiNAT_s-Large (UperNet) | 54.6 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 60 | DDP (Swin-L, step-3) | 54.4 | No | DDP: Diffusion Model for Dense Visual Prediction | 2023-03-30 | Code |
| 61 | PatchDiverse + Swin-L (multi-scale test, upernet, ImageNet22k pretrain) | 54.4 | No | Vision Transformers with Patch Diversification | 2021-04-26 | Code |
| 62 | VOLO-D5 | 54.3 | No | VOLO: Vision Outlooker for Visual Recognition | 2021-06-24 | Code |
| 63 | K-Net | 54.3 | No | K-Net: Towards Unified Image Segmentation | 2021-06-28 | Code |
| 64 | GPaCo (Swin-L) | 54.3 | No | Generalized Parametric Contrastive Learning | 2022-09-26 | Code |
| 65 | SenFormer (Swin-L) | 54.2 | Yes | Efficient Self-Ensemble for Semantic Segmentation | 2021-11-26 | Code |
| 66 | Swin V2-H | 54.2 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 67 | InternImage-L | 54.1 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 68 | TransNeXt-Small (IN-1K pretrain, Mask2Former, 512) | 54.1 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 69 | ConvNeXt-XL++ | 54 | No | A ConvNet for the 2020s | 2022-01-10 | Code |
| 70 | Sequential Ensemble (SegFormer) | 54 | No | Sequential Ensembling for Semantic Segmentation | 2022-10-08 | - |
| 71 | MogaNet-XL (UperNet) | 54 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 72 | UniRepLKNet-B++ | 53.9 | No | UniRepLKNet: A Universal Perception Large-Kernel... | 2023-11-27 | Code |
| 73 | MaskFormer(Swin-B) | 53.8 | No | Per-Pixel Classification is Not All You Need for... | 2021-07-13 | Code |
| 74 | ConvNeXt-L++ | 53.7 | No | A ConvNet for the 2020s | 2022-01-10 | Code |
| 75 | SwinV2-G-HTC++ Liu et al. ([2021a]) | 53.7 | No | Swin Transformer V2: Scaling Up Capacity and Res... | 2021-11-18 | Code |
| 76 | ConvNeXt V2-L | 53.7 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 77 | Seg-L-Mask/16 (MS) | 53.63 | No | Segmenter: Transformer for Semantic Segmentation | 2021-05-12 | Code |
| 78 | MAE (ViT-L, UperNet) | 53.6 | No | Masked Autoencoders Are Scalable Vision Learners | 2021-11-11 | Code |
| 79 | SeMask (SeMask Swin-L FPN) | 53.52 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 80 | Swin-L (UperNet, ImageNet-22k pretrain) | 53.5 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code |
| 81 | Swin-L | 53.5 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 82 | TransNeXt-Tiny (IN-1K pretrain, Mask2Former, 512) | 53.4 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code |
| 83 | ConvNeXt-B++ | 53.1 | No | A ConvNet for the 2020s | 2022-01-10 | Code |
| 84 | PatchConvNet-L120 (UperNet) | 52.9 | No | Augmenting Convolutional networks with attention... | 2021-12-27 | Code |
| 85 | dBOT ViT-B (CLIP) | 52.9 | No | Exploring Target Representations for Masked Auto... | 2022-09-08 | Code |
| 86 | PatchConvNet-B120 (UperNet) | 52.8 | No | Augmenting Convolutional networks with attention... | 2021-12-27 | Code |
| 87 | Swin-B | 52.8 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 88 | UniRepLKNet-S++ | 52.7 | No | UniRepLKNet: A Universal Perception Large-Kernel... | 2023-11-27 | Code |
| 89 | ConvNeXt V2-B | 52.1 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 90 | DeBiFormer-B (IN1k pretrain, Upernet 160k) | 52 | No | DeBiFormer: Vision Transformer with Deformable A... | 2024-10-11 | Code |
| 91 | LV-ViT-L (UperNet, MS) | 51.8 | No | All Tokens Matter: Token Labeling for Training B... | 2021-04-22 | Code |
| 92 | SegFormer-B5 | 51.8 | Yes | SegFormer: Simple and Efficient Design for Seman... | 2021-05-31 | Code |
| 93 | BiFormer-B (IN1k pretrain, Upernet 160k) | 51.7 | No | BiFormer: Vision Transformer with Bi-Level Routi... | 2023-03-15 | Code |
| 94 | ConvNeXt V2-L (Supervised) | 51.6 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 95 | Light-Ham (VAN-Huge) | 51.5 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code |
| 96 | DAT-B++ | 51.5 | No | DAT++: Spatially Dynamic Vision Transformer with... | 2023-09-04 | Code |
| 97 | CrossFormer (ImageNet1k-pretrain, UPerNet, multi-scale test) | 51.4 | No | CrossFormer: A Versatile Vision Transformer Hing... | 2021-07-31 | Code |
| 98 | InternImage-B | 51.3 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 99 | DAT-S++ | 51.2 | No | DAT++: Spatially Dynamic Vision Transformer with... | 2023-09-04 | Code |
| 100 | ActiveMLP-L(UperNet) | 51.1 | No | Active Token Mixer | 2022-03-11 | Code |
| 101 | SegFormer-B4 | 51.1 | Yes | SegFormer: Simple and Efficient Design for Seman... | 2021-05-31 | Code |
| 102 | PatchConvNet-B60 (UperNet) | 51.1 | No | Augmenting Convolutional networks with attention... | 2021-12-27 | Code |
| 103 | Light-Ham (VAN-Large) | 51 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code |
| 104 | TEC (Vit-B, Upernet) | 51 | No | Towards Sustainable Self-supervised Learning | 2022-10-20 | Code |
| 105 | UniRepLKNet-S | 51 | No | UniRepLKNet: A Universal Perception Large-Kernel... | 2023-11-27 | Code |
| 106 | SeMask (SeMask Swin-B FPN) | 50.98 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 107 | InternImage-S | 50.9 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 108 | MogaNet-L (UperNet) | 50.9 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 109 | dBOT ViT-B | 50.8 | No | Exploring Target Representations for Masked Auto... | 2022-09-08 | Code |
| 110 | Upernet-BiFormer-S (IN1k pretrain, Upernet 160k) | 50.8 | No | BiFormer: Vision Transformer with Bi-Level Routi... | 2023-03-15 | Code |
| 111 | UperNet Shuffle-B | 50.5 | No | Shuffle Transformer: Rethinking Spatial Shuffle ... | 2021-06-07 | Code |
| 112 | ConvNeXt V1-L | 50.5 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 113 | DiNAT-Base (UperNet) | 50.4 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 114 | ELSA-Swin-S | 50.3 | No | ELSA: Enhanced Local Self-Attention for Vision T... | 2021-12-23 | Code |
| 115 | DAT-T++ | 50.3 | No | DAT++: Spatially Dynamic Vision Transformer with... | 2023-09-04 | Code |
| 116 | SETR-MLA (160k, MS) | 50.28 | No | Rethinking Semantic Segmentation from a Sequence... | 2020-12-31 | Code |
| 117 | VAN-Large (HamNet) | 50.2 | No | Visual Attention Network | 2022-02-20 | Code |
| 118 | HRViT-b3 (SegFormer, SS) | 50.2 | No | Multi-Scale High-Resolution Vision Transformer f... | 2021-11-01 | Code |
| 119 | Twins-SVT-L (UperNet, ImageNet-1k pretrain) | 50.2 | No | Twins: Revisiting the Design of Spatial Attentio... | 2021-04-28 | Code |
| 120 | MogaNet-B (UperNet) | 50.1 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 121 | Seg-B-Mask/16(MS, ViT-B) | 50 | No | Segmenter: Transformer for Semantic Segmentation | 2021-05-12 | Code |
| 122 | iBOT (ViT-B/16) | 50 | No | iBOT: Image BERT Pre-Training with Online Tokeni... | 2021-11-15 | Code |
| 123 | ConvNeXt-B | 49.9 | No | A ConvNet for the 2020s | 2022-01-10 | Code |
| 124 | DiNAT-Small (UperNet) | 49.9 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 125 | ConvNeXt V1-B | 49.9 | No | ConvNeXt V2: Co-designing and Scaling ConvNets w... | 2023-01-02 | Code |
| 126 | NAT-Base | 49.7 | No | Neighborhood Attention Transformer | 2022-04-14 | Code |
| 127 | Swin-B (UperNet, ImageNet-1k pretrain) | 49.7 | No | Swin Transformer: Hierarchical Vision Transforme... | 2021-03-25 | Code |
| 128 | Seg-B/8 (MS, ViT-B) | 49.61 | No | Segmenter: Transformer for Semantic Segmentation | 2021-05-12 | Code |
| 129 | ConvNeXt-S | 49.6 | No | A ConvNet for the 2020s | 2022-01-10 | Code |
| 130 | Light-Ham (VAN-Base) | 49.6 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code |
| 131 | NAT-Small | 49.5 | No | Neighborhood Attention Transformer | 2022-04-14 | Code |
| 132 | DaViT-B | 49.4 | No | DaViT: Dual Attention Vision Transformers | 2022-04-07 | Code |
| 133 | DAT-B (UperNet) | 49.38 | No | Vision Transformer with Deformable Attention | 2022-01-03 | Code |
| 134 | PatchConvNet-S60 (UperNet) | 49.3 | No | Augmenting Convolutional networks with attention... | 2021-12-27 | Code |
| 135 | ColorMAE-Green-ViTB-1600 | 49.3 | No | ColorMAE: Exploring data-independent masking str... | 2024-07-17 | Code |
| 136 | MogaNet-S (UperNet) | 49.2 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 137 | Shift-B (UperNet) | 49.2 | No | When Shift Operation Meets Vision Transformer: A... | 2022-01-26 | Code |
| 138 | UniRepLKNet-T | 49.1 | No | UniRepLKNet: A Universal Perception Large-Kernel... | 2023-11-27 | Code |
| 139 | DPT-Hybrid | 49.02 | No | Vision Transformers for Dense Prediction | 2021-03-24 | Code |
| 140 | GC ViT-B | 49 | No | Global Context Vision Transformers | 2022-06-20 | Code |
| 141 | A2MIM (ViT-B) | 49 | No | Architecture-Agnostic Masked Image Modeling -- F... | 2022-05-27 | Code |
| 142 | EfficientViT-B3 (r512) | 49 | No | EfficientViT: Multi-Scale Linear Attention for H... | 2022-05-29 | Code |
| 143 | DiNAT-Tiny (UperNet) | 48.8 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 144 | HRViT-b2 (SegFormer, SS) | 48.76 | No | Multi-Scale High-Resolution Vision Transformer f... | 2021-11-01 | Code |
| 145 | NAT-Tiny | 48.4 | No | Neighborhood Attention Transformer | 2022-04-14 | Code |
| 146 | XCiT-M24/8 (UperNet) | 48.4 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 147 | ResNeSt-200 | 48.36 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 148 | DAT-S (UperNet) | 48.31 | No | Vision Transformer with Deformable Attention | 2022-01-03 | Code |
| 149 | GC ViT-S | 48.3 | No | Global Context Vision Transformers | 2022-06-20 | Code |
| 150 | InternImage-T | 48.1 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code |
| 151 | VAN-Large | 48.1 | No | Visual Attention Network | 2022-02-20 | Code |
| 152 | XCiT-S24/8 (UperNet) | 48.1 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 153 | MaskFormer(ResNet-101) | 48.1 | No | Per-Pixel Classification is Not All You Need for... | 2021-07-13 | Code |
| 154 | MAE (ViT-B, UperNet) | 48.1 | No | Masked Autoencoders Are Scalable Vision Learners | 2021-11-11 | Code |
| 155 | HRNetV2 + OCR + RMI (PaddleClas pretrained) | 47.98 | No | Segmentation Transformer: Object-Contextual Repr... | 2019-09-24 | Code |
| 156 | Shift-B | 47.9 | No | When Shift Operation Meets Vision Transformer: A... | 2022-01-26 | Code |
| 157 | Shift-S | 47.8 | No | When Shift Operation Meets Vision Transformer: A... | 2022-01-26 | Code |
| 158 | MogaNet-S (Semantic FPN) | 47.7 | No | MogaNet: Multi-order Gated Aggregation Network | 2022-11-07 | Code |
| 159 | SeMask (SeMask Swin-S FPN) | 47.63 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 160 | ResNeSt-269 | 47.6 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 161 | UperNet Shuffle-T | 47.6 | No | Shuffle Transformer: Rethinking Spatial Shuffle ... | 2021-06-07 | Code |
| 162 | CondNet(ResNest-101) | 47.54 | No | CondNet: Conditional Classifier for Scene Segmen... | 2021-09-21 | Code |
| 163 | tiny-MOAT-3 (IN-1K pretraining, single scale) | 47.5 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 164 | CondNet(ResNet-101) | 47.38 | No | CondNet: Conditional Classifier for Scene Segmen... | 2021-09-21 | Code |
| 165 | DiNAT-Mini (UperNet) | 47.2 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 166 | DCNAS | 47.12 | No | DCNAS: Densely Connected Neural Architecture Sea... | 2020-03-26 | - |
| 167 | XCiT-S24/8 (Semantic-FPN) | 47.1 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 168 | ResNeSt-101 | 46.91 | No | ResNeSt: Split-Attention Networks | 2020-04-19 | Code |
| 169 | XCiT-M24/8 (Semantic-FPN) | 46.9 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 170 | HamNet (ResNet-101) | 46.8 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code |
| 171 | Sequential Ensemble (DeepLabv3+) | 46.8 | No | Sequential Ensembling for Semantic Segmentation | 2022-10-08 | - |
| 172 | ConvNeXt-T | 46.7 | No | A ConvNet for the 2020s | 2022-01-10 | Code |
| 173 | VAN-Base (Semantic-FPN) | 46.7 | No | Visual Attention Network | 2022-02-20 | Code |
| 174 | XCiT-S12/8 (UperNet) | 46.6 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 175 | GC ViT-T | 46.5 | No | Global Context Vision Transformers | 2022-06-20 | Code |
| 176 | NAT-Mini | 46.4 | No | Neighborhood Attention Transformer | 2022-04-14 | Code |
| 177 | Shift-T | 46.3 | No | When Shift Operation Meets Vision Transformer: A... | 2022-01-26 | Code |
| 178 | DaViT-T | 46.3 | No | DaViT: Dual Attention Vision Transformers | 2022-04-07 | Code |
| 179 | CPN(ResNet-101) | 46.27 | No | Context Prior for Scene Segmentation | 2020-04-03 | Code |
| 180 | MultiMAE (ViT-B) | 46.2 | No | MultiMAE: Multi-modal Multi-task Masked Autoenco... | 2022-04-04 | Code |
| 181 | DRAN(ResNet-101) | 46.18 | No | - | - | Code |
| 182 | PyConvSegNet-152 | 45.99 | No | Pyramidal Convolution: Rethinking Convolutional ... | 2020-06-20 | Code |
| 183 | DNL | 45.97 | No | Disentangled Non-Local Neural Networks | 2020-06-11 | Code |
| 184 | ACNet (ResNet-101) | 45.9 | No | Adaptive Context Network for Scene Parsing | 2019-11-05 | - |
| 185 | ACNet (ResNet-101) | 45.9 | No | Adaptive Context Network for Scene Parsing | 2019-11-05 | - |
| 186 | HRViT-b1 (SegFormer, SS) | 45.88 | No | Multi-Scale High-Resolution Vision Transformer f... | 2021-11-01 | Code |
| 187 | OCR(HRNetV2-W48) | 45.66 | No | Segmentation Transformer: Object-Contextual Repr... | 2019-09-24 | Code |
| 188 | SPNet (ResNet-101) | 45.6 | No | Strip Pooling: Rethinking Spatial Pooling for Sc... | 2020-03-30 | Code |
| 189 | Swin-T (UPerNet) MoBY | 45.58 | No | Self-Supervised Learning with Swin Transformers | 2021-05-10 | Code |
| 190 | DAT-T (UperNet) | 45.54 | No | Vision Transformer with Deformable Attention | 2022-01-03 | Code |
| 191 | iBOT (ViT-S/16) | 45.4 | No | iBOT: Image BERT Pre-Training with Online Tokeni... | 2021-11-15 | Code |
| 192 | EANet (ResNet-101) | 45.33 | No | Beyond Self-attention: External Attention using ... | 2021-05-05 | Code |
| 193 | OCR (ResNet-101) | 45.28 | No | Segmentation Transformer: Object-Contextual Repr... | 2019-09-24 | Code |
| 194 | Asymmetric ALNN | 45.24 | No | Asymmetric Non-local Neural Networks for Semanti... | 2019-08-21 | Code |
| 195 | Light-Ham (VAN-Small, D=256) | 45.2 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code |
| 196 | LaU-regression-loss | 45.02 | No | Location-aware Upsampling for Semantic Segmentat... | 2019-11-13 | Code |
| 197 | PSPNet | 44.94 | No | Pyramid Scene Parsing Network | 2016-12-04 | Code |
| 198 | tiny-MOAT-2 (IN-1K pretraining, single scale) | 44.9 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 199 | CFNet(ResNet-101) | 44.89 | No | - | - | Code |
| 200 | EncNet | 44.65 | No | Context Encoding for Semantic Segmentation | 2018-03-23 | Code |
| 201 | LaU-offset-loss | 44.55 | No | Location-aware Upsampling for Semantic Segmentat... | 2019-11-13 | Code |
| 202 | EncNet + JPU | 44.34 | No | FastFCN: Rethinking Dilated Convolution in the B... | 2019-03-28 | Code |
| 203 | SGR (ResNet-101) | 44.32 | No | - | - | Code |
| 204 | XCiT-S12/8 (Semantic-FPN) | 44.2 | No | XCiT: Cross-Covariance Image Transformers | 2021-06-17 | Code |
| 205 | Auto-DeepLab-L | 43.98 | No | Auto-DeepLab: Hierarchical Neural Architecture S... | 2019-01-10 | Code |
| 206 | PSANet (ResNet-101) | 43.77 | No | - | - | Code |
| 207 | DSSPN (ResNet-101) | 43.68 | No | Dynamic-structured Semantic Propagation Network | 2018-03-16 | - |
| 208 | PSPNet (ResNet-152) | 43.51 | No | Pyramid Scene Parsing Network | 2016-12-04 | Code |
| 209 | PSPNet (ResNet-101) | 43.29 | No | Pyramid Scene Parsing Network | 2016-12-04 | Code |
| 210 | HRNetV2 | 43.2 | No | High-Resolution Representations for Labeling Pix... | 2019-04-09 | Code |
| 211 | SeMask (SeMask Swin-T FPN) | 43.16 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code |
| 212 | tiny-MOAT-1 (IN-1K pretraining, single scale) | 43.1 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 213 | VAN-Small | 42.9 | No | Visual Attention Network | 2022-02-20 | Code |
| 214 | PoolFormer-M48 | 42.7 | No | MetaFormer Is Actually What You Need for Vision | 2021-11-22 | Code |
| 215 | UperNet (ResNet-101) | 42.66 | No | Unified Perceptual Parsing for Scene Understanding | 2018-07-26 | Code |
| 216 | tiny-MOAT-0 (IN-1K pretraining, single scale) | 41.2 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code |
| 217 | RefineNet | 40.7 | No | RefineNet: Multi-Path Refinement Networks for Hi... | 2016-11-20 | Code |
| 218 | FBNetV5 | 40.4 | No | FBNetV5: Neural Architecture Search for Multiple... | 2021-11-19 | - |
| 219 | ConvMLP-L | 40 | No | ConvMLP: Hierarchical Convolutional MLPs for Vis... | 2021-09-09 | Code |
| 220 | ConvMLP-M | 38.6 | No | ConvMLP: Hierarchical Convolutional MLPs for Vis... | 2021-09-09 | Code |
| 221 | VAN-Tiny | 38.5 | No | Visual Attention Network | 2022-02-20 | Code |
| 222 | A2MIM (ResNet-50) | 38.3 | No | Architecture-Agnostic Masked Image Modeling -- F... | 2022-05-27 | Code |
| 223 | iBOT (ViT-B/16) (linear head) | 38.3 | No | iBOT: Image BERT Pre-Training with Online Tokeni... | 2021-11-15 | Code |
| 224 | SegFormer-B0 | 37.4 | Yes | SegFormer: Simple and Efficient Design for Seman... | 2021-05-31 | Code |
| 225 | MUXNet-m + PPM | 35.8 | No | MUXConv: Information Multiplexing in Convolution... | 2020-03-31 | Code |
| 226 | ConvMLP-S | 35.8 | No | ConvMLP: Hierarchical Convolutional MLPs for Vis... | 2021-09-09 | Code |
| 227 | MUXNet-m + C1 | 32.42 | No | MUXConv: Information Multiplexing in Convolution... | 2020-03-31 | Code |
| 228 | DilatedNet | 32.31 | No | Multi-Scale Context Aggregation by Dilated Convo... | 2015-11-23 | Code |
| 229 | FCN | 29.39 | Yes | Fully Convolutional Networks for Semantic Segmen... | 2014-11-14 | Code |
| 230 | SegNet | 21.64 | No | SegNet: A Deep Convolutional Encoder-Decoder Arc... | 2015-11-02 | Code |