Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Semantic Segmentation on ADE20K

Metric: Params (M). Entries are ranked by parameter count, largest first.
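The page does not state how parameter counts are obtained. A minimal sketch of the usual convention (total trainable parameters, reported in millions) for a PyTorch model is shown below; the helper name and the toy model are illustrative stand-ins, not part of the leaderboard's tooling.

```python
import torch.nn as nn

def params_in_millions(model: nn.Module, trainable_only: bool = True) -> float:
    """Return a model's parameter count in millions (M), the unit used in the table."""
    params = (p for p in model.parameters() if p.requires_grad or not trainable_only)
    return sum(p.numel() for p in params) / 1e6

# Illustrative stand-in; any nn.Module (e.g. a SegFormer or ConvNeXt
# implementation) could be passed instead.
toy_model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 150, 1))
print(f"{params_in_millions(toy_model):.1f}M parameters")
```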


Results

# | Model | Params (M) | Extra Data | Paper | Date | Code
1 | FD-SwinV2-G | 3000 | No | Contrastive Learning Rivals Masked Image Modelin... | 2022-05-27 | Code
2 | RevCol-H (Mask2Former) | 2439 | Yes | Reversible Column Networks | 2022-12-22 | Code
3 | BEiT-3 | 1900 | Yes | Image as a Foreign Language: BEiT Pretraining fo... | 2022-08-22 | Code
4 | ViT-P (InternImage-H) | 1610 | Yes | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code
5 | ONE-PEACE | 1500 | Yes | ONE-PEACE: Exploring One General Representation ... | 2023-05-18 | Code
6 | ViT-P (OneFormer, InternImage-H) | 1400 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code
7 | InternImage-H | 1310 | Yes | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
8 | M3I Pre-training (InternImage-H) | 1310 | Yes | Towards All-in-one Pre-training via Maximizing M... | 2022-11-17 | Code
9 | InternImage-H (M3I Pre-training) | 1310 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
10 | DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2former) | 1080 | No | DINOv2: Learning Robust Visual Features without ... | 2023-04-14 | Code
11 | EVA | 1074 | Yes | EVA: Exploring the Limits of Masked Visual Repre... | 2022-11-14 | Code
12 | ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | 571 | Yes | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code
13 | ViT-Adapter-L (Mask2Former, BEiT pretrain) | 571 | Yes | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code
14 | MOAT-4 (IN-22K pretraining, single-scale) | 496 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
15 | ViT-Adapter-L (UperNet, BEiT pretrain) | 451 | No | Vision Transformer Adapter for Dense Predictions | 2022-05-17 | Code
16 | ConvNeXt-XL++ | 391 | No | A ConvNet for the 2020s | 2022-01-10 | Code
17 | InternImage-XL | 368 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
18 | RSSeg-ViT-L (BEiT pretrain) | 330 | No | Representation Separation for Semantic Segmentat... | 2022-12-28 | -
19 | EoMT (DINOv2-L, single-scale, 512x512) | 316 | No | Your ViT is Secretly an Image Segmentation Model | 2025-03-24 | Code
20 | ViT-P (OneFormer, DiNAT-L) | 309 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code
21 | InternImage-L | 256 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
22 | ConvNeXt-L++ | 235 | No | A ConvNet for the 2020s | 2022-01-10 | Code
23 | MasK DINO (SwinL, multi-scale) | 223 | Yes | Mask DINO: Towards A Unified Transformer-based F... | 2022-06-06 | Code
24 | Sequential Ensemble (SegFormer) | 216.3 | No | Sequential Ensembling for Semantic Segmentation | 2022-10-08 | -
25 | LV-ViT-L (UperNet, MS) | 209 | No | All Tokens Matter: Token Labeling for Training B... | 2021-04-22 | Code
26 | DDP (Swin-L, step-3) | 207 | No | DDP: Diffusion Model for Dense Visual Prediction | 2023-03-30 | Code
27 | MOAT-3 (IN-22K pretraining, single-scale) | 198 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
28 | InternImage-B | 128 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
29 | GC ViT-B | 125 | No | Global Context Vision Transformers | 2022-06-20 | Code
30 | NAT-Base | 123 | No | Neighborhood Attention Transformer | 2022-04-14 | Code
31 | ConvNeXt-B++ | 122 | No | A ConvNet for the 2020s | 2022-01-10 | Code
32 | ConvNeXt-B | 122 | No | A ConvNet for the 2020s | 2022-01-10 | Code
33 | DAT-B (UperNet) | 121 | No | Vision Transformer with Deformable Attention | 2022-01-03 | Code
34 | TransNeXt-Base (IN-1K pretrain, Mask2Former, 512) | 109 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code
35 | ActiveMLP-L (UperNet) | 108 | No | Active Token Mixer | 2022-03-11 | Code
36 | SeMask (SeMask Swin-B FPN) | 96 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code
37 | SegFormer-B5 | 84.7 | Yes | SegFormer: Simple and Efficient Design for Seman... | 2021-05-31 | Code
38 | GC ViT-S | 84 | No | Global Context Vision Transformers | 2022-06-20 | Code
39 | ConvNeXt-S | 82 | No | A ConvNet for the 2020s | 2022-01-10 | Code
40 | NAT-Small | 82 | No | Neighborhood Attention Transformer | 2022-04-14 | Code
41 | MOAT-2 (IN-22K pretraining, single-scale) | 81 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
42 | DAT-S (UperNet) | 81 | No | Vision Transformer with Deformable Attention | 2022-01-03 | Code
43 | InternImage-S | 80 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
44 | TransNeXt-Small (IN-1K pretrain, Mask2Former, 512) | 69 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code
45 | SegFormer-B4 | 64.1 | Yes | SegFormer: Simple and Efficient Design for Seman... | 2021-05-31 | Code
46 | Light-Ham (VAN-Huge) | 61.1 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code
47 | ConvNeXt-T | 60 | No | A ConvNet for the 2020s | 2022-01-10 | Code
48 | DAT-T (UperNet) | 60 | No | Vision Transformer with Deformable Attention | 2022-01-03 | Code
49 | InternImage-T | 59 | No | InternImage: Exploring Large-Scale Vision Founda... | 2022-11-10 | Code
50 | NAT-Tiny | 58 | No | Neighborhood Attention Transformer | 2022-04-14 | Code
51 | GC ViT-T | 58 | No | Global Context Vision Transformers | 2022-06-20 | Code
52 | SeMask (SeMask Swin-S FPN) | 56 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code
53 | VAN-Large (HamNet) | 55 | No | Visual Attention Network | 2022-02-20 | Code
54 | NAT-Mini | 50 | No | Neighborhood Attention Transformer | 2022-04-14 | Code
55 | VAN-Large | 49 | No | Visual Attention Network | 2022-02-20 | Code
56 | TransNeXt-Tiny (IN-1K pretrain, Mask2Former, 512) | 47.5 | No | TransNeXt: Robust Foveal Visual Perception for V... | 2023-11-28 | Code
57 | Light-Ham (VAN-Large) | 45.6 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code
58 | SeMask (SeMask Swin-T FPN) | 35 | No | SeMask: Semantically Masked Transformers for Sem... | 2021-12-23 | Code
59 | HRViT-b3 (SegFormer, SS) | 28.7 | No | Multi-Scale High-Resolution Vision Transformer f... | 2021-11-01 | Code
60 | Light-Ham (VAN-Base) | 27.4 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code
61 | tiny-MOAT-3 (IN-1K pretraining, single scale) | 24 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
62 | HRViT-b2 (SegFormer, SS) | 20.8 | No | Multi-Scale High-Resolution Vision Transformer f... | 2021-11-01 | Code
63 | VAN-Small | 18 | No | Visual Attention Network | 2022-02-20 | Code
64 | Light-Ham (VAN-Small, D=256) | 13.8 | No | Is Attention Better Than Matrix Decomposition? | 2021-09-09 | Code
65 | tiny-MOAT-2 (IN-1K pretraining, single scale) | 13 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
66 | HRViT-b1 (SegFormer, SS) | 8.2 | No | Multi-Scale High-Resolution Vision Transformer f... | 2021-11-01 | Code
67 | tiny-MOAT-1 (IN-1K pretraining, single scale) | 8 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
68 | VAN-Tiny | 8 | No | Visual Attention Network | 2022-02-20 | Code
69 | tiny-MOAT-0 (IN-1K pretraining, single scale) | 6 | No | MOAT: Alternating Mobile Convolution and Attenti... | 2022-10-04 | Code
70 | SegFormer-B0 | 3.8 | Yes | SegFormer: Simple and Efficient Design for Seman... | 2021-05-31 | Code