Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

1 Image, 2*2 Stitchi on COCO (Common Objects in Context)

Metric: FID (lower is better)

Results

# | Model | FID | Extra Data | Paper | Date | Code
--- | --- | --- | --- | --- | --- | ---
1 | RAT-Diffusion | 5 | Yes | Data Extrapolation for Text-to-image Generation ... | 2024-10-02 | Code
2 | Re-Imagen (Finetuned) | 5.25 | No | Re-Imagen: Retrieval-Augmented Text-to-Image Gen... | 2022-09-29 | -
3 | U-ViT-S/2-Deep | 5.48 | No | All are Worth Words: A ViT Backbone for Diffusio... | 2022-09-25 | Code
4 | GLIGEN (fine-tuned, Detection + Caption data) | 5.61 | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Code
5 | GLIGEN (fine-tuned, Detection data only) | 5.82 | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Code
6 | U-ViT-S/2 | 5.95 | No | All are Worth Words: A ViT Backbone for Diffusio... | 2022-09-25 | Code
7 | ConPreDiff | 6.21 | No | Improving Diffusion-Based Image Synthesis with C... | 2024-01-04 | -
8 | TLDM | 6.29 | No | Truncated Diffusion Probabilistic Models and Dif... | 2022-02-19 | Code
9 | GLIGEN (fine-tuned, Grounding data) | 6.38 | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Code
10 | RAPHAEL (zero-shot) | 6.61 | No | RAPHAEL: Text-to-Image Generation via Large Mixt... | 2023-05-29 | Code
11 | ERNIE-ViLG 2.0 (zero-shot) | 6.75 | No | ERNIE-ViLG 2.0: Improving Text-to-Image Diffusio... | 2022-10-27 | Code
12 | Re-Imagen | 6.88 | No | Re-Imagen: Retrieval-Augmented Text-to-Image Gen... | 2022-09-29 | -
13 | eDiff-I (zero-shot) | 6.95 | No | eDiff-I: Text-to-Image Diffusion Models with an ... | 2022-11-02 | Code
14 | Swinv2-Imagen | 7.21 | Yes | Swinv2-Imagen: Hierarchical Vision Transformer D... | 2022-10-18 | -
15 | Imagen (zero-shot) | 7.27 | Yes | Photorealistic Text-to-Image Diffusion Models wi... | 2022-05-23 | Code
16 | GigaGAN (Zero-shot, 64x64) | 7.28 | No | Scaling up GANs for Text-to-Image Synthesis | 2023-03-09 | Code
17 | StyleGAN-T (Zero-shot, 64x64) | 7.3 | No | StyleGAN-T: Unlocking the Power of GANs for Fast... | 2023-01-23 | Code
18 | Make-a-Scene (unfiltered) | 7.55 | Yes | Make-A-Scene: Scene-Based Text-to-Image Generati... | 2022-03-24 | Code
19 | Kandinsky | 8.03 | No | Kandinsky: an Improved Text-to-Image Synthesis w... | 2023-10-05 | Code
20 | Lafite | 8.12 | No | LAFITE: Towards Language-Free Training for Text-... | 2021-11-27 | Code
21 | SiD-LSG (Data-free distillation, zero-shot FID) | 8.15 | No | Long and Short Guidance in Score identity Distil... | 2024-06-03 | Code
22 | simple diffusion (U-ViT) | 8.3 | No | Simple diffusion: End-to-end diffusion for high ... | 2023-01-26 | Code
23 | GigaGAN (Zero-shot, 256x256) | 9.09 | No | Scaling up GANs for Text-to-Image Synthesis | 2023-03-09 | Code
24 | XMC-GAN (256 x 256) | 9.3 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code
25 | XMC-GAN | 9.33 | Yes | Cross-Modal Contrastive Learning for Text-to-Ima... | 2021-01-12 | Code
26 | DALL-E 2 | 10.39 | Yes | Hierarchical Text-Conditional Image Generation w... | 2022-04-13 | Code
27 | Corgi-Semi | 10.6 | No | Shifted Diffusion for Text-to-image Generation | 2022-11-24 | Code
28 | Corgi | 10.88 | No | Shifted Diffusion for Text-to-image Generation | 2022-11-24 | Code
29 | TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot) | 10.9 | No | TR0N: Translator Networks for 0-Shot Plug-and-Pl... | 2023-04-26 | Code
30 | Make-a-Scene (unfiltered) | 11.84 | Yes | Make-A-Scene: Scene-Based Text-to-Image Generati... | 2022-03-24 | Code
31 | GLIDE (zero-shot) | 12.24 | Yes | GLIDE: Towards Photorealistic Image Generation a... | 2021-12-20 | Code
32 | KNN-Diffusion | 12.5 | No | KNN-Diffusion: Image Generation via Large-Scale ... | 2022-04-06 | -
33 | GALIP (CC12m) | 12.54 | No | GALIP: Generative Adversarial CLIPs for Text-to-... | 2023-01-30 | Code
34 | Latent Diffusion (LDM-KL-8-G) | 12.63 | Yes | High-Resolution Image Synthesis with Latent Diff... | 2021-12-20 | Code
35 | Stable Diffusion | 12.63 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | -
36 | NÜWA (256 x 256) | 12.9 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code
37 | VQ-Diffusion-F | 13.86 | Yes | Vector Quantized Diffusion Model for Text-to-Ima... | 2021-11-29 | Code
38 | StyleGAN-T (Zero-shot, 256x256) | 13.9 | No | StyleGAN-T: Unlocking the Power of GANs for Fast... | 2023-01-23 | Code
39 | RAT-GAN | 14.6 | No | Recurrent Affine Transformation for Text-to-imag... | 2022-04-22 | Code
40 | ERNIE-ViLG | 14.7 | No | ERNIE-ViLG: Unified Generative Pre-training for ... | 2021-12-31 | Code
41 | RA-CM3 (2.7B) | 15.7 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | -
42 | CogView2(6B, Finetuned) | 17.7 | No | CogView2: Faster and Better Text-to-Image Genera... | 2022-04-28 | Code
43 | VQ-Diffusion-B | 19.75 | Yes | Vector Quantized Diffusion Model for Text-to-Ima... | 2021-11-29 | Code
44 | DM-GAN+CL | 20.79 | No | Improving Text-to-Image Synthesis Using Contrast... | 2021-07-06 | Code
45 | FuseDream (few-shot, k=5) | 21.16 | No | FuseDream: Training-Free Text-to-Image Generatio... | 2021-12-02 | Code
46 | FuseDream (k=5, 256) | 21.16 | No | FuseDream: Training-Free Text-to-Image Generatio... | 2021-12-02 | Code
47 | FuseDream (k=10, 256) | 21.89 | No | FuseDream: Training-Free Text-to-Image Generatio... | 2021-12-02 | Code
48 | AttnGAN+CL | 23.93 | No | Improving Text-to-Image Synthesis Using Contrast... | 2021-07-06 | Code
49 | CogView2(6B, Finetuned) | 24 | No | CogView2: Faster and Better Text-to-Image Genera... | 2022-04-28 | Code
50 | OP-GAN | 24.7 | No | Semantic Object Accuracy for Generative Text-to-... | 2019-10-29 | Code
51 | DM-GAN (256 x 256) | 26 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code
52 | Lafite (zero-shot) | 26.94 | No | LAFITE: Towards Language-Free Training for Text-... | 2021-11-27 | Code
53 | CogView | 27.1 | Yes | CogView: Mastering Text-to-Image Generation via ... | 2021-05-26 | Code
54 | CogView (256 x 256) | 27.1 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code
55 | DALL-E (256 x 256) | 27.5 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code
56 | DALL-E (12B) | 28 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | -
57 | AttnGAN + VICTR | 29.26 | No | VICTR: Visual Information Captured Text Represen... | 2020-10-07 | Code
58 | Vanilla CM3 | 29.5 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | -
59 | DM-GAN + VICTR | 32.37 | No | VICTR: Visual Information Captured Text Represen... | 2020-10-07 | Code
60 | DM-GAN | 32.64 | No | DM-GAN: Dynamic Memory Generative Adversarial Ne... | 2019-04-02 | Code
61 | AttnGAN + OP | 33.35 | No | Generating Multiple Objects at Spatially Distinc... | 2019-01-03 | Code
62 | AttnGAN (256 x 256) | 35.2 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code
63 | L-Verse-CC | 37.2 | No | L-Verse: Bidirectional Generation Between Image ... | 2021-11-22 | Code
64 | L-Verse | 45.8 | No | L-Verse: Bidirectional Generation Between Image ... | 2021-11-22 | Code
65 | StackGAN + OP | 55.3 | No | Generating Multiple Objects at Spatially Distinc... | 2019-01-03 | Code
66 | StackGAN-v1 | 74.05 | No | StackGAN++: Realistic Image Synthesis with Stack... | 2017-10-19 | Code