| 1 | RAT-Diffusion | 5 | Yes | Data Extrapolation for Text-to-image Generation ... | 2024-10-02 | Code |
| 2 | Re-Imagen (Finetuned) | 5.25 | No | Re-Imagen: Retrieval-Augmented Text-to-Image Gen... | 2022-09-29 | - |
| 3 | U-ViT-S/2-Deep | 5.48 | No | All are Worth Words: A ViT Backbone for Diffusio... | 2022-09-25 | Code |
| 4 | GLIGEN (fine-tuned, Detection + Caption data) | 5.61 | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Code |
| 5 | GLIGEN (fine-tuned, Detection data only) | 5.82 | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Code |
| 6 | U-ViT-S/2 | 5.95 | No | All are Worth Words: A ViT Backbone for Diffusio... | 2022-09-25 | Code |
| 7 | ConPreDiff | 6.21 | No | Improving Diffusion-Based Image Synthesis with C... | 2024-01-04 | - |
| 8 | TLDM | 6.29 | No | Truncated Diffusion Probabilistic Models and Dif... | 2022-02-19 | Code |
| 9 | GLIGEN (fine-tuned, Grounding data) | 6.38 | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Code |
| 10 | RAPHAEL (zero-shot) | 6.61 | No | RAPHAEL: Text-to-Image Generation via Large Mixt... | 2023-05-29 | Code |
| 11 | ERNIE-ViLG 2.0 (zero-shot) | 6.75 | No | ERNIE-ViLG 2.0: Improving Text-to-Image Diffusio... | 2022-10-27 | Code |
| 12 | Re-Imagen | 6.88 | No | Re-Imagen: Retrieval-Augmented Text-to-Image Gen... | 2022-09-29 | - |
| 13 | eDiff-I (zero-shot) | 6.95 | No | eDiff-I: Text-to-Image Diffusion Models with an ... | 2022-11-02 | Code |
| 14 | Swinv2-Imagen | 7.21 | Yes | Swinv2-Imagen: Hierarchical Vision Transformer D... | 2022-10-18 | - |
| 15 | Imagen (zero-shot) | 7.27 | Yes | Photorealistic Text-to-Image Diffusion Models wi... | 2022-05-23 | Code |
| 16 | GigaGAN (zero-shot, 64x64) | 7.28 | No | Scaling up GANs for Text-to-Image Synthesis | 2023-03-09 | Code |
| 17 | StyleGAN-T (zero-shot, 64x64) | 7.3 | No | StyleGAN-T: Unlocking the Power of GANs for Fast... | 2023-01-23 | Code |
| 18 | Make-A-Scene (unfiltered) | 7.55 | Yes | Make-A-Scene: Scene-Based Text-to-Image Generati... | 2022-03-24 | Code |
| 19 | Kandinsky | 8.03 | No | Kandinsky: an Improved Text-to-Image Synthesis w... | 2023-10-05 | Code |
| 20 | Lafite | 8.12 | No | LAFITE: Towards Language-Free Training for Text-... | 2021-11-27 | Code |
| 21 | SiD-LSG (Data-free distillation, zero-shot FID) | 8.15 | No | Long and Short Guidance in Score identity Distil... | 2024-06-03 | Code |
| 22 | simple diffusion (U-ViT) | 8.3 | No | Simple diffusion: End-to-end diffusion for high ... | 2023-01-26 | Code |
| 23 | GigaGAN (zero-shot, 256x256) | 9.09 | No | Scaling up GANs for Text-to-Image Synthesis | 2023-03-09 | Code |
| 24 | XMC-GAN (256 x 256) | 9.3 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code |
| 25 | XMC-GAN | 9.33 | Yes | Cross-Modal Contrastive Learning for Text-to-Ima... | 2021-01-12 | Code |
| 26 | DALL-E 2 | 10.39 | Yes | Hierarchical Text-Conditional Image Generation w... | 2022-04-13 | Code |
| 27 | Corgi-Semi | 10.6 | No | Shifted Diffusion for Text-to-image Generation | 2022-11-24 | Code |
| 28 | Corgi | 10.88 | No | Shifted Diffusion for Text-to-image Generation | 2022-11-24 | Code |
| 29 | TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot) | 10.9 | No | TR0N: Translator Networks for 0-Shot Plug-and-Pl... | 2023-04-26 | Code |
| 30 | Make-A-Scene (unfiltered) | 11.84 | Yes | Make-A-Scene: Scene-Based Text-to-Image Generati... | 2022-03-24 | Code |
| 31 | GLIDE (zero-shot) | 12.24 | Yes | GLIDE: Towards Photorealistic Image Generation a... | 2021-12-20 | Code |
| 32 | KNN-Diffusion | 12.5 | No | KNN-Diffusion: Image Generation via Large-Scale ... | 2022-04-06 | - |
| 33 | GALIP (CC12m) | 12.54 | No | GALIP: Generative Adversarial CLIPs for Text-to-... | 2023-01-30 | Code |
| 34 | Latent Diffusion (LDM-KL-8-G) | 12.63 | Yes | High-Resolution Image Synthesis with Latent Diff... | 2021-12-20 | Code |
| 35 | Stable Diffusion | 12.63 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 36 | NÜWA (256 x 256) | 12.9 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code |
| 37 | VQ-Diffusion-F | 13.86 | Yes | Vector Quantized Diffusion Model for Text-to-Ima... | 2021-11-29 | Code |
| 38 | StyleGAN-T (zero-shot, 256x256) | 13.9 | No | StyleGAN-T: Unlocking the Power of GANs for Fast... | 2023-01-23 | Code |
| 39 | RAT-GAN | 14.6 | No | Recurrent Affine Transformation for Text-to-imag... | 2022-04-22 | Code |
| 40 | ERNIE-ViLG | 14.7 | No | ERNIE-ViLG: Unified Generative Pre-training for ... | 2021-12-31 | Code |
| 41 | RA-CM3 (2.7B) | 15.7 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 42 | CogView2 (6B, Finetuned) | 17.7 | No | CogView2: Faster and Better Text-to-Image Genera... | 2022-04-28 | Code |
| 43 | VQ-Diffusion-B | 19.75 | Yes | Vector Quantized Diffusion Model for Text-to-Ima... | 2021-11-29 | Code |
| 44 | DM-GAN+CL | 20.79 | No | Improving Text-to-Image Synthesis Using Contrast... | 2021-07-06 | Code |
| 45 | FuseDream (few-shot, k=5) | 21.16 | No | FuseDream: Training-Free Text-to-Image Generatio... | 2021-12-02 | Code |
| 46 | FuseDream (k=5, 256) | 21.16 | No | FuseDream: Training-Free Text-to-Image Generatio... | 2021-12-02 | Code |
| 47 | FuseDream (k=10, 256) | 21.89 | No | FuseDream: Training-Free Text-to-Image Generatio... | 2021-12-02 | Code |
| 48 | AttnGAN+CL | 23.93 | No | Improving Text-to-Image Synthesis Using Contrast... | 2021-07-06 | Code |
| 49 | CogView2 (6B, Finetuned) | 24 | No | CogView2: Faster and Better Text-to-Image Genera... | 2022-04-28 | Code |
| 50 | OP-GAN | 24.7 | No | Semantic Object Accuracy for Generative Text-to-... | 2019-10-29 | Code |
| 51 | DM-GAN (256 x 256) | 26 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code |
| 52 | Lafite (zero-shot) | 26.94 | No | LAFITE: Towards Language-Free Training for Text-... | 2021-11-27 | Code |
| 53 | CogView | 27.1 | Yes | CogView: Mastering Text-to-Image Generation via ... | 2021-05-26 | Code |
| 54 | CogView (256 x 256) | 27.1 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code |
| 55 | DALL-E (256 x 256) | 27.5 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code |
| 56 | DALL-E (12B) | 28 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 57 | AttnGAN + VICTR | 29.26 | No | VICTR: Visual Information Captured Text Represen... | 2020-10-07 | Code |
| 58 | Vanilla CM3 | 29.5 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 59 | DM-GAN + VICTR | 32.37 | No | VICTR: Visual Information Captured Text Represen... | 2020-10-07 | Code |
| 60 | DM-GAN | 32.64 | No | DM-GAN: Dynamic Memory Generative Adversarial Ne... | 2019-04-02 | Code |
| 61 | AttnGAN + OP | 33.35 | No | Generating Multiple Objects at Spatially Distinc... | 2019-01-03 | Code |
| 62 | AttnGAN (256 x 256) | 35.2 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code |
| 63 | L-Verse-CC | 37.2 | No | L-Verse: Bidirectional Generation Between Image ... | 2021-11-22 | Code |
| 64 | L-Verse | 45.8 | No | L-Verse: Bidirectional Generation Between Image ... | 2021-11-22 | Code |
| 65 | StackGAN + OP | 55.3 | No | Generating Multiple Objects at Spatially Distinc... | 2019-01-03 | Code |
| 66 | StackGAN-v1 | 74.05 | No | StackGAN++: Realistic Image Synthesis with Stack... | 2017-10-19 | Code |