1 Image, 2*2 Stitchi on COCO (Common Objects in Context)

Metric: FID (lower is better)
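
For reference, FID (Fréchet Inception Distance) compares the mean and covariance of image features from real and generated images, under the assumption that both feature sets are Gaussian. The sketch below is a minimal illustration, not the evaluation code used by any entry on this leaderboard. It assumes the features (conventionally Inception-v3 pooling activations) have already been extracted into arrays of shape (n_images, n_features); the function names `frechet_distance` and `fid_from_features` are hypothetical.

```python
import numpy as np


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Squared Frechet distance between N(mu_r, sigma_r) and N(mu_g, sigma_g).

    Computes ||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 (sigma_r sigma_g)^(1/2)).
    """
    diff = mu_r - mu_g
    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g, which are real and non-negative when
    # both inputs are positive semi-definite. Clipping guards against tiny
    # negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)


def fid_from_features(real_feats, gen_feats):
    """FID from two precomputed feature arrays of shape (n_images, n_features).

    Assumes n_features >= 2 so that np.cov returns a square matrix.
    """
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)
```

Identical feature statistics give a distance of zero, and the value grows as the means or covariances drift apart, which is why lower is better.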

Results

#	Model	FID	Extra Data	Paper	Date	Code
1	RAT-Diffusion	5	Yes	Data Extrapolation for Text-to-image Generation on Small Datasets	2024-10-02	Code
2	Re-Imagen (Finetuned)	5.25	No	Re-Imagen: Retrieval-Augmented Text-to-Image Generator	2022-09-29	-
3	U-ViT-S/2-Deep	5.48	No	All are Worth Words: A ViT Backbone for Diffusion Models	2022-09-25	Code
4	GLIGEN (fine-tuned, Detection + Caption data)	5.61	No	GLIGEN: Open-Set Grounded Text-to-Image Generation	2023-01-17	Code
5	GLIGEN (fine-tuned, Detection data only)	5.82	No	GLIGEN: Open-Set Grounded Text-to-Image Generation	2023-01-17	Code
6	U-ViT-S/2	5.95	No	All are Worth Words: A ViT Backbone for Diffusion Models	2022-09-25	Code
7	ConPreDiff	6.21	No	Improving Diffusion-Based Image Synthesis with Context Prediction	2024-01-04	-
8	TLDM	6.29	No	Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders	2022-02-19	Code
9	GLIGEN (fine-tuned, Grounding data)	6.38	No	GLIGEN: Open-Set Grounded Text-to-Image Generation	2023-01-17	Code
10	RAPHAEL (zero-shot)	6.61	No	RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths	2023-05-29	Code
11	ERNIE-ViLG 2.0 (zero-shot)	6.75	No	ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts	2022-10-27	Code
12	Re-Imagen	6.88	No	Re-Imagen: Retrieval-Augmented Text-to-Image Generator	2022-09-29	-
13	eDiff-I (zero-shot)	6.95	No	eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers	2022-11-02	Code
14	Swinv2-Imagen	7.21	Yes	Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation	2022-10-18	-
15	Imagen (zero-shot)	7.27	Yes	Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding	2022-05-23	Code
16	GigaGAN (Zero-shot, 64x64)	7.28	No	Scaling up GANs for Text-to-Image Synthesis	2023-03-09	Code
17	StyleGAN-T (Zero-shot, 64x64)	7.3	No	StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis	2023-01-23	Code
18	Make-a-Scene (unfiltered)	7.55	Yes	Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors	2022-03-24	Code
19	Kandinsky	8.03	No	Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion	2023-10-05	Code
20	Lafite	8.12	No	LAFITE: Towards Language-Free Training for Text-to-Image Generation	2021-11-27	Code
21	SiD-LSG (Data-free distillation, zero-shot FID)	8.15	No	Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation	2024-06-03	Code
22	simple diffusion (U-ViT)	8.3	No	Simple diffusion: End-to-end diffusion for high resolution images	2023-01-26	Code
23	GigaGAN (Zero-shot, 256x256)	9.09	No	Scaling up GANs for Text-to-Image Synthesis	2023-03-09	Code
24	XMC-GAN (256 x 256)	9.3	No	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion	2021-11-24	Code
25	XMC-GAN	9.33	Yes	Cross-Modal Contrastive Learning for Text-to-Image Generation	2021-01-12	Code
26	DALL-E 2	10.39	Yes	Hierarchical Text-Conditional Image Generation with CLIP Latents	2022-04-13	Code
27	Corgi-Semi	10.6	No	Shifted Diffusion for Text-to-image Generation	2022-11-24	Code
28	Corgi	10.88	No	Shifted Diffusion for Text-to-image Generation	2022-11-24	Code
29	TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot)	10.9	No	TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation	2023-04-26	Code
30	Make-a-Scene (unfiltered)	11.84	Yes	Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors	2022-03-24	Code
31	GLIDE (zero-shot)	12.24	Yes	GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models	2021-12-20	Code
32	KNN-Diffusion	12.5	No	KNN-Diffusion: Image Generation via Large-Scale Retrieval	2022-04-06	-
33	GALIP (CC12m)	12.54	No	GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis	2023-01-30	Code
34	Latent Diffusion (LDM-KL-8-G)	12.63	Yes	High-Resolution Image Synthesis with Latent Diffusion Models	2021-12-20	Code
35	Stable Diffusion	12.63	No	Retrieval-Augmented Multimodal Language Modeling	2022-11-22	-
36	NÜWA (256 x 256)	12.9	No	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion	2021-11-24	Code
37	VQ-Diffusion-F	13.86	Yes	Vector Quantized Diffusion Model for Text-to-Image Synthesis	2021-11-29	Code
38	StyleGAN-T (Zero-shot, 256x256)	13.9	No	StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis	2023-01-23	Code
39	RAT-GAN	14.6	No	Recurrent Affine Transformation for Text-to-image Synthesis	2022-04-22	Code
40	ERNIE-ViLG	14.7	No	ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation	2021-12-31	Code
41	RA-CM3 (2.7B)	15.7	No	Retrieval-Augmented Multimodal Language Modeling	2022-11-22	-
42	CogView2(6B, Finetuned)	17.7	No	CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers	2022-04-28	Code
43	VQ-Diffusion-B	19.75	Yes	Vector Quantized Diffusion Model for Text-to-Image Synthesis	2021-11-29	Code
44	DM-GAN+CL	20.79	No	Improving Text-to-Image Synthesis Using Contrastive Learning	2021-07-06	Code
45	FuseDream (few-shot, k=5)	21.16	No	FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization	2021-12-02	Code
46	FuseDream (k=5, 256)	21.16	No	FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization	2021-12-02	Code
47	FuseDream (k=10, 256)	21.89	No	FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization	2021-12-02	Code
48	AttnGAN+CL	23.93	No	Improving Text-to-Image Synthesis Using Contrastive Learning	2021-07-06	Code
49	CogView2(6B, Finetuned)	24	No	CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers	2022-04-28	Code
50	OP-GAN	24.7	No	Semantic Object Accuracy for Generative Text-to-Image Synthesis	2019-10-29	Code
51	DM-GAN (256 x 256)	26	No	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion	2021-11-24	Code
52	Lafite (zero-shot)	26.94	No	LAFITE: Towards Language-Free Training for Text-to-Image Generation	2021-11-27	Code
53	CogView	27.1	Yes	CogView: Mastering Text-to-Image Generation via Transformers	2021-05-26	Code
54	CogView (256 x 256)	27.1	No	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion	2021-11-24	Code
55	DALL-E (256 x 256)	27.5	No	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion	2021-11-24	Code
56	DALL-E (12B)	28	No	Retrieval-Augmented Multimodal Language Modeling	2022-11-22	-
57	AttnGAN + VICTR	29.26	No	VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks	2020-10-07	Code
58	Vanilla CM3	29.5	No	Retrieval-Augmented Multimodal Language Modeling	2022-11-22	-
59	DM-GAN + VICTR	32.37	No	VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks	2020-10-07	Code
60	DM-GAN	32.64	No	DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis	2019-04-02	Code
61	AttnGAN + OP	33.35	No	Generating Multiple Objects at Spatially Distinct Locations	2019-01-03	Code
62	AttnGAN (256 x 256)	35.2	No	NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion	2021-11-24	Code
63	L-Verse-CC	37.2	No	L-Verse: Bidirectional Generation Between Image and Text	2021-11-22	Code
64	L-Verse	45.8	No	L-Verse: Bidirectional Generation Between Image and Text	2021-11-22	Code
65	StackGAN + OP	55.3	No	Generating Multiple Objects at Spatially Distinct Locations	2019-01-03	Code
66	StackGAN-v1	74.05	No	StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks	2017-10-19	Code