10-shot image generation on COCO (Common Objects in Context)
Metric: FID (Fréchet Inception Distance; lower is better)
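FID fits a Gaussian to feature vectors of real images and another to features of generated images, then measures the Fréchet distance between them: FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The following is a minimal sketch in Python with NumPy. It assumes the features (conventionally Inception-v3 pooling activations) have already been extracted, and the function names are hypothetical. It also takes the trace of the matrix square root from the eigenvalues of Sigma_r Sigma_g rather than forming the square root explicitly (for example with scipy.linalg.sqrtm).

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) is the sum of square roots of the eigenvalues of S1 S2.
    # For PSD S1, S2 those eigenvalues are real and non-negative; clip round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feats_real, feats_fake):
    """FID from two (n_samples, dim) feature arrays (feature extractor not shown)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)  # (dim, dim) sample covariance
    s2 = np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu1, s1, mu2, s2)


# 1-D check: N(0, 1) vs N(3, 4) gives 9 + 1 + 4 - 2 * 2 = 10.
print(frechet_distance([0.0], [[1.0]], [3.0], [[4.0]]))
```

Identical feature sets give an FID of 0, and the score grows as the two fitted Gaussians drift apart in mean or covariance, which is why smaller values rank higher on this leaderboard.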
| # | Model | FID | Extra Data | SOTA | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | RAT-Diffusion | 5 | Yes | Yes | Data Extrapolation for Text-to-image Generation on Small Datasets | 2024-10-02 | Yes |
| 2 | Re-Imagen (Finetuned) | 5.25 | No | Yes | Re-Imagen: Retrieval-Augmented Text-to-Image Generator | 2022-09-29 | No |
| 3 | U-ViT-S/2-Deep | 5.48 | No | Yes | All are Worth Words: A ViT Backbone for Diffusion Models | 2022-09-25 | Yes |
| 4 | GLIGEN (fine-tuned, Detection + Caption data) | 5.61 | No | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Yes |
| 5 | GLIGEN (fine-tuned, Detection data only) | 5.82 | No | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Yes |
| 6 | U-ViT-S/2 | 5.95 | No | Yes | All are Worth Words: A ViT Backbone for Diffusion Models | 2022-09-25 | Yes |
| 7 | ConPreDiff | 6.21 | No | No | Improving Diffusion-Based Image Synthesis with Context Prediction | 2024-01-04 | No |
| 8 | TLDM | 6.29 | No | Yes | Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders | 2022-02-19 | Yes |
| 9 | GLIGEN (fine-tuned, Grounding data) | 6.38 | No | No | GLIGEN: Open-Set Grounded Text-to-Image Generation | 2023-01-17 | Yes |
| 10 | RAPHAEL (zero-shot) | 6.61 | No | No | RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | 2023-05-29 | Yes |
| 11 | ERNIE-ViLG 2.0 (zero-shot) | 6.75 | No | No | ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts | 2022-10-27 | Yes |
| 12 | Re-Imagen | 6.88 | No | No | Re-Imagen: Retrieval-Augmented Text-to-Image Generator | 2022-09-29 | No |
| 13 | eDiff-I (zero-shot) | 6.95 | No | No | eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers | 2022-11-02 | Yes |
| 14 | Swinv2-Imagen | 7.21 | Yes | No | Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation | 2022-10-18 | No |
| 15 | Imagen (zero-shot) | 7.27 | Yes | No | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | 2022-05-23 | Yes |
| 16 | GigaGAN (Zero-shot, 64x64) | 7.28 | No | No | Scaling up GANs for Text-to-Image Synthesis | 2023-03-09 | Yes |
| 17 | StyleGAN-T (Zero-shot, 64x64) | 7.3 | No | No | StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | 2023-01-23 | Yes |
| 18 | Make-a-Scene (unfiltered) | 7.55 | Yes | No | Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | 2022-03-24 | Yes |
| 19 | Kandinsky | 8.03 | No | No | Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion | 2023-10-05 | Yes |
| 20 | Lafite | 8.12 | No | Yes | LAFITE: Towards Language-Free Training for Text-to-Image Generation | 2021-11-27 | Yes |
| 21 | SiD-LSG (Data-free distillation, zero-shot FID) | 8.15 | No | No | Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation | 2024-06-03 | Yes |
| 22 | simple diffusion (U-ViT) | 8.3 | No | No | Simple diffusion: End-to-end diffusion for high resolution images | 2023-01-26 | Yes |
| 23 | GigaGAN (Zero-shot, 256x256) | 9.09 | No | No | Scaling up GANs for Text-to-Image Synthesis | 2023-03-09 | Yes |
| 24 | XMC-GAN (256 x 256) | 9.3 | No | Yes | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | 2021-11-24 | Yes |
| 25 | XMC-GAN | 9.33 | Yes | Yes | Cross-Modal Contrastive Learning for Text-to-Image Generation | 2021-01-12 | Yes |
| 26 | DALL-E 2 | 10.39 | Yes | No | Hierarchical Text-Conditional Image Generation with CLIP Latents | 2022-04-13 | Yes |
| 27 | Corgi-Semi | 10.6 | No | No | Shifted Diffusion for Text-to-image Generation | 2022-11-24 | Yes |
| 28 | Corgi | 10.88 | No | No | Shifted Diffusion for Text-to-image Generation | 2022-11-24 | Yes |
| 29 | TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot) | 10.9 | No | No | TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation | 2023-04-26 | Yes |
| 30 | Make-a-Scene (unfiltered) | 11.84 | Yes | No | Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | 2022-03-24 | Yes |
| 31 | GLIDE (zero-shot) | 12.24 | Yes | No | GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | 2021-12-20 | Yes |
| 32 | KNN-Diffusion | 12.5 | No | No | KNN-Diffusion: Image Generation via Large-Scale Retrieval | 2022-04-06 | No |
| 33 | GALIP (CC12m) | 12.54 | No | No | GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | 2023-01-30 | Yes |
| 34 | Latent Diffusion (LDM-KL-8-G) | 12.63 | Yes | No | High-Resolution Image Synthesis with Latent Diffusion Models | 2021-12-20 | Yes |
| 35 | Stable Diffusion | 12.63 | No | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | No |
| 36 | NÜWA (256 x 256) | 12.9 | No | No | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | 2021-11-24 | Yes |
| 37 | VQ-Diffusion-F | 13.86 | Yes | No | Vector Quantized Diffusion Model for Text-to-Image Synthesis | 2021-11-29 | Yes |
| 38 | StyleGAN-T (Zero-shot, 256x256) | 13.9 | No | No | StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis | 2023-01-23 | Yes |
| 39 | RAT-GAN | 14.6 | No | No | Recurrent Affine Transformation for Text-to-image Synthesis | 2022-04-22 | Yes |
| 40 | ERNIE-ViLG | 14.7 | No | No | ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | 2021-12-31 | Yes |
| 41 | RA-CM3 (2.7B) | 15.7 | No | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | No |
| 42 | CogView2(6B, Finetuned) | 17.7 | No | No | CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | 2022-04-28 | Yes |
| 43 | VQ-Diffusion-B | 19.75 | Yes | No | Vector Quantized Diffusion Model for Text-to-Image Synthesis | 2021-11-29 | Yes |
| 44 | DM-GAN+CL | 20.79 | No | No | Improving Text-to-Image Synthesis Using Contrastive Learning | 2021-07-06 | Yes |
| 45 | FuseDream (few-shot, k=5) | 21.16 | No | No | FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | 2021-12-02 | Yes |
| 46 | FuseDream (k=5, 256) | 21.16 | No | No | FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | 2021-12-02 | Yes |
| 47 | FuseDream (k=10, 256) | 21.89 | No | No | FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization | 2021-12-02 | Yes |
| 48 | AttnGAN+CL | 23.93 | No | No | Improving Text-to-Image Synthesis Using Contrastive Learning | 2021-07-06 | Yes |
| 49 | CogView2(6B, Finetuned) | 24 | No | No | CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | 2022-04-28 | Yes |
| 50 | OP-GAN | 24.7 | No | Yes | Semantic Object Accuracy for Generative Text-to-Image Synthesis | 2019-10-29 | Yes |
| 51 | DM-GAN (256 x 256) | 26 | No | No | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | 2021-11-24 | Yes |
| 52 | Lafite (zero-shot) | 26.94 | No | No | LAFITE: Towards Language-Free Training for Text-to-Image Generation | 2021-11-27 | Yes |
| 53 | CogView | 27.1 | Yes | No | CogView: Mastering Text-to-Image Generation via Transformers | 2021-05-26 | Yes |
| 54 | CogView (256 x 256) | 27.1 | No | No | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | 2021-11-24 | Yes |
| 55 | DALL-E (256 x 256) | 27.5 | No | No | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | 2021-11-24 | Yes |
| 56 | DALL-E (12B) | 28 | No | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | No |
| 57 | AttnGAN + VICTR | 29.26 | No | No | VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks | 2020-10-07 | Yes |
| 58 | Vanilla CM3 | 29.5 | No | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | No |
| 59 | DM-GAN + VICTR | 32.37 | No | No | VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks | 2020-10-07 | Yes |
| 60 | DM-GAN | 32.64 | No | Yes | DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis | 2019-04-02 | Yes |
| 61 | AttnGAN + OP | 33.35 | No | Yes | Generating Multiple Objects at Spatially Distinct Locations | 2019-01-03 | Yes |
| 62 | AttnGAN (256 x 256) | 35.2 | No | No | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | 2021-11-24 | Yes |
| 63 | L-Verse-CC | 37.2 | No | No | L-Verse: Bidirectional Generation Between Image and Text | 2021-11-22 | Yes |
| 64 | L-Verse | 45.8 | No | No | L-Verse: Bidirectional Generation Between Image and Text | 2021-11-22 | Yes |
| 65 | StackGAN + OP | 55.3 | No | Yes | Generating Multiple Objects at Spatially Distinct Locations | 2019-01-03 | Yes |
| 66 | StackGAN-v1 | 74.05 | No | Yes | StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks | 2017-10-19 | Yes |
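The ranking and the SOTA markers can be reproduced from the table's own columns. The following Python sketch sorts entries best-first by FID, optionally excludes rows flagged under Extra Data, and flags an entry as SOTA when its FID is lower than that of every entry dated strictly earlier. That SOTA rule is an inference rather than a documented definition, although it agrees with every marker in this table, including same-day pairs such as the two U-ViT-S/2 variants. The `Entry` type and the helper names are hypothetical, and only a handful of rows are copied in.

```python
from datetime import date
from typing import List, NamedTuple


class Entry(NamedTuple):
    model: str
    fid: float
    extra_data: bool
    published: date


# A few rows copied from the leaderboard above; not the full table.
ENTRIES: List[Entry] = [
    Entry("RAT-Diffusion", 5.0, True, date(2024, 10, 2)),
    Entry("Re-Imagen (Finetuned)", 5.25, False, date(2022, 9, 29)),
    Entry("U-ViT-S/2-Deep", 5.48, False, date(2022, 9, 25)),
    Entry("U-ViT-S/2", 5.95, False, date(2022, 9, 25)),
    Entry("TLDM", 6.29, False, date(2022, 2, 19)),
    Entry("Imagen (zero-shot)", 7.27, True, date(2022, 5, 23)),
    Entry("XMC-GAN", 9.33, True, date(2021, 1, 12)),
    Entry("StackGAN-v1", 74.05, False, date(2017, 10, 19)),
]


def ranked(entries: List[Entry], hide_extra_data: bool = False) -> List[Entry]:
    """Entries best-first by FID (lower is better), optionally without Extra Data rows."""
    kept = [e for e in entries if not (hide_extra_data and e.extra_data)]
    return sorted(kept, key=lambda e: e.fid)


def sota_models(entries: List[Entry]) -> List[str]:
    """Hypothetical rule: FID lower than every entry dated strictly earlier."""
    flagged = []
    for e in entries:
        earlier = [o.fid for o in entries if o.published < e.published]
        if not earlier or e.fid < min(earlier):
            flagged.append(e.model)
    return flagged


print(ranked(ENTRIES)[0].model)                        # RAT-Diffusion
print(ranked(ENTRIES, hide_extra_data=True)[0].model)  # Re-Imagen (Finetuned)
print(sota_models(ENTRIES))                            # every model except Imagen (zero-shot)
```

Comparing against strictly earlier dates, rather than against all earlier rows, is what lets two entries from the same paper both carry the marker.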