1 Image, 2*2 Stitchi on GenEval

Metric: Overall (higher is better)

Results

#	Model	Overall	Extra Data	Paper	Date	Code	SOTA
1	SD3.5-Medium+Flow-GRPO	0.95	No	Flow-GRPO: Training Flow Matching Models via Online RL	2025-05-08	Code	Yes
2	UniWorld-V1 (Rewrite)	0.84	No	UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	2025-06-03	Code	-
3	MindOmni	0.83	No	MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO	2025-05-19	Code	-
4	UniWorld-V1	0.8	No	UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	2025-06-03	Code	-
5	SANA-1.5 4.8B (+ Inference Scaling)	0.8	No	SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer	2025-01-30	Code	-
6	Janus-Pro-7B	0.8	No	Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling	2025-01-29	Code	Yes
7	MetaQuery-XL (Rewrite)	0.8	No	Transfer between Modalities with MetaQueries	2025-04-08	-	-
8	Show-o [xie2024show] PARM It. DPO PARM	0.77	No	Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step	2025-01-23	Code	Yes
9	Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM	0.75	No	Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step	2025-01-23	Code	-
10	Janus-Pro-1B	0.73	No	Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling	2025-01-29	Code	-
11	Lumina-Image 2.0	0.73	No	Lumina-Image 2.0: A Unified and Efficient Image Generative Framework	2025-03-27	Code	-
12	SANA-1.5 4.8B	0.72	No	SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer	2025-01-30	Code	-
13	Fluid (10.5B)	0.69	No	Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens	2024-10-17	Code	Yes
14	Und. and Gen. Show-o (Ours)	0.68	No	Show-o: One Single Transformer to Unify Multimodal Understanding and Generation	2024-08-22	Code	Yes
15	Emu3	0.66	No	Emu3: Next-Token Prediction is All You Need	2024-09-27	Code	-
16	SnapGen	0.66	No	SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training	2024-12-12	-	-
17	JanusFlow	0.63	No	JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation	2024-11-12	Code	-
18	PixArt-Σ	0.53	No	PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation	2024-03-07	Code	Yes
19	DiffMoE-E16-T2I-Flow (w SFT)	0.51	No	DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers	2025-03-18	-	-
20	PIXART-δ	0	No	-	-	Code	-