1 Image, 2*2 Stitchi on GenEval
Metric: Overall (higher is better)
Results
| # | Model | Overall | Extra Data | Paper | Date | Code | SOTA |
|---|---|---|---|---|---|---|---|
| 1 | SD3.5-Medium+Flow-GRPO | 0.95 | No | Flow-GRPO: Training Flow Matching Models via Online RL | 2025-05-08 | Available | SOTA |
| 2 | UniWorld-V1 (Rewrite) | 0.84 | No | UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | 2025-06-03 | Available | |
| 3 | MindOmni | 0.83 | No | MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | 2025-05-19 | Available | |
| 4 | UniWorld-V1 | 0.8 | No | UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | 2025-06-03 | Available | |
| 5 | SANA-1.5 4.8B (+ Inference Scaling) | 0.8 | No | SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | 2025-01-30 | Available | |
| 6 | Janus-Pro-7B | 0.8 | No | Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | 2025-01-29 | Available | SOTA |
| 7 | MetaQuery-XL (Rewrite) | 0.8 | No | Transfer between Modalities with MetaQueries | 2025-04-08 | - | |
| 8 | Show-o [xie2024show] PARM It. DPO PARM | 0.77 | No | Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step | 2025-01-23 | Available | SOTA |
| 9 | Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM | 0.75 | No | Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step | 2025-01-23 | Available | |
| 10 | Janus-Pro-1B | 0.73 | No | Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | 2025-01-29 | Available | |
| 11 | Lumina-Image 2.0 | 0.73 | No | Lumina-Image 2.0: A Unified and Efficient Image Generative Framework | 2025-03-27 | Available | |
| 12 | SANA-1.5 4.8B | 0.72 | No | SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | 2025-01-30 | Available | |
| 13 | Fluid (10.5B) | 0.69 | No | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | 2024-10-17 | Available | SOTA |
| 14 | Und. and Gen. Show-o (Ours) | 0.68 | No | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | 2024-08-22 | Available | SOTA |
| 15 | Emu3 | 0.66 | No | Emu3: Next-Token Prediction is All You Need | 2024-09-27 | Available | |
| 16 | SnapGen | 0.66 | No | SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training | 2024-12-12 | - | |
| 17 | JanusFlow | 0.63 | No | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | 2024-11-12 | Available | |
| 18 | PixArt-Σ | 0.53 | No | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | 2024-03-07 | Available | SOTA |
| 19 | DiffMoE-E16-T2I-Flow (w SFT) | 0.51 | No | DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers | 2025-03-18 | - | |
| 20 | PIXART-δ | 0 | No | No paper | - | Available | |