Image Generation on WISE

Metric: Time (WISE score on the Time category; higher is better)

Results

#	Model	Time	Extra Data	SOTA	Paper	Date	Code
1	MindOmni (w/ cot)	0.7	No	Yes	MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO	2025-05-19	Code
2	Bagel (w/ cot)	0.69	No	-	Emerging Properties in Unified Multimodal Pretraining	2025-05-20	Code
3	Playground-v2.5-1024px-aesthetic	0.58	No	Yes	Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation	2024-02-27	-
4	MetaQuery-XL	0.55	No	-	Transfer between Modalities with MetaQueries	2025-04-08	-
5	UniWorld-V1	0.55	No	-	UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	2025-06-03	Code
6	Bagel	0.55	No	-	Emerging Properties in Unified Multimodal Pretraining	2025-05-20	Code
7	PixArt-XL-2-1024-MS	0.5	No	Yes	PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis	2023-09-30	Code
8	stable-diffusion-3.5-large	0.5	No	-	Scaling Rectified Flow Transformers for High-Resolution Image Synthesis	2024-03-05	Code
9	stable-diffusion-xl-base-0.9	0.48	No	Yes	SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis	2023-07-04	Code
10	Emu3-gen	0.45	No	-	Emu3: Next-Token Prediction is All You Need	2024-09-27	Code
11	Show-o	0.4	No	-	Show-o: One Single Transformer to Unify Multimodal Understanding and Generation	2024-08-22	Code
12	MindOmni (w/o cot)	0.38	No	-	MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO	2025-05-19	Code
13	Janus-pro	0.37	No	-	Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling	2025-01-29	Code
14	Janus	0.26	No	-	Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation	2024-10-17	Code
