Image Generation on WISE
Metric: Time (higher is better)
Results
| # | Model | Badge | Time | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | MindOmni (w/ cot) | SOTA | 0.7 | No | MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | 2025-05-19 | Code |
| 2 | Bagel (w/ cot) | - | 0.69 | No | Emerging Properties in Unified Multimodal Pretraining | 2025-05-20 | Code |
| 3 | Playground-v2.5-1024px-aesthetic | SOTA | 0.58 | No | Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation | 2024-02-27 | - |
| 4 | MetaQuery-XL | - | 0.55 | No | Transfer between Modalities with MetaQueries | 2025-04-08 | - |
| 5 | UniWorld-V1 | - | 0.55 | No | UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | 2025-06-03 | Code |
| 6 | Bagel | - | 0.55 | No | Emerging Properties in Unified Multimodal Pretraining | 2025-05-20 | Code |
| 7 | PixArt-XL-2-1024-MS | SOTA | 0.5 | No | PixArt-$α$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | 2023-09-30 | Code |
| 8 | stable-diffusion-3.5-large | - | 0.5 | No | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | 2024-03-05 | Code |
| 9 | stable-diffusion-xl-base-0.9 | SOTA | 0.48 | No | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | 2023-07-04 | Code |
| 10 | Emu3-gen | - | 0.45 | No | Emu3: Next-Token Prediction is All You Need | 2024-09-27 | Code |
| 11 | Show-o | - | 0.4 | No | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | 2024-08-22 | Code |
| 12 | MindOmni (w/o cot) | - | 0.38 | No | MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | 2025-05-19 | Code |
| 13 | Janus-pro | - | 0.37 | No | Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | 2025-01-29 | Code |
| 14 | Janus | - | 0.26 | No | Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation | 2024-10-17 | Code |
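
The page does not say how the SOTA badge is assigned. One plausible reading, offered here only as an assumption, is that a model earns the badge when its Time score beats every result dated earlier. The minimal Python sketch below replays the results listed above in date order and keeps each entry that sets a new best score. The names `RESULTS` and `sota_progression` are hypothetical and exist only for this illustration.

```python
from datetime import date

# (model, Time score, date) copied from the results listed above.
RESULTS = [
    ("MindOmni (w/ cot)", 0.70, date(2025, 5, 19)),
    ("Bagel (w/ cot)", 0.69, date(2025, 5, 20)),
    ("Playground-v2.5-1024px-aesthetic", 0.58, date(2024, 2, 27)),
    ("MetaQuery-XL", 0.55, date(2025, 4, 8)),
    ("UniWorld-V1", 0.55, date(2025, 6, 3)),
    ("Bagel", 0.55, date(2025, 5, 20)),
    ("PixArt-XL-2-1024-MS", 0.50, date(2023, 9, 30)),
    ("stable-diffusion-3.5-large", 0.50, date(2024, 3, 5)),
    ("stable-diffusion-xl-base-0.9", 0.48, date(2023, 7, 4)),
    ("Emu3-gen", 0.45, date(2024, 9, 27)),
    ("Show-o", 0.40, date(2024, 8, 22)),
    ("MindOmni (w/o cot)", 0.38, date(2025, 5, 19)),
    ("Janus-pro", 0.37, date(2025, 1, 29)),
    ("Janus", 0.26, date(2024, 10, 17)),
]


def sota_progression(results):
    """Return the models that set a new best score when replayed in date order.

    Assumption: higher Time is better (as the page states) and a tie with the
    current best does not count as a new state of the art.
    """
    best = float("-inf")
    progression = []
    # sorted() is stable, so entries sharing a date keep their listed order.
    for model, score, _published in sorted(results, key=lambda row: row[2]):
        if score > best:
            best = score
            progression.append(model)
    return progression


if __name__ == "__main__":
    for name in sota_progression(RESULTS):
        print(name)
```

Under this assumption the sketch selects stable-diffusion-xl-base-0.9, PixArt-XL-2-1024-MS, Playground-v2.5-1024px-aesthetic, and MindOmni (w/ cot), which are the four entries that carry the SOTA badge above.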