Video Generation on UCF-101
Metric: Inception Score (higher is better)
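For reference, the Inception Score is exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a pretrained classifier's class distribution for one generated sample and p(y) is that distribution averaged over all samples. The sketch below computes it from precomputed class probabilities. The function name and input layout are illustrative assumptions, and the leaderboard does not state which classifier or how many samples each paper used, so scores from different papers may not be strictly comparable. Because UCF-101 has 101 action classes, the score cannot exceed 101 under this definition.

```python
import math


def inception_score(probs):
    """Inception Score from per-sample class probabilities.

    probs: list of rows; each row is p(y|x) for one generated sample
    and is assumed to sum to 1. (Hypothetical input layout.)
    """
    n = len(probs)
    num_classes = len(probs[0])

    # Marginal class distribution p(y): average of p(y|x) over all samples.
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]

    # Mean KL(p(y|x) || p(y)) over samples; zero-probability terms contribute 0.
    mean_kl = 0.0
    for row in probs:
        mean_kl += sum(
            p * math.log(p / m) for p, m in zip(row, marginal) if p > 0
        )
    mean_kl /= n

    return math.exp(mean_kl)


# Confident, evenly spread predictions over 2 classes give a score near 2;
# uninformative (uniform) predictions give a score near 1.
confident = inception_score([[1.0, 0.0], [0.0, 1.0]])
uniform = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

The score rises when each sample is classified confidently (low-entropy p(y|x)) and the samples together cover many classes (high-entropy p(y)), which is why higher is better.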
Results
| # | Model | Inception Score | Extra Data | Paper | Date | Code |
|---|-------|-----------------|------------|-------|------|------|
| 1 | HPDM-L | 87.68 | No | Hierarchical Patch Diffusion Models for High-Res... | 2024-06-12 | - |
| 2 | Make-A-Video (Finetuning, 256x256, class-conditional) | 82.55 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 3 | VideoFusion (128x128, class-conditional) | 80.03 | No | VideoFusion: Decomposed Diffusion Models for Hig... | 2023-03-15 | Code |
| 4 | TATS (128x128, class-conditional) | 79.28 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 5 | FIFO-Diffusion | 74.44 | No | FIFO-Diffusion: Generating Infinite Videos from ... | 2024-05-19 | Code |
| 6 | MMVG (128x128, class-conditional) | 73.7 | No | Tell Me What Happened: Unifying Text-guided Vide... | 2022-11-23 | Code |
| 7 | VideoFusion (128x128, unconditional) | 72.22 | No | VideoFusion: Decomposed Diffusion Models for Hig... | 2023-03-15 | Code |
| 8 | MeBT (128x128, unconditional) | 65.93 | No | Towards End-to-End Generative Modeling of Long V... | 2023-03-20 | Code |
| 9 | GridDiff (Zero-shot) | 62.88 | No | Grid Diffusion Models for Text-to-Video Generation | 2024-03-30 | - |
| 10 | PYoCo (Zero-shot, 64x64, unconditional) | 60.01 | No | Preserve Your Own Correlation: A Noise Prior for... | 2023-05-17 | - |
| 11 | DIGAN (128x128, class-conditional) | 59.68 | No | Generating Videos with Dynamics-aware Implicit G... | 2022-02-21 | Code |
| 12 | MMVG (128x128, unconditional) | 58.3 | No | Tell Me What Happened: Unifying Text-guided Vide... | 2022-11-23 | Code |
| 13 | TATS (128x128, unconditional) | 57.63 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 14 | CogVideo (128x128, class-conditional) | 51.11 | No | CogVideo: Large-scale Pretraining for Text-to-Vi... | 2022-05-29 | Code |
| 15 | VideoAssembler (Zero-shot, 256x256, class-conditional) | 48.01 | No | MagDiff: Multi-Alignment Diffusion for High-Fide... | 2023-11-29 | Code |
| 16 | PYoCo (Zero-shot, 64x64, text-conditional) | 47.76 | No | Preserve Your Own Correlation: A Noise Prior for... | 2023-05-17 | - |
| 17 | Video-LaVIT | 44.26 | No | Video-LaVIT: Unified Video-Language Pre-training... | 2024-02-05 | Code |
| 18 | PixelDance (256x256, text-conditional) | 42.1 | No | Make Pixels Dance: High-Dynamic Video Generation | 2023-11-18 | - |
| 19 | VideoPoet (text-conditional) | 38.44 | No | VideoPoet: A Large Language Model for Zero-Shot ... | 2023-12-21 | - |
| 20 | Lumiere (Zero-shot, 1024x1024, text-conditional) | 37.54 | No | Lumiere: A Space-Time Diffusion Model for Video ... | 2024-01-23 | Code |
| 21 | W.A.L.T 3B (text-conditional) | 35.1 | No | Photorealistic Video Generation with Diffusion M... | 2023-12-11 | - |
| 22 | MoCoGAN-HD (256x256, unconditional) | 33.95 | No | A Good Image Generator Is What You Need for High... | 2021-04-30 | Code |
| 23 | Video LDM (320x512, text-conditional) | 33.45 | No | Align your Latents: High-Resolution Video Synthe... | 2023-04-18 | Code |
| 24 | Make-A-Video (Zero-shot, 256x256, class-conditional) | 33 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 25 | DIGAN (128x128, unconditional) | 32.7 | No | Generating Videos with Dynamics-aware Implicit G... | 2022-02-21 | Code |
#1 HPDM-L [SOTA] · 87.68 Inception Score · 2024-06-12 · Hierarchical Patch Diffusion Models for High-Resolution Video Generation
#2 Make-A-Video (Finetuning, 256x256, class-conditional) [SOTA] · 82.55 Inception Score · 2022-09-29 · Make-A-Video: Text-to-Video Generation without Text-Video Data · Code
#3 VideoFusion (128x128, class-conditional) · 80.03 Inception Score · 2023-03-15 · VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation · Code
#4 TATS (128x128, class-conditional) [SOTA] · 79.28 Inception Score · 2022-04-07 · Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer · Code
#5 FIFO-Diffusion · 74.44 Inception Score · 2024-05-19 · FIFO-Diffusion: Generating Infinite Videos from Text without Training · Code
#6 MMVG (128x128, class-conditional) · 73.7 Inception Score · 2022-11-23 · Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation · Code
#7 VideoFusion (128x128, unconditional) · 72.22 Inception Score · 2023-03-15 · VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation · Code
#8 MeBT (128x128, unconditional) · 65.93 Inception Score · 2023-03-20 · Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers · Code
#9 GridDiff (Zero-shot) · 62.88 Inception Score · 2024-03-30 · Grid Diffusion Models for Text-to-Video Generation
#10 PYoCo (Zero-shot, 64x64, unconditional) · 60.01 Inception Score · 2023-05-17 · Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
#11 DIGAN (128x128, class-conditional) [SOTA] · 59.68 Inception Score · 2022-02-21 · Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks · Code
#12 MMVG (128x128, unconditional) · 58.3 Inception Score · 2022-11-23 · Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation · Code
#13 TATS (128x128, unconditional) · 57.63 Inception Score · 2022-04-07 · Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer · Code
#14 CogVideo (128x128, class-conditional) · 51.11 Inception Score · 2022-05-29 · CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers · Code
#15 VideoAssembler (Zero-shot, 256x256, class-conditional) · 48.01 Inception Score · 2023-11-29 · MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing · Code
#16 PYoCo (Zero-shot, 64x64, text-conditional) · 47.76 Inception Score · 2023-05-17 · Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
#17 Video-LaVIT · 44.26 Inception Score · 2024-02-05 · Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization · Code
#18 PixelDance (256x256, text-conditional) · 42.1 Inception Score · 2023-11-18 · Make Pixels Dance: High-Dynamic Video Generation
#19 VideoPoet (text-conditional) · 38.44 Inception Score · 2023-12-21 · VideoPoet: A Large Language Model for Zero-Shot Video Generation
#20 Lumiere (Zero-shot, 1024x1024, text-conditional) · 37.54 Inception Score · 2024-01-23 · Lumiere: A Space-Time Diffusion Model for Video Generation · Code
#21 W.A.L.T 3B (text-conditional) · 35.1 Inception Score · 2023-12-11 · Photorealistic Video Generation with Diffusion Models
#22 MoCoGAN-HD (256x256, unconditional) [SOTA] · 33.95 Inception Score · 2021-04-30 · A Good Image Generator Is What You Need for High-Resolution Video Synthesis · Code
#23 Video LDM (320x512, text-conditional) · 33.45 Inception Score · 2023-04-18 · Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models · Code
#24 Make-A-Video (Zero-shot, 256x256, class-conditional) · 33 Inception Score · 2022-09-29 · Make-A-Video: Text-to-Video Generation without Text-Video Data · Code
#25 DIGAN (128x128, unconditional) · 32.7 Inception Score · 2022-02-21 · Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks · Code