Text-to-Video Generation on MSR-VTT
Metric: CLIPSIM (higher is better)
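For readers unfamiliar with the metric, the sketch below shows one common way CLIPSIM is described: the cosine similarity between a prompt's CLIP text embedding and each generated frame's CLIP image embedding, averaged over frames. This page does not specify the CLIP variant, the frame sampling, or any score scaling, so those are left out. The function names `cosine` and `clipsim` and the toy embeddings are hypothetical, and the embeddings are assumed to be precomputed elsewhere.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clipsim(text_emb, frame_embs):
    """Average text-to-frame cosine similarity for one video.

    Assumption: text_emb and frame_embs are CLIP embeddings computed
    beforehand; this sketch only performs the averaging step.
    """
    return sum(cosine(text_emb, f) for f in frame_embs) / len(frame_embs)


# Toy 2-D vectors standing in for real CLIP embeddings.
score = clipsim([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(score)
```

A benchmark-level score would then be the mean of this per-video value over all evaluated prompts, which is also an assumption about the protocol rather than something stated on this page.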
Results
| # | Model | CLIPSIM | SOTA badge | Extra Data | Paper | Date | Code |
|---|-------|---------|------------|------------|-------|------|------|
| 1 | PixelDance | 0.3125 | SOTA | No | Make Pixels Dance: High-Dynamic Video Generation | 2023-11-18 | - |
| 2 | VideoPoet | 0.3123 | | No | VideoPoet: A Large Language Model for Zero-Shot Video Generation | 2023-12-21 | - |
| 3 | Show-1 | 0.3072 | SOTA | No | Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | 2023-09-27 | Code |
| 4 | Make-A-Video | 0.3049 | SOTA | No | Make-A-Video: Text-to-Video Generation without Text-Video Data | 2022-09-29 | Code |
| 5 | Video-LaVIT | 0.3012 | | No | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | 2024-02-05 | Code |
| 6 | TF-T2V | 0.2991 | | No | A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | 2023-12-25 | Code |
| 7 | HiGen | 0.2947 | | No | Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | 2023-12-07 | Code |
| 8 | VideoComposer | 0.2932 | | No | VideoComposer: Compositional Video Synthesis with Motion Controllability | 2023-06-03 | Code |
| 9 | ModelScopeT2V | 0.293 | | No | ModelScope Text-to-Video Technical Report | 2023-08-12 | Code |
| 10 | Video LDM | 0.2929 | | No | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | 2023-04-18 | Code |
| 11 | Snap Video (512×288) | 0.2793 | | No | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | 2024-02-22 | - |
| 12 | Snap Video (288×288) | 0.2793 | | No | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | 2024-02-22 | - |
| 13 | MMVG | 0.2644 | | No | Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | 2022-11-23 | Code |
| 14 | CogVideo (English) | 0.2631 | | No | Make-A-Video: Text-to-Video Generation without Text-Video Data | 2022-09-29 | Code |
| 15 | CogVideo (Chinese) | 0.2614 | | No | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | 2023-04-18 | Code |
| 16 | NUWA | 0.2439 | SOTA | No | NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | 2021-11-24 | Code |
| 17 | GODIVA | 0.2402 | SOTA | No | GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions | 2021-04-30 | Code |