Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Text-to-Video Generation on MSR-VTT

Metric: CLIP-FID (lower is better)

Results

| # | Model                | CLIP-FID | Extra Data | Paper                                               | Date       | Code |
|---|----------------------|----------|------------|-----------------------------------------------------|------------|------|
| 1 | Snap Video (288×288) | 8.48     | No         | Snap Video: Scaled Spatiotemporal Transformers f... | 2024-02-22 | -    |
| 2 | Snap Video (512x288) | 9.35     | No         | Snap Video: Scaled Spatiotemporal Transformers f... | 2024-02-22 | -    |
| 3 | Make-A-Video         | 13.17    | No         | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 4 | CogVideo (English)   | 23.59    | No         | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 5 | CogVideo (Chinese)   | 24.78    | No         | Align your Latents: High-Resolution Video Synthe... | 2023-04-18 | Code |
| 6 | NUWA                 | 47.68    | No         | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code |
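
Because CLIP-FID is a lower-is-better metric, the ranking above corresponds to an ascending sort on the score. The following is a minimal sketch of that ordering, using only the model names and scores listed in the results; the function name `rank_by_clip_fid` and the tuple layout are assumptions made for illustration and are not part of the site.

```python
# Scores copied from the results above, listed in arbitrary order.
# The (model, clip_fid) tuple layout is a hypothetical choice for this sketch.
results = [
    ("NUWA", 47.68),
    ("Make-A-Video", 13.17),
    ("Snap Video (512x288)", 9.35),
    ("CogVideo (Chinese)", 24.78),
    ("Snap Video (288×288)", 8.48),
    ("CogVideo (English)", 23.59),
]


def rank_by_clip_fid(rows):
    """Return rows sorted ascending by score, since lower CLIP-FID is better."""
    return sorted(rows, key=lambda row: row[1])


ranked = rank_by_clip_fid(results)
for position, (model, score) in enumerate(ranked, start=1):
    print(f"{position}. {model}: {score:.2f}")
```

For a higher-is-better metric the same sketch would pass `reverse=True` to `sorted`.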