Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video on UCF-101

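Metric: FVD16 (Fréchet Video Distance; lower is better). The results table is sorted by FVD16 in descending order, so the strongest results appear at the bottom.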
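
For readers unfamiliar with the metric: FVD is a Fréchet distance between Gaussian fits of video features from real and generated clips, so smaller values mean the two distributions are closer. The following is a minimal sketch of that distance only, not the evaluation code behind any entry on this leaderboard. It assumes the features have already been extracted (in practice by a pretrained I3D network); the function name `frechet_distance` and the random placeholder features are illustrative assumptions.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Both inputs have shape (n_samples, dim). In a real FVD evaluation these
    would be video features from a pretrained network (assumption: not shown).
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(
        diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt
    )


# Placeholder features standing in for real and generated video embeddings.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
generated = real + 3.0  # same covariance, mean shifted by 3 in each dimension

print(frechet_distance(real, real))       # close to 0 for identical sets
print(frechet_distance(real, generated))  # close to 4 * 3**2 = 36
```

Reported FVD values depend on the feature extractor, clip length, resolution, and number of samples, which is one reason entries evaluated under different settings are not strictly comparable.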

Results

| # | Model | FVD16 | Extra Data | Paper | Date | Code |
|---|-------|-------|------------|-------|------|------|
| 1 | MCVD | 2460 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 2 | VDM | 1396 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 3 | TGAN-v2 (128x128) | 1209 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 4 | MCVD (64x64) | 1143 | No | MCVD: Masked Conditional Video Diffusion for Pre... | 2022-05-19 | Code |
| 5 | MoCoGAN-HD (256x256, unconditional) | 700 | No | A Good Image Generator Is What You Need for High... | 2021-04-30 | Code |
| 6 | MagicVideo (256x256, text-conditional) | 699 | No | MagicVideo: Efficient Video Generation With Late... | 2022-11-20 | - |
| 7 | TATS (256x256) | 635 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 8 | DIGAN (128x128, unconditional) | 577 | No | Generating Videos with Dynamics-aware Implicit G... | 2022-02-21 | Code |
| 9 | LVDM (256x256, unconditional) | 552 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 10 | Video LDM (320x512, text-conditional) | 550.61 | No | Align your Latents: High-Resolution Video Synthe... | 2023-04-18 | Code |
| 11 | LAVIE (320x512, text-conditional) | 526.3 | No | LAVIE: High-Quality Video Generation with Cascad... | 2023-09-26 | Code |
| 12 | DIGAN (128x128, class-conditional) | 465 | No | Generating Videos with Dynamics-aware Implicit G... | 2022-02-21 | Code |
| 13 | MeBT (128x128, unconditional) | 438 | No | Towards End-to-End Generative Modeling of Long V... | 2023-03-20 | Code |
| 14 | TATS (128x128, unconditional) | 420 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 15 | MMVG (128x128, unconditional) | 395 | No | Tell Me What Happened: Unifying Text-guided Vide... | 2022-11-23 | Code |
| 16 | LVDM (256x256, unconditional) | 372 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 17 | Make-A-Video (Zero-shot, 256x256, class-conditional) | 367.23 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 18 | PYoCo (Zero-shot, 64x64, text-conditional) | 355.19 | No | Preserve Your Own Correlation: A Noise Prior for... | 2023-05-17 | - |
| 19 | VideoPoet (text-conditional) | 355 | No | VideoPoet: A Large Language Model for Zero-Shot ... | 2023-12-21 | - |
| 20 | VideoAssembler (Zero-shot, 256x256, class-conditional) | 346.84 | No | MagDiff: Multi-Alignment Diffusion for High-Fide... | 2023-11-29 | Code |
| 21 | GridDiff (Zero-shot) | 340 | No | Grid Diffusion Models for Text-to-Video Generation | 2024-03-30 | - |
| 22 | Lumiere (Zero-shot, 1024x1024, text-conditional) | 332.49 | No | Lumiere: A Space-Time Diffusion Model for Video ... | 2024-01-23 | Code |
| 23 | TATS (128x128, class-conditional) | 332 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 24 | MMVG (128x128, class-conditional) | 328 | No | Tell Me What Happened: Unifying Text-guided Vide... | 2022-11-23 | Code |
| 25 | PYoCo (Zero-shot, 64x64, unconditional) | 310 | No | Preserve Your Own Correlation: A Noise Prior for... | 2023-05-17 | - |
| 26 | CogVideo (128x128, class-conditional) | 305 | No | CogVideo: Large-scale Pretraining for Text-to-Vi... | 2022-05-29 | Code |
| 27 | VIDM (256x256, unconditional) | 294.7 | No | VIDM: Video Implicit Diffusion Models | 2022-12-01 | Code |
| 28 | Video-LaVIT | 280.57 | No | Video-LaVIT: Unified Video-Language Pre-training... | 2024-02-05 | Code |
| 29 | MAGVIT (AR) | 265 | No | MAGVIT: Masked Generative Video Transformer | 2022-12-10 | Code |
| 30 | W.A.L.T 3B (text-conditional) | 258.1 | No | Photorealistic Video Generation with Diffusion M... | 2023-12-11 | - |
| 31 | PixelDance (256x256, text-conditional) | 242.82 | No | Make Pixels Dance: High-Dynamic Video Generation | 2023-11-18 | - |
| 32 | VideoFusion (128x128, unconditional) | 220 | No | VideoFusion: Decomposed Diffusion Models for Hig... | 2023-03-15 | Code |
| 33 | OmniTokenizer-AR | 191 | No | OmniTokenizer: A Joint Image-Video Tokenizer for... | 2024-06-13 | Code |
| 34 | VideoFusion (128x128, class-conditional) | 173 | No | VideoFusion: Decomposed Diffusion Models for Hig... | 2023-03-15 | Code |
| 35 | Latte + LeanVAE | 164.45 | No | LeanVAE: An Ultra-Efficient Reconstruction VAE f... | 2025-03-18 | Code |
| 36 | REGIS-Fuse (Finetuning, 128x128, text-conditional) | 141 | No | - | - | Code |
| 37 | MAGVIT-v2 (AR) | 109 | No | Language Model Beats Diffusion -- Tokenizer is K... | 2023-10-09 | Code |
| 38 | ACDiT | 90 | No | ACDiT: Interpolating Autoregressive Conditional ... | 2024-12-10 | Code |
| 39 | Make-A-Video (Finetuning, 256x256, class-conditional) | 81.25 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 40 | HPDM-L | 66.32 | No | Hierarchical Patch Diffusion Models for High-Res... | 2024-06-12 | - |
| 41 | LARP | 57 | No | LARP: Tokenizing Videos with a Learned Autoregre... | 2024-10-28 | Code |
| 42 | FAR | 57 | No | Long-Context Autoregressive Video Modeling with ... | 2025-03-25 | Code |
| 43 | Video-GPT | 53 | Yes | Video-GPT via Next Clip Diffusion | 2025-05-18 | Code |