| 1 | MCVD | 2460 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 2 | VDM | 1396 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 3 | TGAN-v2 (128x128) | 1209 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 4 | MCVD (64x64) | 1143 | No | MCVD: Masked Conditional Video Diffusion for Pre... | 2022-05-19 | Code |
| 5 | MoCoGAN-HD (256x256, unconditional) | 700 | No | A Good Image Generator Is What You Need for High... | 2021-04-30 | Code |
| 6 | MagicVideo (256x256, text-conditional) | 699 | No | MagicVideo: Efficient Video Generation With Late... | 2022-11-20 | - |
| 7 | TATS (256x256) | 635 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 8 | DIGAN (128x128, unconditional) | 577 | No | Generating Videos with Dynamics-aware Implicit G... | 2022-02-21 | Code |
| 9 | LVDM (256x256, unconditional) | 552 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 10 | Video LDM (320x512, text-conditional) | 550.61 | No | Align your Latents: High-Resolution Video Synthe... | 2023-04-18 | Code |
| 11 | LAVIE (320x512, text-conditional) | 526.3 | No | LAVIE: High-Quality Video Generation with Cascad... | 2023-09-26 | Code |
| 12 | DIGAN (128x128, class-conditional) | 465 | No | Generating Videos with Dynamics-aware Implicit G... | 2022-02-21 | Code |
| 13 | MeBT (128x128, unconditional) | 438 | No | Towards End-to-End Generative Modeling of Long V... | 2023-03-20 | Code |
| 14 | TATS (128x128, unconditional) | 420 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 15 | MMVG (128x128, unconditional) | 395 | No | Tell Me What Happened: Unifying Text-guided Vide... | 2022-11-23 | Code |
| 16 | LVDM (256x256, unconditional) | 372 | No | Latent Video Diffusion Models for High-Fidelity ... | 2022-11-23 | Code |
| 17 | Make-A-Video (Zero-shot, 256x256, class-conditional) | 367.23 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 18 | PYoCo (Zero-shot, 64x64, text-conditional) | 355.19 | No | Preserve Your Own Correlation: A Noise Prior for... | 2023-05-17 | - |
| 19 | VideoPoet (text-conditional) | 355 | No | VideoPoet: A Large Language Model for Zero-Shot ... | 2023-12-21 | - |
| 20 | VideoAssembler (Zero-shot, 256x256, class-conditional) | 346.84 | No | MagDiff: Multi-Alignment Diffusion for High-Fide... | 2023-11-29 | Code |
| 21 | GridDiff (Zero-shot) | 340 | No | Grid Diffusion Models for Text-to-Video Generation | 2024-03-30 | - |
| 22 | Lumiere (Zero-shot, 1024x1024, text-conditional) | 332.49 | No | Lumiere: A Space-Time Diffusion Model for Video ... | 2024-01-23 | Code |
| 23 | TATS (128x128, class-conditional) | 332 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 24 | MMVG (128x128, class-conditional) | 328 | No | Tell Me What Happened: Unifying Text-guided Vide... | 2022-11-23 | Code |
| 25 | PYoCo (Zero-shot, 64x64, unconditional) | 310 | No | Preserve Your Own Correlation: A Noise Prior for... | 2023-05-17 | - |
| 26 | CogVideo (128x128, class-conditional) | 305 | No | CogVideo: Large-scale Pretraining for Text-to-Vi... | 2022-05-29 | Code |
| 27 | VIDM (256x256, unconditional) | 294.7 | No | VIDM: Video Implicit Diffusion Models | 2022-12-01 | Code |
| 28 | Video-LaVIT | 280.57 | No | Video-LaVIT: Unified Video-Language Pre-training... | 2024-02-05 | Code |
| 29 | MAGVIT (AR) | 265 | No | MAGVIT: Masked Generative Video Transformer | 2022-12-10 | Code |
| 30 | W.A.L.T 3B (text-conditional) | 258.1 | No | Photorealistic Video Generation with Diffusion M... | 2023-12-11 | - |
| 31 | PixelDance (256x256, text-conditional) | 242.82 | No | Make Pixels Dance: High-Dynamic Video Generation | 2023-11-18 | - |
| 32 | VideoFusion (128x128, unconditional) | 220 | No | VideoFusion: Decomposed Diffusion Models for Hig... | 2023-03-15 | Code |
| 33 | OmniTokenizer-AR | 191 | No | OmniTokenizer: A Joint Image-Video Tokenizer for... | 2024-06-13 | Code |
| 34 | VideoFusion (128x128, class-conditional) | 173 | No | VideoFusion: Decomposed Diffusion Models for Hig... | 2023-03-15 | Code |
| 35 | Latte + LeanVAE | 164.45 | No | LeanVAE: An Ultra-Efficient Reconstruction VAE f... | 2025-03-18 | Code |
| 36 | REGIS-Fuse (Finetuning, 128x128, text-conditional) | 141 | No | - | - | Code |
| 37 | MAGVIT-v2 (AR) | 109 | No | Language Model Beats Diffusion -- Tokenizer is K... | 2023-10-09 | Code |
| 38 | ACDiT | 90 | No | ACDiT: Interpolating Autoregressive Conditional ... | 2024-12-10 | Code |
| 39 | Make-A-Video (Finetuning, 256x256, class-conditional) | 81.25 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 40 | HPDM-L | 66.32 | No | Hierarchical Patch Diffusion Models for High-Res... | 2024-06-12 | - |
| 41 | LARP | 57 | No | LARP: Tokenizing Videos with a Learned Autoregre... | 2024-10-28 | Code |
| 42 | FAR | 57 | No | Long-Context Autoregressive Video Modeling with ... | 2025-03-25 | Code |
| 43 | Video-GPT | 53 | Yes | Video-GPT via Next Clip Diffusion | 2025-05-18 | Code |