Metric: Inception Score (higher is better)
| # | Model | Inception Score | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | HPDM-L | 87.68 | No | Hierarchical Patch Diffusion Models for High-Res... | 2024-06-12 | - |
| 2 | Make-A-Video (Finetuning, 256x256, class-conditional) | 82.55 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 3 | VideoFusion (128x128, class-conditional) | 80.03 | No | VideoFusion: Decomposed Diffusion Models for Hig... | 2023-03-15 | Code |
| 4 | TATS (128x128, class-conditional) | 79.28 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 5 | FIFO-Diffusion | 74.44 | No | FIFO-Diffusion: Generating Infinite Videos from ... | 2024-05-19 | Code |
| 6 | MMVG (128x128, class-conditional) | 73.7 | No | Tell Me What Happened: Unifying Text-guided Vide... | 2022-11-23 | Code |
| 7 | VideoFusion (128x128, unconditional) | 72.22 | No | VideoFusion: Decomposed Diffusion Models for Hig... | 2023-03-15 | Code |
| 8 | MeBT (128x128, unconditional) | 65.93 | No | Towards End-to-End Generative Modeling of Long V... | 2023-03-20 | Code |
| 9 | GridDiff (Zero-shot) | 62.88 | No | Grid Diffusion Models for Text-to-Video Generation | 2024-03-30 | - |
| 10 | PYoCo (Zero-shot, 64x64, unconditional) | 60.01 | No | Preserve Your Own Correlation: A Noise Prior for... | 2023-05-17 | - |
| 11 | DIGAN (128x128, class-conditional) | 59.68 | No | Generating Videos with Dynamics-aware Implicit G... | 2022-02-21 | Code |
| 12 | MMVG (128x128, unconditional) | 58.3 | No | Tell Me What Happened: Unifying Text-guided Vide... | 2022-11-23 | Code |
| 13 | TATS (128x128, unconditional) | 57.63 | No | Long Video Generation with Time-Agnostic VQGAN a... | 2022-04-07 | Code |
| 14 | CogVideo (128x128, class-conditional) | 51.11 | No | CogVideo: Large-scale Pretraining for Text-to-Vi... | 2022-05-29 | Code |
| 15 | VideoAssembler (Zero-shot, 256x256, class-conditional) | 48.01 | No | MagDiff: Multi-Alignment Diffusion for High-Fide... | 2023-11-29 | Code |
| 16 | PYoCo (Zero-shot, 64x64, text-conditional) | 47.76 | No | Preserve Your Own Correlation: A Noise Prior for... | 2023-05-17 | - |
| 17 | Video-LaVIT | 44.26 | No | Video-LaVIT: Unified Video-Language Pre-training... | 2024-02-05 | Code |
| 18 | PixelDance (256x256, text-conditional) | 42.1 | No | Make Pixels Dance: High-Dynamic Video Generation | 2023-11-18 | - |
| 19 | VideoPoet (text-conditional) | 38.44 | No | VideoPoet: A Large Language Model for Zero-Shot ... | 2023-12-21 | - |
| 20 | Lumiere (Zero-shot, 1024x1024, text-conditional) | 37.54 | No | Lumiere: A Space-Time Diffusion Model for Video ... | 2024-01-23 | Code |
| 21 | W.A.L.T 3B (text-conditional) | 35.1 | No | Photorealistic Video Generation with Diffusion M... | 2023-12-11 | - |
| 22 | MoCoGAN-HD (256x256, unconditional) | 33.95 | No | A Good Image Generator Is What You Need for High... | 2021-04-30 | Code |
| 23 | Video LDM (320x512, text-conditional) | 33.45 | No | Align your Latents: High-Resolution Video Synthe... | 2023-04-18 | Code |
| 24 | Make-A-Video (Zero-shot, 256x256, class-conditional) | 33 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 25 | DIGAN (128x128, unconditional) | 32.7 | No | Generating Videos with Dynamics-aware Implicit G... | 2022-02-21 | Code |