Metric: CLIP-FID (lower is better)
| # | Model | CLIP-FID | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Snap Video (288×288) | 8.48 | No | Snap Video: Scaled Spatiotemporal Transformers f... | 2024-02-22 | - |
| 2 | Snap Video (512×288) | 9.35 | No | Snap Video: Scaled Spatiotemporal Transformers f... | 2024-02-22 | - |
| 3 | Make-A-Video | 13.17 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 4 | CogVideo (English) | 23.59 | No | Make-A-Video: Text-to-Video Generation without T... | 2022-09-29 | Code |
| 5 | CogVideo (Chinese) | 24.78 | No | Align your Latents: High-Resolution Video Synthe... | 2023-04-18 | Code |
| 6 | NUWA | 47.68 | No | NÜWA: Visual Synthesis Pre-training for Neural v... | 2021-11-24 | Code |
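CLIP-FID is commonly understood as the FID formula (the Fréchet distance between Gaussians fitted to two feature sets) computed on CLIP image embeddings rather than Inception-v3 features. The CLIP backbone, frame sampling, and sample counts behind the scores above are not stated in this table. The sketch below shows only the distance itself, assuming features have already been extracted as `(num_samples, dim)` arrays. The function name `frechet_distance` is hypothetical, and random arrays stand in for real CLIP embeddings. It illustrates why lower is better: the score is zero for identical feature distributions and grows as the means and covariances drift apart.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets.

    Hypothetical helper: for CLIP-FID the rows would be CLIP image embeddings
    of generated and reference frames; feature extraction is not shown here.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a @ cov_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of the symmetric PSD matrix cov_a^{1/2} @ cov_b @ cov_a^{1/2}.
    w, v = np.linalg.eigh(cov_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    middle = sqrt_a @ cov_b @ sqrt_a
    eig = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    tr_covmean = np.sqrt(eig).sum()

    # ||mu_a - mu_b||^2 + Tr(cov_a) + Tr(cov_b) - 2 Tr((cov_a cov_b)^{1/2})
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)


# Usage with synthetic stand-in "embeddings" (not real CLIP features).
rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))
close = rng.normal(size=(2000, 8))               # same distribution as `real`
shifted = rng.normal(loc=1.0, size=(2000, 8))    # mean shifted by 1 in every dimension

d_same = frechet_distance(real, real)
d_close = frechet_distance(real, close)
d_shifted = frechet_distance(real, shifted)
print(d_same, d_close, d_shifted)
```

The matrix square root is taken via symmetric eigendecompositions so the sketch needs only NumPy. Reference FID implementations often use `scipy.linalg.sqrtm` on the product of the two covariances instead.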