Metric: FAD (lower is better)
| # | Model | FAD | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | MMAudio-S-16kHz | 0.79 | No | MMAudio: Taming Multimodal Joint Training for Hi... | 2024-12-19 | Code |
| 2 | V2A-Mapper | 0.841 | No | V2A-Mapper: A Lightweight Solution for Vision-to... | 2023-08-18 | Code |
| 3 | MMAudio-L-44.1kHz | 0.97 | No | MMAudio: Taming Multimodal Joint Training for Hi... | 2024-12-19 | Code |
| 4 | Frieren | 1.32 | No | Frieren: Efficient Video-to-Audio Generation Net... | 2024-06-01 | Code |
| 5 | V-AURA | 1.92 | No | Temporally Aligned Audio for Video with Autoregr... | 2024-09-20 | Code |
| 6 | MaskVAT_Hybrid | 2.04 | No | Masked Generative Video-to-Audio Transformers wi... | 2024-07-15 | - |
| 7 | ReWas | 2.16 | No | Read, Watch and Scream! Sound Generation from Te... | 2024-07-08 | Code |
| 8 | VATT-LLama | 2.38 | No | Tell What You Hear From What You See -- Video to... | 2024-11-08 | Code |
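FAD (Fréchet Audio Distance) fits one multivariate Gaussian to embeddings of reference audio and another to embeddings of generated audio. It then reports the Fréchet distance between the two: ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). A score of 0 means the two fitted distributions coincide, which is why lower is better.

The block below is a minimal NumPy sketch of that formula. It assumes per-clip embeddings have already been extracted. The embedding model (VGGish in the original FAD formulation) and the evaluation protocol may differ between the papers listed above, and neither is specified here. The function names `frechet_distance` and `_sqrtm_psd` are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (num_clips, dim), one embedding per audio clip
    (hypothetical inputs; embedding extraction is out of scope here).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Tr((S_a S_b)^(1/2)) equals Tr((S_a^(1/2) S_b S_a^(1/2))^(1/2)),
    # and the inner matrix on the right is symmetric PSD, so eigh applies.
    root_a = _sqrtm_psd(cov_a)
    cross = _sqrtm_psd(root_a @ cov_b @ root_a)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))


# Usage on synthetic data: a shifted distribution scores a larger (worse) FAD.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))
similar = rng.normal(size=(500, 8))
shifted = rng.normal(loc=1.0, size=(500, 8))
print(frechet_distance(reference, similar), frechet_distance(reference, shifted))
```

The symmetric form avoids taking the square root of the non-symmetric product Sigma_r Sigma_g directly. It needs only `numpy.linalg.eigh`.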