Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, Tieniu Tan

2023-03-15 · CVPR 2023
Tasks: Denoising, Text-to-Video Generation, Vocal Bursts Intensity Prediction, Code Generation, Image Generation, Video Generation
Links: Paper · PDF · Code (official) · Code

Abstract

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions. Despite its recent success in image synthesis, applying DPMs to video generation is still challenging due to the high-dimensional data space. Previous methods usually adopt a standard diffusion process, where frames in the same video clip are destroyed with independent noises, ignoring the content redundancy and temporal correlation. This work presents a decomposed diffusion process that resolves the per-frame noise into a base noise shared among all frames and a residual noise that varies along the time axis. The denoising pipeline employs two jointly learned networks to match this noise decomposition. Experiments on various datasets confirm that our approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation. We further show that our decomposed formulation can benefit from pre-trained image diffusion models and supports text-conditioned video creation well.
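The core idea of the decomposition can be sketched in a few lines: instead of sampling independent Gaussian noise per frame, each frame's noise is a mixture of one base noise tensor shared across the clip and a per-frame residual. The sketch below is illustrative only; the mixing coefficient `alpha` and the function name are assumptions for exposition, not the paper's exact parameterization (the paper ties the ratio to the diffusion schedule).

```python
import numpy as np

def decomposed_noise(num_frames, frame_shape, alpha=0.9, rng=None):
    """Sample per-frame noise as a shared base plus per-frame residuals.

    Following the decomposed diffusion idea: each frame's noise is
    alpha * b + sqrt(1 - alpha^2) * r_i, where b is shared across all
    frames and r_i varies along the time axis. The mixing weights keep
    each frame's noise unit-variance. `alpha` here is a hypothetical
    fixed mixing coefficient chosen for illustration.
    """
    rng = rng or np.random.default_rng(0)
    base = rng.standard_normal(frame_shape)                      # shared across frames
    residual = rng.standard_normal((num_frames, *frame_shape))   # varies per frame
    return alpha * base + np.sqrt(1.0 - alpha**2) * residual

eps = decomposed_noise(8, (64, 64))
# Frames share the base component, so their noises are positively correlated,
# which is what lets a shared network denoise the common content once.
```

Because the base component is common to all frames, consecutive frames' noises are strongly correlated (correlation ≈ alpha² here), which is the redundancy the two jointly learned networks exploit.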

Results

Task             | Dataset | Metric          | Value | Model
Video Generation | UCF-101 | FVD16           | 173   | VideoFusion (128x128, class-conditional)
Video Generation | UCF-101 | Inception Score | 80.03 | VideoFusion (128x128, class-conditional)
Video Generation | UCF-101 | FVD16           | 220   | VideoFusion (128x128, unconditional)
Video Generation | UCF-101 | Inception Score | 72.22 | VideoFusion (128x128, unconditional)

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing Constraints (2025-07-17)