Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Imagen Video: High Definition Video Generation with Diffusion Models

Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans

2022-10-05
Tasks: Super-Resolution, Vocal Bursts Intensity Prediction, Video Super-Resolution, World Knowledge, Image Generation, Video Generation

Abstract

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding. See https://imagen.research.google/video/ for samples.
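Two of the design choices named in the abstract, the v-parameterization and classifier-free guidance, can be illustrated with a small numerical sketch. This is not the Imagen Video implementation: the cosine noise schedule, the function names (`alpha_sigma`, `v_target`, `x_from_v`, `guided`), and the use of plain Python lists in place of video tensors are all assumptions made for illustration. The sketch follows the commonly used definitions, in which the network predicts v = alpha_t * eps - sigma_t * x for a noisy input z = alpha_t * x + sigma_t * eps, and guidance extrapolates from the unconditional prediction toward the conditional one.

```python
import math

def alpha_sigma(t):
    """Assumed variance-preserving cosine schedule: alpha^2 + sigma^2 = 1 for t in [0, 1]."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def noisy(x, eps, t):
    """Forward process: z_t = alpha_t * x + sigma_t * eps."""
    a, s = alpha_sigma(t)
    return [a * xi + s * ei for xi, ei in zip(x, eps)]

def v_target(x, eps, t):
    """v-parameterization training target: v = alpha_t * eps - sigma_t * x."""
    a, s = alpha_sigma(t)
    return [a * ei - s * xi for xi, ei in zip(x, eps)]

def x_from_v(z, v, t):
    """Recover the clean sample from z_t and v: x = alpha_t * z - sigma_t * v."""
    a, s = alpha_sigma(t)
    return [a * zi - s * vi for zi, vi in zip(z, v)]

def guided(pred_cond, pred_uncond, w):
    """Classifier-free guidance with weight w (w = 1 means no guidance):
    pred = pred_uncond + w * (pred_cond - pred_uncond)."""
    return [u + w * (c - u) for c, u in zip(pred_cond, pred_uncond)]

# Toy "video" of two values; a real model would operate on frame tensors.
x = [0.5, -1.0]
eps = [0.3, 0.2]
t = 0.3
z = noisy(x, eps, t)
v = v_target(x, eps, t)
x_rec = x_from_v(z, v, t)  # matches x up to floating-point error
```

Because alpha_t^2 + sigma_t^2 = 1 under the assumed schedule, `x_from_v` inverts the forward process exactly when given the true v, which is the property that makes v a usable prediction target.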

Results

Task | Dataset | Metric | Value | Model
Video | LAION-400M | CLIP | 25.19 | Imagen original (constant=6)
Video | LAION-400M | CLIP R-Precision | 92.12 | Imagen original (constant=6)
Video | LAION-400M | CLIP R-Precision | 90.97 | Imagen fully distilled (oscillate(15,1))
Video | LAION-400M | CLIP | 25.29 | Imagen distilled (constant=6)
Video | LAION-400M | CLIP R-Precision | 90.88 | Imagen distilled (constant=6)
Video | LAION-400M | CLIP | 25.03 | Imagen original (oscillate(15,1))
Video | LAION-400M | CLIP R-Precision | 89.91 | Imagen original (oscillate(15,1))
Video | LAION-400M | CLIP R-Precision | 89.68 | Imagen fully distilled (constant=6)
Video | LAION-400M | CLIP | 25.12 | Imagen distilled (oscillate(15,1))
Video | LAION-400M | CLIP R-Precision | 88.78 | Imagen distilled (oscillate(15,1))
Video Generation | LAION-400M | CLIP | 25.19 | Imagen original (constant=6)
Video Generation | LAION-400M | CLIP R-Precision | 92.12 | Imagen original (constant=6)
Video Generation | LAION-400M | CLIP R-Precision | 90.97 | Imagen fully distilled (oscillate(15,1))
Video Generation | LAION-400M | CLIP | 25.29 | Imagen distilled (constant=6)
Video Generation | LAION-400M | CLIP R-Precision | 90.88 | Imagen distilled (constant=6)
Video Generation | LAION-400M | CLIP | 25.03 | Imagen original (oscillate(15,1))
Video Generation | LAION-400M | CLIP R-Precision | 89.91 | Imagen original (oscillate(15,1))
Video Generation | LAION-400M | CLIP R-Precision | 89.68 | Imagen fully distilled (constant=6)
Video Generation | LAION-400M | CLIP | 25.12 | Imagen distilled (oscillate(15,1))
Video Generation | LAION-400M | CLIP R-Precision | 88.78 | Imagen distilled (oscillate(15,1))
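
The model labels in the results, constant=6 and oscillate(15,1), appear to name schedules for the classifier-free guidance weight across sampling steps. This page does not define them, so the sketch below is only one plausible reading: a constant schedule uses the same weight at every step, and an oscillating schedule alternates between a high and a low weight, optionally after holding the high weight for some initial steps. The function names and the `warmup_steps` parameter are hypothetical and do not come from the paper.

```python
def constant_schedule(num_steps, w=6.0):
    """Same guidance weight at every sampling step (reading of 'constant=6')."""
    return [w] * num_steps

def oscillating_schedule(num_steps, high=15.0, low=1.0, warmup_steps=0):
    """Alternate between a high and a low guidance weight (reading of 'oscillate(15,1)').

    warmup_steps is a hypothetical knob: the high weight is held for that many
    initial steps before the alternation starts.
    """
    weights = []
    for i in range(num_steps):
        if i < warmup_steps or (i - warmup_steps) % 2 == 0:
            weights.append(high)
        else:
            weights.append(low)
    return weights

const_w = constant_schedule(4)
osc_w = oscillating_schedule(6, warmup_steps=2)
```

Each weight in the returned list would be passed to the guidance step at the matching sampling iteration; a weight of 1 corresponds to no guidance under the usual convention.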
