Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VideoComposer: Compositional Video Synthesis with Motion Controllability

Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou

2023-06-03 · NeurIPS 2023 · Text-to-Video Generation · Image Generation

Abstract

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis. However, achieving controllable video synthesis remains challenging due to the large variation of temporal dynamics and the requirement of cross-frame temporal consistency. Based on the paradigm of compositional generation, this work presents VideoComposer that allows users to flexibly compose a video with textual conditions, spatial conditions, and more importantly temporal conditions. Specifically, considering the characteristic of video data, we introduce the motion vector from compressed videos as an explicit control signal to provide guidance regarding temporal dynamics. In addition, we develop a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified interface to effectively incorporate the spatial and temporal relations of sequential inputs, with which the model could make better use of temporal conditions and hence achieve higher inter-frame consistency. Extensive experimental results suggest that VideoComposer is able to control the spatial and temporal patterns simultaneously within a synthesized video in various forms, such as text description, sketch sequence, reference video, or even simply hand-crafted motions. The code and models will be publicly available at https://videocomposer.github.io.
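A core idea in the abstract is reusing the motion vectors already stored in compressed video as an explicit temporal control signal. As a rough illustration of what such a signal encodes, here is a minimal exhaustive block-matching estimator in NumPy. This is not VideoComposer's pipeline (the paper reads motion vectors directly from the compressed bitstream, and real codecs estimate them far more efficiently); the function name and parameters are illustrative.

```python
import numpy as np

def block_motion_vectors(prev, curr, block=8, search=4):
    """Estimate a per-block motion field between two grayscale frames.

    For each block of `curr`, exhaustively search a (2*search+1)^2
    neighborhood in `prev` and record the (dy, dx) offset with the
    smallest sum of absolute differences. Codec-style motion vectors
    encode the same kind of block-wise displacement information.
    """
    h, w = prev.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            target = curr[y0:y0 + block, x0:x0 + block].astype(float)
            best, best_err = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate window falls outside the frame
                    cand = prev[y:y + block, x:x + block].astype(float)
                    err = np.abs(cand - target).sum()
                    if err < best_err:
                        best_err, best = err, (dy, dx)
            vectors[by, bx] = best
    return vectors
```

Stacking such fields over time yields a dense, cheap-to-obtain description of temporal dynamics, which is what makes motion vectors attractive as a conditioning input.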

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Text-to-Video Generation | EvalCrafter Text-to-Video (ECTV) Dataset | Motion Quality | 53.09 | ModelScope |
| Text-to-Video Generation | EvalCrafter Text-to-Video (ECTV) Dataset | Temporal Consistency | 54.46 | ModelScope |
| Text-to-Video Generation | EvalCrafter Text-to-Video (ECTV) Dataset | Text-to-Video Alignment | 57.8 | ModelScope |
| Text-to-Video Generation | EvalCrafter Text-to-Video (ECTV) Dataset | Total Score | 218 | ModelScope |
| Text-to-Video Generation | EvalCrafter Text-to-Video (ECTV) Dataset | Visual Quality | 52.47 | ModelScope |
| Text-to-Video Generation | MSR-VTT | CLIPSIM | 0.2932 | VideoComposer |
| Text-to-Video Generation | MSR-VTT | FVD | 580 | VideoComposer |
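The CLIPSIM metric reported on MSR-VTT is conventionally the mean cosine similarity between the CLIP embedding of the text prompt and the CLIP embedding of each generated frame. A minimal sketch of that aggregation, assuming the embeddings have already been produced by a CLIP-style encoder (the function name and inputs here are illustrative, not from the paper):

```python
import numpy as np

def clipsim(text_emb, frame_embs):
    """Mean cosine similarity between one text embedding and a stack of
    per-frame image embeddings, as in the CLIPSIM text-video metric.

    text_emb:   shape (d,)   -- embedding of the prompt
    frame_embs: shape (n, d) -- one embedding per generated frame
    """
    t = text_emb / np.linalg.norm(text_emb)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    return float((f @ t).mean())
```

Higher is better; FVD, by contrast, compares distributions of real and generated videos in a feature space, so lower is better.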

Related Papers

- LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)
- fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
- Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
- FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
- A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
- Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
- FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
- CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)