Make Pixels Dance: High-Dynamic Video Generation

Yan Zeng, Guoqiang Wei, Jiani Zheng, Jiaxin Zou, Yang Wei, Yuchen Zhang, Hang Li

2023-11-18 | CVPR 2024 | Tasks: Text-to-Video Generation, Video Generation

Abstract

Creating high-dynamic videos such as motion-rich actions and sophisticated visual effects poses a significant challenge in the field of artificial intelligence. Unfortunately, current state-of-the-art video generation methods, primarily focusing on text-to-video generation, tend to produce video clips with minimal motions despite maintaining high fidelity. We argue that relying solely on text instructions is insufficient and suboptimal for video generation. In this paper, we introduce PixelDance, a novel approach based on diffusion models that incorporates image instructions for both the first and last frames in conjunction with text instructions for video generation. Comprehensive experimental results demonstrate that PixelDance trained with public data exhibits significantly better proficiency in synthesizing videos with complex scenes and intricate motions, setting a new standard for video generation.

Results

Task	Dataset	Metric	Value	Model
Video	UCF-101	FVD16	242.82	PixelDance (256x256, text-conditional)
Video	UCF-101	Inception Score	42.1	PixelDance (256x256, text-conditional)
Video Generation	UCF-101	FVD16	242.82	PixelDance (256x256, text-conditional)
Video Generation	UCF-101	Inception Score	42.1	PixelDance (256x256, text-conditional)
Text-to-Video Generation	UCF-101	FVD16	242.82	PixelDance (Zero-shot, 256x256)
Text-to-Video Generation	MSR-VTT	CLIPSIM	0.3125	PixelDance
Text-to-Video Generation	MSR-VTT	FVD	381	PixelDance
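
For readers unfamiliar with the CLIPSIM metric in the table, the following is a minimal sketch of how such a score is commonly computed: the cosine similarity between a text embedding and each frame embedding, averaged over the frames of a video. This page does not state the evaluation protocol (which CLIP model, how frames are sampled, how per-video scores are aggregated), so everything here is an assumption for illustration. The function names `cosine` and `clipsim` are hypothetical, and the vectors are toy values, not real CLIP outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clipsim(text_embedding, frame_embeddings):
    """Mean text-to-frame cosine similarity for one video (assumed definition)."""
    if not frame_embeddings:
        raise ValueError("need at least one frame embedding")
    sims = [cosine(text_embedding, frame) for frame in frame_embeddings]
    return sum(sims) / len(sims)


# Toy embeddings standing in for CLIP text/image features (illustrative only).
text = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0]]
score = clipsim(text, frames)  # one aligned frame, one orthogonal frame
```

In a real evaluation the embeddings would come from a CLIP text encoder and image encoder, and a dataset-level number would typically be the mean of per-video scores over all prompts.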
