Papers With Code 2
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


LEO: Generative Latent Image Animator for Human Video Synthesis

Yaohui Wang, Xin Ma, Xinyuan Chen, Cunjian Chen, Antitza Dantcheva, Bo Dai, Yu Qiao

2023-05-06 · Tasks: Video Editing, Disentanglement

Paper · PDF · Code (official)

Abstract

Spatio-temporal coherency is a major challenge in synthesizing high-quality videos, particularly human videos that contain rich global and local deformations. To address this challenge, previous approaches have resorted to different features in the generation process aimed at representing appearance and motion separately. However, in the absence of strict mechanisms to guarantee such disentanglement, separating motion from appearance has remained difficult, resulting in spatial distortions and temporal jittering that break spatio-temporal coherency. Motivated by this, we propose LEO, a novel framework for human video synthesis that places emphasis on spatio-temporal coherency. Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolates motion from appearance. We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM). The former bridges a space of motion codes with the space of flow maps and synthesizes video frames in a warp-and-inpaint manner; the LMDM learns to capture the motion prior in the training data by synthesizing sequences of motion codes. Extensive quantitative and qualitative analysis suggests that LEO significantly improves coherent synthesis of human videos over previous methods on the TaichiHD, FaceForensics, and CelebV-HQ datasets. In addition, the effective disentanglement of appearance and motion in LEO enables two additional tasks: infinite-length human video synthesis and content-preserving video editing.
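The warp-and-inpaint generation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the shapes, the `decode_flow` stand-in, and nearest-neighbour sampling are all simplifying assumptions. In LEO, the LMDM samples a sequence of motion codes, the image animator decodes each code into a flow map, and each frame is produced by warping the starting frame with that flow (with inpainting of disoccluded regions, omitted here).

```python
# Hedged sketch of warp-and-inpaint video synthesis.
# Assumptions: grayscale (H, W) frames, scalar "motion codes", and a toy
# constant-shift flow decoder standing in for the trained image animator.
import numpy as np

def warp(frame, flow):
    """Backward-warp an (H, W) frame with an (H, W, 2) flow map
    (nearest-neighbour sampling for brevity)."""
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    return frame[src_y, src_x]

def synthesize_video(start_frame, motion_codes, decode_flow):
    """One frame per motion code: code -> flow map -> warped start frame.
    (Inpainting of disoccluded regions is omitted.)"""
    return [warp(start_frame, decode_flow(z)) for z in motion_codes]

# Toy usage: codes stand in for an LMDM-sampled motion-code sequence,
# and decode() shifts the image by z pixels in both x and y.
H = W = 8
start = np.arange(H * W, dtype=float).reshape(H, W)
codes = [1.0, 2.0]
decode = lambda z: np.full((H, W, 2), z)
video = synthesize_video(start, codes, decode)
```

Because appearance lives entirely in `start_frame` and motion entirely in the flow maps, the two are disentangled by construction — which is what enables infinite-length synthesis (keep sampling codes) and content-preserving editing (swap the start frame).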

Related Papers

- CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models (2025-07-18)
- Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation (2025-07-11)
- Generative Head-Mounted Camera Captures for Photorealistic Avatars (2025-07-08)
- Reflections Unlock: Geometry-Aware Reflection Disentanglement in 3D Gaussian Splatting for Photorealistic Scenes Rendering (2025-07-08)
- Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations (2025-07-04)
- Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation (2025-07-04)
- Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization (2025-07-03)
- SemFaceEdit: Semantic Face Editing on Generative Radiance Manifolds (2025-06-28)