Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu

Published: 2022-08-31
Tasks: Denoising · Motion Generation · Motion Synthesis
Links: Paper · PDF · Code · Code (official)

Abstract

Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distributions and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts, and supports arbitrary-length motion synthesis with time-varied text prompts. Our experiments show MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation. Homepage: https://mingyuan-zhang.github.io/projects/MotionDiffuse.html
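The "Probabilistic Mapping" property described in the abstract — generating a motion by iteratively denoising from pure noise, with fresh variation injected at each step — can be sketched as a generic DDPM-style reverse sampler. This is a minimal illustration, not the official MotionDiffuse implementation: the `denoiser` network, its conditioning interface, and the linear noise schedule are all assumptions for the sketch.

```python
import numpy as np

def sample_motion(denoiser, text_emb, num_frames=60, pose_dim=263,
                  timesteps=1000, rng=None):
    """Generic DDPM-style reverse process for motion generation.

    `denoiser(x, t, text_emb)` is a hypothetical network predicting the
    noise component of x at step t, conditioned on a text embedding.
    """
    rng = rng or np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, timesteps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise over the whole sequence.
    x = rng.standard_normal((num_frames, pose_dim))
    for t in reversed(range(timesteps)):
        eps = denoiser(x, t, text_emb)  # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])  # DDPM posterior mean
        if t > 0:
            # Probabilistic mapping: fresh noise injected at every step,
            # so the same text prompt yields diverse motions.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x  # motion sequence of shape (num_frames, pose_dim)
```

Because noise is re-injected at every step except the last, repeated calls with different random seeds produce different motions for the same text condition, in contrast to a deterministic language-to-motion regressor.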

Results

Task             | Dataset             | Metric              | Value | Model
-----------------|---------------------|---------------------|-------|--------------
Motion Synthesis | HumanML3D           | Diversity           | 9.41  | MotionDiffuse
Motion Synthesis | HumanML3D           | FID                 | 0.63  | MotionDiffuse
Motion Synthesis | HumanML3D           | Multimodality       | 1.553 | MotionDiffuse
Motion Synthesis | HumanML3D           | R Precision (Top 3) | 0.782 | MotionDiffuse
Motion Synthesis | KIT Motion-Language | Diversity           | 11.1  | MotionDiffuse
Motion Synthesis | KIT Motion-Language | FID                 | 1.954 | MotionDiffuse
Motion Synthesis | KIT Motion-Language | Multimodality       | 0.73  | MotionDiffuse
Motion Synthesis | KIT Motion-Language | R Precision (Top 3) | 0.739 | MotionDiffuse
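The Diversity metric in the table is, by convention in text-to-motion benchmarks, the average Euclidean distance between randomly paired feature vectors of generated motions. The sketch below follows that common definition under the assumption that motion features have already been extracted by a pretrained encoder; it is not code from the MotionDiffuse paper.

```python
import numpy as np

def diversity(features, num_pairs=300, rng=None):
    """Average L2 distance between randomly sampled pairs of generated
    motion features (the conventional Diversity metric; higher means
    the model produces more varied motions).

    features: array of shape (num_samples, feature_dim), assumed to come
    from a pretrained motion feature extractor (hypothetical here).
    """
    rng = rng or np.random.default_rng(0)
    n = len(features)
    a = features[rng.integers(0, n, size=num_pairs)]
    b = features[rng.integers(0, n, size=num_pairs)]
    return float(np.linalg.norm(a - b, axis=1).mean())
```

A model that collapses to a single motion would score near 0, while a model matching the spread of real data scores close to the Diversity of ground-truth features — which is why the metric is usually read relative to the real-data value rather than "higher is always better".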

Related Papers

- fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
- Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
- HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
- AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
- SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
- A statistical physics framework for optimal learning (2025-07-10)
- Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)