
FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu

2023-12-22 | NeurIPS 2023 11 | Motion Generation, Motion Synthesis

Abstract

Text-driven motion generation has achieved substantial progress with the emergence of diffusion models. However, existing methods still struggle to generate complex motion sequences that correspond to fine-grained descriptions, depicting detailed and accurate spatio-temporal actions. This lack of fine controllability limits the adoption of motion generation by a larger audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions with spatio-temporal composition that follows the user instructions. Specifically, FineMoGen builds upon a diffusion model with a novel transformer architecture dubbed Spatio-Temporal Mixture Attention (SAMI). SAMI optimizes the generation of the global attention template from two perspectives: 1) explicitly modeling the constraints of spatio-temporal composition; and 2) utilizing sparsely-activated mixture-of-experts to adaptively extract fine-grained features. To facilitate a large-scale study on this new fine-grained motion generation task, we contribute the HuMMan-MoGen dataset, which consists of 2,968 videos and 102,336 fine-grained spatio-temporal descriptions. Extensive experiments validate that FineMoGen exhibits superior motion generation quality over state-of-the-art methods. Notably, FineMoGen further enables zero-shot motion editing with the aid of modern large language models (LLMs), faithfully manipulating motion sequences according to fine-grained instructions. Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html

Results

Task                      Dataset               Metric            Value   Model
Pose Tracking             HumanML3D             Diversity         9.263   FineMoGen
Pose Tracking             HumanML3D             FID               0.151   FineMoGen
Pose Tracking             HumanML3D             Multimodality     2.696   FineMoGen
Pose Tracking             HumanML3D             R Precision Top3  0.784   FineMoGen
Pose Tracking             KIT Motion-Language   Diversity         10.85   FineMoGen
Pose Tracking             KIT Motion-Language   FID               0.178   FineMoGen
Pose Tracking             KIT Motion-Language   Multimodality     1.877   FineMoGen
Pose Tracking             KIT Motion-Language   R Precision Top3  0.772   FineMoGen
Motion Synthesis          HumanML3D             Diversity         9.263   FineMoGen
Motion Synthesis          HumanML3D             FID               0.151   FineMoGen
Motion Synthesis          HumanML3D             Multimodality     2.696   FineMoGen
Motion Synthesis          HumanML3D             R Precision Top3  0.784   FineMoGen
Motion Synthesis          KIT Motion-Language   Diversity         10.85   FineMoGen
Motion Synthesis          KIT Motion-Language   FID               0.178   FineMoGen
Motion Synthesis          KIT Motion-Language   Multimodality     1.877   FineMoGen
Motion Synthesis          KIT Motion-Language   R Precision Top3  0.772   FineMoGen
10-shot image generation  HumanML3D             Diversity         9.263   FineMoGen
10-shot image generation  HumanML3D             FID               0.151   FineMoGen
10-shot image generation  HumanML3D             Multimodality     2.696   FineMoGen
10-shot image generation  HumanML3D             R Precision Top3  0.784   FineMoGen
10-shot image generation  KIT Motion-Language   Diversity         10.85   FineMoGen
10-shot image generation  KIT Motion-Language   FID               0.178   FineMoGen
10-shot image generation  KIT Motion-Language   Multimodality     1.877   FineMoGen
10-shot image generation  KIT Motion-Language   R Precision Top3  0.772   FineMoGen
3D Human Pose Tracking    HumanML3D             Diversity         9.263   FineMoGen
3D Human Pose Tracking    HumanML3D             FID               0.151   FineMoGen
3D Human Pose Tracking    HumanML3D             Multimodality     2.696   FineMoGen
3D Human Pose Tracking    HumanML3D             R Precision Top3  0.784   FineMoGen
3D Human Pose Tracking    KIT Motion-Language   Diversity         10.85   FineMoGen
3D Human Pose Tracking    KIT Motion-Language   FID               0.178   FineMoGen
3D Human Pose Tracking    KIT Motion-Language   Multimodality     1.877   FineMoGen
3D Human Pose Tracking    KIT Motion-Language   R Precision Top3  0.772   FineMoGen
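
For readers unfamiliar with the metrics reported above, here is a minimal sketch of how R Precision Top-k and Diversity are commonly computed in text-to-motion evaluation. It is an illustration under assumptions, not the evaluation code behind these numbers: the page does not state the exact protocol, the feature vectors are assumed to come from pretrained text and motion encoders that are not shown, and the function names (`r_precision_top_k`, `diversity`) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def r_precision_top_k(text_feats, motion_feats, k=3):
    """Fraction of texts whose paired motion (same index) ranks among
    the k nearest motions in feature space.

    Assumption: text_feats[i] and motion_feats[i] form a matched pair,
    and all other motions in the pool act as distractors.
    """
    hits = 0
    for i, text in enumerate(text_feats):
        # Rank every candidate motion by distance to this text feature.
        ranked = sorted(
            range(len(motion_feats)),
            key=lambda j: euclidean(text, motion_feats[j]),
        )
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_feats)


def diversity(motion_feats, pairs):
    """Mean feature distance over the given (i, j) index pairs.

    Assumption: the caller supplies the pairs; in practice they are
    usually drawn at random from the generated motions.
    """
    total = sum(euclidean(motion_feats[i], motion_feats[j]) for i, j in pairs)
    return total / len(pairs)


# Toy features: the first two pairs match closely, the last two are swapped.
texts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
motions = [(0.0, 1.0), (10.0, 1.0), (10.0, 9.0), (0.0, 9.0)]

top1 = r_precision_top_k(texts, motions, k=1)
top3 = r_precision_top_k(texts, motions, k=3)
div = diversity(motions, [(0, 1), (2, 3)])
```

Under these definitions a higher R Precision means generated motions are easier to match back to their text, while Diversity measures spread across generated motions and is usually judged by how close it is to the value measured on real data.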

Related Papers

SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)
Motion Generation: A Survey of Generative Approaches and Benchmarks (2025-07-07)
DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation (2025-07-01)
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions (2025-06-29)
DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling (2025-06-23)
PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis (2025-06-22)