Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation

Xuehao Gao, Yang Yang, Zhenyu Xie, Shaoyi Du, Zhongqian Sun, Yang Wu

2024-01-04 · Motion Generation · Motion Synthesis
Paper · PDF · Code (official)

Abstract

In this paper, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis that exploits a strategy named GradUally Enriching SyntheSis (GUESS). The strategy sets up generation objectives by grouping body joints of detailed skeletons that are in close semantic proximity and replacing each such joint group with a single body-part node. This operation recursively abstracts a human pose into coarser and coarser skeletons at multiple granularity levels. As the abstraction level increases, the motion representation becomes more concise and stable, significantly benefiting the cross-modal motion synthesis task. The whole text-driven human motion synthesis problem is then divided into multiple abstraction levels and solved with a multi-stage generation framework built on a cascaded latent diffusion model: an initial generator first produces the coarsest human motion guess from a given text description; then, a series of successive generators gradually enriches the motion details based on the textual description and the previously synthesized results. Notably, we further integrate GUESS with a proposed dynamic multi-condition fusion mechanism that dynamically balances the cooperative effects of the given textual condition and the synthesized coarse motion prompt at different generation stages. Extensive experiments on large-scale datasets verify that GUESS outperforms existing state-of-the-art methods by large margins in terms of accuracy, realism, and diversity. Code is available at https://github.com/Xuehao-Gao/GUESS.
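The multi-granularity skeleton abstraction described in the abstract can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the joint groupings in `LEVEL_1_GROUPS` and the mean-pooling of grouped joints are assumptions made for demonstration, and the paper's actual groupings and pooling may differ.

```python
import numpy as np

# Hypothetical joint groupings for one abstraction step: each body-part
# node replaces a group of semantically close joints (groupings assumed
# for illustration; not taken from the paper).
LEVEL_1_GROUPS = {
    "torso": [0, 1, 2, 3],
    "head": [4, 5],
    "left_arm": [6, 7, 8],
    "right_arm": [9, 10, 11],
    "left_leg": [12, 13, 14],
    "right_leg": [15, 16, 17],
}

def abstract_pose(joints: np.ndarray, groups: dict) -> np.ndarray:
    """Collapse each joint group to a single body-part node.

    joints: (num_joints, 3) array of 3D joint positions.
    Returns one 3D node per group (mean position, an assumed pooling choice).
    """
    return np.stack([joints[idx].mean(axis=0) for idx in groups.values()])

# An 18-joint pose becomes a 6-node coarse skeleton; applying a further
# grouping over these 6 nodes would yield an even coarser level.
pose = np.random.rand(18, 3)
coarse = abstract_pose(pose, LEVEL_1_GROUPS)
print(coarse.shape)  # (6, 3)
```

In the cascaded framework, generation would then proceed in the opposite direction: the first stage synthesizes motion over the coarsest skeleton from text alone, and each subsequent stage conditions on both the text and the previous stage's coarser result.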

Results

Task | Dataset | Metric | Value | Model
Motion Synthesis | HumanML3D | Diversity | 9.826 | GUESS
Motion Synthesis | HumanML3D | FID | 0.109 | GUESS
Motion Synthesis | HumanML3D | Multimodality | 2.43 | GUESS
Motion Synthesis | HumanML3D | R-Precision Top-3 | 0.787 | GUESS
Motion Synthesis | KIT Motion-Language | Diversity | 10.933 | GUESS
Motion Synthesis | KIT Motion-Language | FID | 0.371 | GUESS
Motion Synthesis | KIT Motion-Language | Multimodality | 2.732 | GUESS
Motion Synthesis | KIT Motion-Language | R-Precision Top-3 | 0.751 | GUESS

Related Papers

SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)
Motion Generation: A Survey of Generative Approaches and Benchmarks (2025-07-07)
DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation (2025-07-01)
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions (2025-06-29)
DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling (2025-06-23)
PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis (2025-06-22)