Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis

Seong-Eun Hong, Soobin Lim, Juyeong Hwang, Minwook Chang, Hyeongyeop Kang

2024-11-28 · Motion Generation · Motion Synthesis

Paper · PDF

Abstract

Generating natural and expressive human motions from textual descriptions is challenging due to the complexity of coordinating full-body dynamics and capturing nuanced motion patterns over extended sequences that accurately reflect the given text. To address this, we introduce BiPO, a Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis, a novel model that enhances text-to-motion synthesis by integrating part-based generation with a bidirectional autoregressive architecture. This integration allows BiPO to consider both past and future contexts during generation while enhancing detailed control over individual body parts, without requiring ground-truth motion length. To relax the interdependency among body parts caused by this integration, we devise the Partial Occlusion technique, which probabilistically occludes certain motion-part information during training. In our comprehensive experiments, BiPO achieves state-of-the-art performance on the HumanML3D dataset, outperforming recent methods such as ParCo, MoMask, and BAMM in terms of FID scores and overall motion quality. Notably, BiPO excels not only in the text-to-motion generation task but also in motion-editing tasks that synthesize motion based on partially generated motion sequences and textual descriptions. These results demonstrate BiPO's effectiveness in advancing text-to-motion synthesis and its potential for practical applications.
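The Partial Occlusion idea described above — probabilistically hiding a body part's motion information during training so the generator does not over-rely on inter-part dependencies — can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the part layout, the `MASK` placeholder, and the occlusion probability are all assumptions.

```python
import random

# Placeholder "occluded" value; a real model would likely use a learned mask token.
MASK = None

def partially_occlude(part_sequences, p_occlude=0.3):
    """Randomly hide whole body-part tracks so the model cannot always
    condition on every other part (illustrative sketch only)."""
    occluded = []
    for seq in part_sequences:
        if random.random() < p_occlude:
            occluded.append([MASK] * len(seq))  # occlude this part's whole track
        else:
            occluded.append(seq)  # keep this part visible
    return occluded

# Toy example: 3 "body parts", each with 4 frames of (scalar) motion data
parts = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
masked = partially_occlude(parts, p_occlude=0.5)
```

Applied each training step, this kind of stochastic masking forces the per-part generators to stay useful even when context from other parts is missing, which is the stated motivation for the technique.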

Results

Task             | Dataset             | Metric          | Value  | Model
Motion Synthesis | HumanML3D           | Diversity       | 9.556  | BiPO
Motion Synthesis | HumanML3D           | FID             | 0.03   | BiPO
Motion Synthesis | HumanML3D           | Multimodality   | 1.374  | BiPO
Motion Synthesis | HumanML3D           | R-Precision Top-3 | 0.809 | BiPO
Motion Synthesis | KIT Motion-Language | Diversity       | 10.833 | BiPO
Motion Synthesis | KIT Motion-Language | FID             | 0.164  | BiPO
Motion Synthesis | KIT Motion-Language | Multimodality   | 1.098  | BiPO
Motion Synthesis | KIT Motion-Language | R-Precision Top-3 | 0.803 | BiPO
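For context on the metrics reported above: Diversity is commonly computed in text-to-motion evaluation as the average Euclidean distance between feature vectors of randomly paired generated motions. The sketch below assumes that standard definition; the feature extractor, pair count, and exact protocol vary by benchmark and are not taken from this paper.

```python
import numpy as np

def diversity(features, n_pairs=300, seed=0):
    """Average Euclidean distance between randomly paired motion feature
    vectors -- a common definition of the Diversity metric (sketch;
    protocol details such as n_pairs differ across benchmarks)."""
    rng = np.random.default_rng(seed)
    idx_a = rng.choice(len(features), n_pairs, replace=True)
    idx_b = rng.choice(len(features), n_pairs, replace=True)
    return float(np.linalg.norm(features[idx_a] - features[idx_b], axis=1).mean())

# Toy example: 100 random 512-d "motion features"
feats = np.random.default_rng(1).standard_normal((100, 512))
d = diversity(feats)
```

Under this definition, identical outputs score 0, and a generator whose outputs spread broadly in feature space scores higher — which is why Diversity is read as "closer to the real data's diversity is better" rather than "higher is better".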

Related Papers

- SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
- Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)
- Motion Generation: A Survey of Generative Approaches and Benchmarks (2025-07-07)
- DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
- A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation (2025-07-01)
- VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions (2025-06-29)
- DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling (2025-06-23)
- PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis (2025-06-22)