Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation

S. Rohollah Hosseyni, Ali Ahmad Rahmani, S. Jamal Seyedmohammadi, Sanaz Seyedin, Arash Mohammadi

2024-09-17 | Human motion prediction | Motion Forecasting | Motion Generation | Motion Synthesis

Abstract

Autoregressive models excel in modeling sequential dependencies by enforcing causal constraints, yet they struggle to capture complex bidirectional patterns due to their unidirectional nature. In contrast, mask-based models leverage bidirectional context, enabling richer dependency modeling. However, they often assume token independence during prediction, which undermines the modeling of sequential dependencies. Additionally, the corruption of sequences through masking or absorption can introduce unnatural distortions, complicating the learning process. To address these issues, we propose Bidirectional Autoregressive Diffusion (BAD), a novel approach that unifies the strengths of autoregressive and mask-based generative models. BAD utilizes a permutation-based corruption technique that preserves the natural sequence structure while enforcing causal dependencies through randomized ordering, enabling the effective capture of both sequential and bidirectional relationships. Comprehensive experiments show that BAD outperforms autoregressive and mask-based models in text-to-motion generation, suggesting a novel pre-training strategy for sequence modeling. The codebase for BAD is available on https://github.com/RohollahHS/BAD.
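The abstract describes enforcing causal dependencies through a randomized ordering. One common way to realize that idea, familiar from permutation language modeling, is to sample a random generation order and build an attention mask in which each position may only see positions revealed no later than itself. The sketch below illustrates that general idea only; it is not taken from the BAD codebase, and the function names (`random_order`, `mask_from_order`) and the choice to include the diagonal are assumptions made for illustration.

```python
import random


def random_order(seq_len, seed=None):
    """Sample a random generation order over positions 0..seq_len-1 (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(range(seq_len))
    rng.shuffle(order)
    return order


def mask_from_order(order):
    """Build a causal mask under a given generation order.

    order[k] is the position revealed at step k. mask[i][j] is True when
    position i may attend to position j, i.e. when j is revealed no later
    than i. Including the diagonal mirrors a standard causal mask and is
    an assumption of this sketch.
    """
    seq_len = len(order)
    rank = [0] * seq_len
    for step, pos in enumerate(order):
        rank[pos] = step
    return [[rank[j] <= rank[i] for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    order = random_order(5, seed=0)
    print("generation order:", order)
    for row in mask_from_order(order):
        print("".join("1" if allowed else "0" for allowed in row))
```

With the identity order the mask reduces to the usual lower-triangular causal mask, so ordinary left-to-right autoregression is the special case of an unshuffled order. Because the order is resampled, a given position can end up conditioned on tokens from both its left and its right across training steps.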

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Pose Tracking | HumanML3D | Diversity | 9.688 | BAD (CBS) |
| Pose Tracking | HumanML3D | FID | 0.049 | BAD (CBS) |
| Pose Tracking | HumanML3D | Multimodality | 1.119 | BAD (CBS) |
| Pose Tracking | HumanML3D | R Precision Top3 | 0.8 | BAD (CBS) |
| Pose Tracking | HumanML3D | Diversity | 9.694 | BAD (OAAS) |
| Pose Tracking | HumanML3D | FID | 0.065 | BAD (OAAS) |
| Pose Tracking | HumanML3D | Multimodality | 1.194 | BAD (OAAS) |
| Pose Tracking | HumanML3D | R Precision Top3 | 0.808 | BAD (OAAS) |
| Pose Tracking | KIT Motion-Language | Diversity | 11 | BAD (OAAS) |
| Pose Tracking | KIT Motion-Language | FID | 0.221 | BAD (OAAS) |
| Pose Tracking | KIT Motion-Language | Multimodality | 1.17 | BAD (OAAS) |
| Pose Tracking | KIT Motion-Language | R Precision Top3 | 0.75 | BAD (OAAS) |
| Motion Synthesis | HumanML3D | Diversity | 9.688 | BAD (CBS) |
| Motion Synthesis | HumanML3D | FID | 0.049 | BAD (CBS) |
| Motion Synthesis | HumanML3D | Multimodality | 1.119 | BAD (CBS) |
| Motion Synthesis | HumanML3D | R Precision Top3 | 0.8 | BAD (CBS) |
| Motion Synthesis | HumanML3D | Diversity | 9.694 | BAD (OAAS) |
| Motion Synthesis | HumanML3D | FID | 0.065 | BAD (OAAS) |
| Motion Synthesis | HumanML3D | Multimodality | 1.194 | BAD (OAAS) |
| Motion Synthesis | HumanML3D | R Precision Top3 | 0.808 | BAD (OAAS) |
| Motion Synthesis | KIT Motion-Language | Diversity | 11 | BAD (OAAS) |
| Motion Synthesis | KIT Motion-Language | FID | 0.221 | BAD (OAAS) |
| Motion Synthesis | KIT Motion-Language | Multimodality | 1.17 | BAD (OAAS) |
| Motion Synthesis | KIT Motion-Language | R Precision Top3 | 0.75 | BAD (OAAS) |
| 10-shot image generation | HumanML3D | Diversity | 9.688 | BAD (CBS) |
| 10-shot image generation | HumanML3D | FID | 0.049 | BAD (CBS) |
| 10-shot image generation | HumanML3D | Multimodality | 1.119 | BAD (CBS) |
| 10-shot image generation | HumanML3D | R Precision Top3 | 0.8 | BAD (CBS) |
| 10-shot image generation | HumanML3D | Diversity | 9.694 | BAD (OAAS) |
| 10-shot image generation | HumanML3D | FID | 0.065 | BAD (OAAS) |
| 10-shot image generation | HumanML3D | Multimodality | 1.194 | BAD (OAAS) |
| 10-shot image generation | HumanML3D | R Precision Top3 | 0.808 | BAD (OAAS) |
| 10-shot image generation | KIT Motion-Language | Diversity | 11 | BAD (OAAS) |
| 10-shot image generation | KIT Motion-Language | FID | 0.221 | BAD (OAAS) |
| 10-shot image generation | KIT Motion-Language | Multimodality | 1.17 | BAD (OAAS) |
| 10-shot image generation | KIT Motion-Language | R Precision Top3 | 0.75 | BAD (OAAS) |
| 3D Human Pose Tracking | HumanML3D | Diversity | 9.688 | BAD (CBS) |
| 3D Human Pose Tracking | HumanML3D | FID | 0.049 | BAD (CBS) |
| 3D Human Pose Tracking | HumanML3D | Multimodality | 1.119 | BAD (CBS) |
| 3D Human Pose Tracking | HumanML3D | R Precision Top3 | 0.8 | BAD (CBS) |
| 3D Human Pose Tracking | HumanML3D | Diversity | 9.694 | BAD (OAAS) |
| 3D Human Pose Tracking | HumanML3D | FID | 0.065 | BAD (OAAS) |
| 3D Human Pose Tracking | HumanML3D | Multimodality | 1.194 | BAD (OAAS) |
| 3D Human Pose Tracking | HumanML3D | R Precision Top3 | 0.808 | BAD (OAAS) |
| 3D Human Pose Tracking | KIT Motion-Language | Diversity | 11 | BAD (OAAS) |
| 3D Human Pose Tracking | KIT Motion-Language | FID | 0.221 | BAD (OAAS) |
| 3D Human Pose Tracking | KIT Motion-Language | Multimodality | 1.17 | BAD (OAAS) |
| 3D Human Pose Tracking | KIT Motion-Language | R Precision Top3 | 0.75 | BAD (OAAS) |

Related Papers

SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture (2025-07-09)
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)
Motion Generation: A Survey of Generative Approaches and Benchmarks (2025-07-07)
Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic (2025-07-05)
Temporal Continual Learning with Prior Compensation for Human Motion Prediction (2025-07-05)
DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation (2025-07-01)