Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BAMM: Bidirectional Autoregressive Motion Model

Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang, Minwoo Lee, Srijan Das, Chen Chen

2024-03-28 | Denoising, Motion Generation, Motion Synthesis

Abstract

Generating human motion from text has been dominated by denoising motion models, based on either diffusion or generative masking. However, these models are limited in usability because they require prior knowledge of the motion length. Conversely, autoregressive motion models address this limitation by adaptively predicting motion endpoints, at the cost of degraded generation quality and editing capabilities. To address these challenges, we propose the Bidirectional Autoregressive Motion Model (BAMM), a novel text-to-motion generation framework. BAMM consists of two key components: (1) a motion tokenizer that transforms 3D human motion into discrete tokens in latent space, and (2) a masked self-attention transformer that autoregressively predicts randomly masked tokens via a hybrid attention masking strategy. By unifying generative masked modeling and autoregressive modeling, BAMM captures rich, bidirectional dependencies among motion tokens while learning the probabilistic mapping from textual inputs to motion outputs with dynamically adjusted motion sequence length. This enables BAMM to simultaneously achieve high-quality motion generation, enhanced usability, and built-in motion editability. Extensive experiments on the HumanML3D and KIT-ML datasets demonstrate that BAMM surpasses current state-of-the-art methods in both qualitative and quantitative measures. Our project page is available at https://exitudio.github.io/BAMM-page
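
The abstract names two ingredients: randomly masked tokens and an attention mask that mixes autoregressive (causal) and bidirectional visibility. The abstract does not specify how the hybrid strategy combines them, so the following is only a minimal sketch of the two mask types and one hypothetical way to switch between them per training sample. All function names and the `p_causal` parameter are assumptions made for illustration, not taken from the paper or its code.

```python
import random

def causal_mask(n):
    """mask[i][j] is True when query position i may attend to key position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]

def sample_attention_mask(n, rng, p_causal=0.5):
    """Hypothetical hybrid rule: choose causal or bidirectional visibility at random.

    p_causal is an illustrative knob, not a value from the source.
    """
    return causal_mask(n) if rng.random() < p_causal else bidirectional_mask(n)

def sample_masked_positions(n, ratio, rng):
    """Pick which token positions are replaced by a mask token (ratio in (0, 1])."""
    k = min(n, max(1, round(ratio * n)))
    return sorted(rng.sample(range(n), k))

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_masked_positions(8, 0.5, rng))
    for row in sample_attention_mask(4, rng):
        print(["x" if visible else "." for visible in row])
```

In a real model these boolean matrices would be passed to the attention layers of the transformer; that wiring is omitted here because it depends on the framework and on details the abstract does not give.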

Results

Task | Dataset | Metric | Value | Model
Pose Tracking | HumanML3D | Diversity | 9.717 | BAMM
Pose Tracking | HumanML3D | FID | 0.055 | BAMM
Pose Tracking | HumanML3D | Multimodality | 1.687 | BAMM
Pose Tracking | HumanML3D | R Precision Top3 | 0.814 | BAMM
Pose Tracking | KIT Motion-Language | Diversity | 11.008 | BAMM
Pose Tracking | KIT Motion-Language | FID | 0.183 | BAMM
Pose Tracking | KIT Motion-Language | Multimodality | 1.609 | BAMM
Pose Tracking | KIT Motion-Language | R Precision Top3 | 0.788 | BAMM
Motion Synthesis | HumanML3D | Diversity | 9.717 | BAMM
Motion Synthesis | HumanML3D | FID | 0.055 | BAMM
Motion Synthesis | HumanML3D | Multimodality | 1.687 | BAMM
Motion Synthesis | HumanML3D | R Precision Top3 | 0.814 | BAMM
Motion Synthesis | KIT Motion-Language | Diversity | 11.008 | BAMM
Motion Synthesis | KIT Motion-Language | FID | 0.183 | BAMM
Motion Synthesis | KIT Motion-Language | Multimodality | 1.609 | BAMM
Motion Synthesis | KIT Motion-Language | R Precision Top3 | 0.788 | BAMM
10-shot image generation | HumanML3D | Diversity | 9.717 | BAMM
10-shot image generation | HumanML3D | FID | 0.055 | BAMM
10-shot image generation | HumanML3D | Multimodality | 1.687 | BAMM
10-shot image generation | HumanML3D | R Precision Top3 | 0.814 | BAMM
10-shot image generation | KIT Motion-Language | Diversity | 11.008 | BAMM
10-shot image generation | KIT Motion-Language | FID | 0.183 | BAMM
10-shot image generation | KIT Motion-Language | Multimodality | 1.609 | BAMM
10-shot image generation | KIT Motion-Language | R Precision Top3 | 0.788 | BAMM
3D Human Pose Tracking | HumanML3D | Diversity | 9.717 | BAMM
3D Human Pose Tracking | HumanML3D | FID | 0.055 | BAMM
3D Human Pose Tracking | HumanML3D | Multimodality | 1.687 | BAMM
3D Human Pose Tracking | HumanML3D | R Precision Top3 | 0.814 | BAMM
3D Human Pose Tracking | KIT Motion-Language | Diversity | 11.008 | BAMM
3D Human Pose Tracking | KIT Motion-Language | FID | 0.183 | BAMM
3D Human Pose Tracking | KIT Motion-Language | Multimodality | 1.609 | BAMM
3D Human Pose Tracking | KIT Motion-Language | R Precision Top3 | 0.788 | BAMM
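
The R Precision Top3 metric in the results is a retrieval score. This page does not define it, so the sketch below assumes the usual retrieval-style reading: each generated motion is compared against a pool of candidate text descriptions, and a hit is counted when the matching description is among the k closest. The distance matrix layout (true match on the diagonal) and the function name are assumptions for illustration. The exact protocol behind the reported values, such as pool size and feature extractor, is not given here.

```python
def r_precision_top_k(dist, k=3):
    """Fraction of rows whose true match (the diagonal entry) ranks in the top k.

    dist[i][j] is the distance between motion i and candidate text j;
    the layout with the true pair at j == i is an assumption of this sketch.
    """
    hits = 0
    for i, row in enumerate(dist):
        # Rank of the true match = number of candidates strictly closer than it.
        rank = sum(1 for d in row if d < row[i])
        hits += rank < k
    return hits / len(dist)

if __name__ == "__main__":
    toy = [
        [0.1, 0.5, 0.9, 0.7],
        [0.2, 0.3, 0.1, 0.05],
        [0.9, 0.8, 0.7, 0.6],
        [0.4, 0.3, 0.2, 0.1],
    ]
    print(r_precision_top_k(toy, k=3))  # 0.75
    print(r_precision_top_k(toy, k=1))  # 0.5
```

Higher is better for this score, whereas FID is a distance between feature distributions, so lower is better.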

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
A statistical physics framework for optimal learning (2025-07-10)
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)