Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MoMask: Generative Masked Modeling of 3D Human Motions

Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed, Sen Wang, Li Cheng

2023-11-29 · CVPR 2024
Tasks: Human Motion Prediction, Motion Forecasting, Motion Generation, Motion Synthesis, Motion Interpolation
Paper · PDF · Code (official)

Abstract

We introduce MoMask, a novel masked modeling framework for text-driven 3D human motion generation. In MoMask, a hierarchical quantization scheme is employed to represent human motion as multi-layer discrete motion tokens with high-fidelity details. Starting at the base layer, with a sequence of motion tokens obtained by vector quantization, the residual tokens of increasing orders are derived and stored at the subsequent layers of the hierarchy. This is followed by two distinct bidirectional transformers. For the base-layer motion tokens, a Masked Transformer is designated to predict randomly masked motion tokens conditioned on text input at the training stage. During the generation (i.e., inference) stage, starting from an empty sequence, our Masked Transformer iteratively fills in the missing tokens; subsequently, a Residual Transformer learns to progressively predict the next-layer tokens based on the results from the current layer. Extensive experiments demonstrate that MoMask outperforms state-of-the-art methods on the text-to-motion generation task, with an FID of 0.045 (vs. e.g. 0.141 for T2M-GPT) on the HumanML3D dataset, and 0.228 (vs. 0.514) on KIT-ML, respectively. MoMask can also be seamlessly applied to related tasks without further model fine-tuning, such as text-guided temporal inpainting.
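The hierarchical quantization described above follows the general residual vector quantization (RVQ) pattern: a base codebook quantizes the motion latents, and each subsequent layer quantizes the residual left over from the previous layer. The sketch below illustrates that pattern with plain NumPy and nearest-neighbour codebook lookup; the function name, shapes, and the use of random codebooks are illustrative assumptions, not MoMask's actual trained model.

```python
import numpy as np

def residual_quantize(features, codebooks):
    """Sketch of residual vector quantization (RVQ).

    The base layer quantizes the feature sequence; each later layer
    quantizes the residual that the previous layers could not capture.
    Assumptions: Euclidean nearest-neighbour lookup, one codebook per
    layer. This is a generic RVQ illustration, not MoMask's trained VQ.
    """
    residual = features          # (T, d) motion latents
    token_layers = []
    for codebook in codebooks:   # each codebook: (K, d)
        # squared distance from every frame to every code entry
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        tokens = dists.argmin(axis=1)        # (T,) discrete motion tokens
        quantized = codebook[tokens]         # (T, d) reconstructed layer
        residual = residual - quantized      # remainder goes to next layer
        token_layers.append(tokens)
    return token_layers

# Toy usage: 8 frames, 16-dim latents, 3 quantization layers of 32 codes each.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
books = [rng.normal(size=(32, 16)) for _ in range(3)]
layers = residual_quantize(feats, books)
```

At inference, the paper's Masked Transformer would generate the base-layer tokens iteratively from a fully masked sequence, and the Residual Transformer would then predict each higher layer's tokens in turn.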

Results

Task | Dataset | Metric | Value | Model
Motion Synthesis | HumanML3D | FID | 0.045 | MoMask
Motion Synthesis | HumanML3D | Multimodality | 1.241 | MoMask
Motion Synthesis | HumanML3D | R-Precision Top-3 | 0.807 | MoMask
Motion Synthesis | KIT Motion-Language | FID | 0.204 | MoMask
Motion Synthesis | KIT Motion-Language | Multimodality | 1.131 | MoMask
Motion Synthesis | KIT Motion-Language | R-Precision Top-3 | 0.781 | MoMask

Related Papers

SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture (2025-07-09)
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)
Motion Generation: A Survey of Generative Approaches and Benchmarks (2025-07-07)
Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic (2025-07-05)
Temporal Continual Learning with Prior Compensation for Human Motion Prediction (2025-07-05)
DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation (2025-07-01)