Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu

2023-04-03 · ICCV 2023 · Denoising, Retrieval, Motion Generation, Motion Synthesis
Paper · PDF · Code (official)

Abstract

3D human motion generation is crucial for the creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. ReMoDiffuse enhances the generalizability and diversity of text-driven motion generation with three key designs:

1) Hybrid Retrieval finds appropriate references in the database using both semantic and kinematic similarity.
2) Semantic-Modulated Transformer selectively absorbs retrieved knowledge, adapting to the differences between the retrieved samples and the target motion sequence.
3) Condition Mixture makes better use of the retrieval database during inference, overcoming the scale sensitivity of classifier-free guidance.

Extensive experiments demonstrate that ReMoDiffuse outperforms state-of-the-art methods by balancing text-motion consistency and motion quality, especially for more diverse motion generation.
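Two of the designs above, Hybrid Retrieval and Condition Mixture, can be illustrated with a small stdlib-only Python sketch. It rests on assumptions not stated on this page: each database entry carries a precomputed text embedding and a motion length; the hybrid score multiplies cosine text similarity (semantic) by an exponential penalty on the relative length difference (kinematic), weighted by a hypothetical `lam`; and Condition Mixture is modeled as a weighted sum of denoiser outputs computed under different condition sets. All function names are hypothetical, and the paper's exact formulas, hyperparameters, and the way mixture weights are chosen may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query_emb: Vector, query_len: int,
                 item_emb: Vector, item_len: int, lam: float = 0.1) -> float:
    """Hypothetical hybrid score: semantic similarity scaled by a length-based
    (kinematic) factor that equals 1.0 when lengths match and decays as they diverge."""
    semantic = cosine(query_emb, item_emb)
    rel_len_diff = abs(item_len - query_len) / max(item_len, query_len)
    return semantic * math.exp(-lam * rel_len_diff)

def retrieve(query_emb: Vector, query_len: int,
             database: List[Tuple[str, Vector, int]], k: int = 2,
             lam: float = 0.1) -> List[str]:
    """Return the ids of the top-k entries by hybrid score.
    `database` holds (id, text_embedding, motion_length) tuples."""
    scored = [(hybrid_score(query_emb, query_len, emb, n, lam), ident)
              for ident, emb, n in database]
    scored.sort(reverse=True)
    return [ident for _, ident in scored[:k]]

def condition_mixture(outputs: List[Vector], weights: List[float]) -> List[float]:
    """Weighted sum of denoiser outputs obtained under different condition sets
    (e.g. text + retrieval, text only, retrieval only, unconditional)."""
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]

if __name__ == "__main__":
    db = [
        ("walk_fwd", [1.0, 0.0], 60),
        ("walk_slow", [0.9, 0.1], 120),
        ("jump", [0.0, 1.0], 60),
    ]
    print(retrieve([1.0, 0.0], 60, db, k=2))  # ['walk_fwd', 'walk_slow']
    # Classifier-free guidance with scale 2.5, written as a two-term mixture.
    print(condition_mixture([[1.0, 2.0], [0.0, 1.0]], [2.5, -1.5]))  # [2.5, 3.5]
```

The mixture form makes the link to classifier-free guidance explicit: standard guidance with scale s, `uncond + s * (cond - uncond)`, is exactly the two-term mixture with weights `s` on the conditional output and `1 - s` on the unconditional one. Condition Mixture, as sketched here, generalizes this to more condition sets with separately chosen weights.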

Results

Model: ReMoDiffuse. Identical values are listed under four tasks: Pose Tracking, Motion Synthesis, 10-shot image generation, and 3D Human Pose Tracking.

Dataset | Diversity | FID | Multimodality | R Precision Top-3
HumanML3D | 9.018 | 0.103 | 1.795 | 0.795
KIT Motion-Language | 10.8 | 0.155 | 1.239 | 0.765
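To make two of the reported metrics more concrete, here is a minimal stdlib-only Python sketch of R Precision (Top-k) and Diversity. It assumes that text and motion features have already been extracted by a pretrained evaluator, as text-to-motion benchmarks typically do; the toy vectors, function names, and pair counts are placeholders, not the benchmark's exact protocol (which fixes candidate pool sizes and sampling details). Diversity here is a simplified variant that averages distances over independently drawn random pairs. FID and Multimodality are omitted. For reference, lower FID and higher R Precision are better.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]

def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def r_precision(text_feats: List[Vector], motion_feats: List[Vector],
                top_k: int = 3) -> float:
    """Fraction of texts whose paired motion (same index) is among the
    top_k nearest motions in the pool, ranked by Euclidean distance."""
    hits = 0
    for i, t in enumerate(text_feats):
        ranked = sorted(range(len(motion_feats)),
                        key=lambda j: euclidean(t, motion_feats[j]))
        if i in ranked[:top_k]:
            hits += 1
    return hits / len(text_feats)

def diversity(motion_feats: List[Vector], num_pairs: int, seed: int = 0) -> float:
    """Mean Euclidean distance over randomly drawn pairs of distinct motions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        a, b = rng.sample(range(len(motion_feats)), 2)
        total += euclidean(motion_feats[a], motion_feats[b])
    return total / num_pairs

if __name__ == "__main__":
    feats = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
    print(r_precision(feats, feats, top_k=3))  # 1.0
    print(diversity([[0.0, 0.0], [3.0, 4.0]], num_pairs=5))  # 5.0
```

A perfectly matched pool gives an R Precision of 1.0. Real evaluations mix each ground-truth motion with mismatched candidates, so scores such as 0.795 reflect how often the paired motion still ranks in the top 3.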

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)