ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu

2023-04-03 · ICCV 2023
Tags: Denoising, Retrieval, Motion Generation, Motion Synthesis

Abstract

3D human motion generation is crucial for the creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, the performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. ReMoDiffuse enhances the generalizability and diversity of text-driven motion generation with three key designs: 1) Hybrid Retrieval finds appropriate references in the database in terms of both semantic and kinematic similarity. 2) The Semantic-Modulated Transformer selectively absorbs retrieved knowledge, adapting to the differences between the retrieved samples and the target motion sequence. 3) Condition Mixture makes better use of the retrieval database during inference, overcoming the scale sensitivity of classifier-free guidance. Extensive experiments demonstrate that ReMoDiffuse outperforms state-of-the-art methods by balancing text-motion consistency and motion quality, especially for more diverse motion generation.
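Hybrid Retrieval ranks database entries by combining a semantic score with a kinematic score. The following is a minimal sketch, not the authors' code. It assumes that semantic similarity is the cosine similarity of precomputed text embeddings and that kinematic similarity is a length term exp(-lam * |l_i - l| / max(l_i, l)), multiplied together. The names `hybrid_score`, `retrieve`, `lam` and the toy database are hypothetical, and the exact formulation in ReMoDiffuse may differ.

```python
import math
from typing import List, Sequence, Tuple

Entry = Tuple[str, Sequence[float], int]  # (id, text embedding, motion length in frames)

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query_emb, query_len, cand_emb, cand_len, lam=0.1):
    """Semantic similarity scaled by an assumed length-based kinematic term."""
    semantic = cosine(query_emb, cand_emb)
    # Assumed form: 1.0 when lengths match, decaying as the relative gap grows.
    rel_gap = abs(cand_len - query_len) / max(cand_len, query_len)
    kinematic = math.exp(-lam * rel_gap)
    return semantic * kinematic

def retrieve(query_emb, query_len, database: List[Entry], k=2, lam=0.1) -> List[str]:
    """Ids of the top-k entries by hybrid score, best first."""
    scored = [(hybrid_score(query_emb, query_len, emb, length, lam), idx)
              for idx, emb, length in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy database with hypothetical 2-D "text embeddings".
db = [("walk", [1.0, 0.0], 60), ("walk_long", [1.0, 0.0], 196), ("jump", [0.0, 1.0], 60)]
query = [1.0, 0.1]
print(retrieve(query, 60, db, k=2, lam=0.1))  # ['walk', 'walk_long']
print(retrieve(query, 60, db, k=2, lam=5.0))  # ['walk', 'jump']
```

With a small `lam`, the semantically matching but much longer clip still ranks second; a large `lam` penalizes the length mismatch enough that a same-length but semantically weaker clip overtakes it, which is the trade-off a hybrid score is meant to expose.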
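The Semantic-Modulated Transformer is described as selectively absorbing retrieved knowledge. One way to picture this, as a simplified sketch and not the authors' module, is a cross-attention step in which a motion token attends over the concatenation of text features and retrieved features, so that the softmax weights decide how much each source contributes. Learned query/key/value projections, multiple heads, and any distinction between retrieved text and retrieved motion features are omitted; `semantic_modulated_attention` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def semantic_modulated_attention(motion_token: Vector,
                                 text_tokens: List[Vector],
                                 retrieved_tokens: List[Vector]) -> Vector:
    """Let one motion token attend jointly to text and retrieved features.

    Putting both sources in a single key/value set means a retrieved feature
    only receives weight when it matches the query better than the text does.
    """
    keys = text_tokens + retrieved_tokens
    values = text_tokens + retrieved_tokens
    return attend(motion_token, keys, values)

out = semantic_modulated_attention([10.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])
print(round(out[0], 3))  # 0.999: the text feature dominates this token
```

Here the retrieved feature is nearly orthogonal to the motion token, so it contributes almost nothing; a query aligned with the retrieved feature would shift the weight the other way.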
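Condition Mixture targets the scale sensitivity of classifier-free guidance, where a single scale s blends one conditional and one unconditional prediction as eps_uncond + s * (eps_cond - eps_uncond). A hedged sketch of the general idea is to combine denoiser outputs computed under several condition subsets (for example text plus retrieval, text only, retrieval only, none) with one weight per subset. The subset labels and weights below are illustrative assumptions, and how ReMoDiffuse chooses its coefficients is not covered here. Ordinary guidance is the special case with weights s and 1 - s on the text-only and unconditional outputs.

```python
from typing import Dict, List

Vector = List[float]

def classifier_free_guidance(eps_cond: Vector, eps_uncond: Vector, scale: float) -> Vector:
    """Standard CFG: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

def condition_mixture(outputs: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Weighted sum of denoiser outputs under different condition subsets.

    Keys are hypothetical labels such as 'text+retr', 'text', 'retr', 'none'.
    """
    dim = len(next(iter(outputs.values())))
    mixed = [0.0] * dim
    for name, w in weights.items():
        for j, v in enumerate(outputs[name]):
            mixed[j] += w * v
    return mixed

# Toy denoiser outputs (made-up numbers for illustration only).
outputs = {"text": [1.0, 2.0], "none": [0.0, 1.0]}
s = 3.0
print(classifier_free_guidance(outputs["text"], outputs["none"], s))   # [3.0, 4.0]
print(condition_mixture(outputs, {"text": s, "none": 1.0 - s}))        # [3.0, 4.0]
```

Because the mixture exposes one weight per condition subset instead of a single scale, it has more freedom than plain guidance to trade off how strongly the text and the retrieved samples steer each denoising step.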

Results

Model: ReMoDiffuse. These values are listed identically under the task tags Motion Synthesis, Pose Tracking, 3D Human Pose Tracking, and 10-shot image generation.

Dataset	Diversity	FID (lower is better)	Multimodality	R Precision Top3 (higher is better)
HumanML3D	9.018	0.103	1.795	0.795
KIT Motion-Language	10.8	0.155	1.239	0.765
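R Precision Top3 measures text-motion consistency. Under the standard HumanML3D evaluation protocol, each generated motion is compared in a learned feature space against a pool of 32 texts (its own description plus mismatched ones) and counts as a hit if its own description is among the 3 closest. A minimal sketch, assuming the motion and text features have already been extracted by the evaluator networks (not reproduced here) and treating the whole input list as a single pool:

```python
import math
from typing import List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def r_precision_top_k(motion_feats: List[Sequence[float]],
                      text_feats: List[Sequence[float]], k: int = 3) -> float:
    """Fraction of motions whose paired text ranks in the top k by distance.

    motion_feats[i] is paired with text_feats[i]; every other text in the
    list acts as a mismatched distractor (the protocol uses pools of 32).
    """
    hits = 0
    for i, m in enumerate(motion_feats):
        dists = [euclidean(m, t) for t in text_feats]
        # Rank of the true text = number of texts strictly closer than it.
        rank = sum(1 for d in dists if d < dists[i])
        if rank < k:
            hits += 1
    return hits / len(motion_feats)

# Toy 1-D features: the last motion is far from its own text.
motions = [[0.0], [1.0], [2.0], [10.0]]
texts = [[0.0], [1.0], [2.0], [30.0]]
print(r_precision_top_k(motions, texts, k=3))  # 0.75
```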
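Diversity measures the spread of generated motions: the mean Euclidean distance between the features of randomly paired generated samples (the HumanML3D protocol commonly uses 300 pairs). The sketch below assumes precomputed features and draws each pair from two distinct samples, a small simplification of the protocol; `num_pairs` and `seed` are illustrative parameters.

```python
import math
import random
from typing import List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diversity(features: List[Sequence[float]], num_pairs: int = 300, seed: int = 0) -> float:
    """Mean distance between randomly chosen pairs of distinct generated samples."""
    if len(features) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)  # fixed seed for reproducible pair sampling
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(len(features)), 2)
        total += euclidean(features[i], features[j])
    return total / num_pairs

print(diversity([[0.0, 0.0], [3.0, 4.0]], num_pairs=10))  # 5.0
```

Multimodality follows the same pattern, except that the pairs are drawn from several motions generated for the same text, so it reflects variety per prompt rather than across the whole set.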
