FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu

2023-12-22 | NeurIPS 2023 | Motion Generation, Motion Synthesis

Abstract

Text-driven motion generation has made substantial progress with the emergence of diffusion models. However, existing methods still struggle to generate complex motion sequences that correspond to fine-grained descriptions depicting detailed and accurate spatio-temporal actions. This lack of fine-grained controllability limits the broader adoption of motion generation. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions whose spatio-temporal composition follows the user instructions. Specifically, FineMoGen builds upon a diffusion model with a novel transformer architecture dubbed Spatio-Temporal Mixture Attention (SAMI). SAMI optimizes the generation of the global attention template from two perspectives: 1) explicitly modeling the constraints of spatio-temporal composition; and 2) utilizing a sparsely-activated mixture-of-experts to adaptively extract fine-grained features. To facilitate a large-scale study of this new fine-grained motion generation task, we contribute the HuMMan-MoGen dataset, which consists of 2,968 videos and 102,336 fine-grained spatio-temporal descriptions. Extensive experiments validate that FineMoGen achieves superior motion generation quality compared with state-of-the-art methods. Notably, FineMoGen further enables zero-shot motion editing with the aid of modern large language models (LLMs), faithfully manipulating motion sequences according to fine-grained instructions. Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html
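One of SAMI's two ingredients is a sparsely-activated mixture-of-experts, in which each input is routed to only a few expert sub-networks. The following is a minimal, generic sketch of top-k gating in pure Python, not the FineMoGen implementation. The linear gate, the helper names `top_k_gate` and `sparse_moe`, the toy experts, and the choice of k are all hypothetical assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a short list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(x: Vector, gate_weights: Sequence[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Score every expert with a (hypothetical) linear gate, keep the k best,
    and renormalize their weights with a softmax.

    Equivalent to setting the non-selected logits to -inf before a softmax.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


def sparse_moe(x: Vector, experts: Sequence[Expert], gate_weights: Sequence[Sequence[float]], k: int = 2) -> Vector:
    """Weighted sum of the outputs of the k selected experts only.

    Unselected experts are never called. Assumes each expert maps a vector
    to a vector of the same length as its input.
    """
    out = [0.0] * len(x)
    for idx, weight in top_k_gate(x, gate_weights, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Hypothetical toy setup: 4 experts that scale their input, 2-D input.
    experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
    gate = [[1, 0], [0, 1], [-1, 0], [2, 0]]
    print(top_k_gate([1.0, 0.0], gate, k=2))  # experts 3 and 0 are selected
    print(sparse_moe([1.0, 0.0], experts, gate, k=2))
```

Because only the k selected experts run, adding experts increases model capacity without a proportional increase in per-input compute. In a real transformer, the gate and experts would be learned layers applied per token. How SAMI wires its experts into attention is specified in the paper, not in this sketch.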

Results

Task	Dataset	Diversity	FID	Multimodality	R Precision Top3	Model
Motion Synthesis	HumanML3D	9.263	0.151	2.696	0.784	FineMoGen
Motion Synthesis	KIT Motion-Language	10.85	0.178	1.877	0.772	FineMoGen