Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang

2023-09-04 · Motion Generation · Motion Synthesis · Language Modelling

Paper · PDF

Abstract

We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite recent significant progress in text-based human motion generation, existing methods often prioritize fitting the training motions at the expense of action diversity, so striking a balance between motion quality and diversity remains an unresolved challenge. This problem is compounded by two key factors: 1) the lack of diversity in the motion-caption pairs of existing benchmarks, and 2) a unilateral, biased semantic understanding of the text prompt that focuses primarily on the verb while neglecting the nuanced distinctions indicated by other words. In response to the first issue, we construct a large-scale Wild Motion-Caption (WMC) dataset to extend the restricted action boundary of existing well-annotated datasets, enabling the learning of diverse motions from a more extensive range of actions. To this end, we train a motion BLIP on top of a pretrained vision-language model and use it to automatically generate diverse captions for the collected motion sequences, yielding a dataset of 8,888 motions paired with 141k texts. To comprehensively understand the text command, we propose a Hierarchical Semantic Aggregation (HSA) module that captures fine-grained semantics. Finally, we integrate these two designs into an effective Motion Discrete Diffusion (MDD) framework that balances motion quality and diversity. Extensive experiments on HumanML3D and KIT-ML show that DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity. The dataset, code, and pretrained models will be released to reproduce all of our results.
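The abstract names a Motion Discrete Diffusion (MDD) framework but does not specify it here. Mask-based discrete diffusion samplers generally start from a fully masked token sequence and iteratively commit tokens until the sequence is complete. The following is a minimal illustrative sketch under that assumption only: the function name, the denoiser signature, and the confidence-based commit schedule are hypothetical, not the paper's API.

```python
import numpy as np

def mdd_sample(denoiser, text_emb, seq_len, codebook_size, steps=10, rng=None):
    """Illustrative mask-based discrete diffusion sampling for motion tokens.

    Starts fully masked; each step the denoiser predicts per-position logits,
    the most confident positions are committed to their sampled tokens, and
    the rest stay masked for the next step. All names are hypothetical.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    MASK = codebook_size  # extra index reserved for the mask token
    tokens = np.full(seq_len, MASK, dtype=int)
    for t in range(steps):
        logits = denoiser(tokens, text_emb)            # (seq_len, codebook_size)
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=-1, keepdims=True)
        sampled = np.array([rng.choice(codebook_size, p=p) for p in probs])
        conf = probs[np.arange(seq_len), sampled]
        conf[tokens != MASK] = np.inf                  # committed tokens stay put
        n_keep = max(1, int(seq_len * (t + 1) / steps))  # linear unmask schedule
        keep = np.argsort(-conf)[:n_keep]
        new_tokens = np.full(seq_len, MASK, dtype=int)
        new_tokens[keep] = sampled[keep]
        committed = tokens != MASK
        new_tokens[committed] = tokens[committed]
        tokens = new_tokens
    return tokens
```

The sampled token indices would then be decoded to a pose sequence by a motion tokenizer (e.g. a VQ-VAE decoder), which this sketch omits.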

Results

Task | Dataset | Metric | Value | Model
Motion Synthesis | HumanML3D | Diversity | 9.551 | DiverseMotion (s=1)
Motion Synthesis | HumanML3D | FID | 0.07 | DiverseMotion (s=1)
Motion Synthesis | HumanML3D | Multimodality | 2.062 | DiverseMotion (s=1)
Motion Synthesis | HumanML3D | R-Precision Top-3 | 0.783 | DiverseMotion (s=1)
Motion Synthesis | HumanML3D | Diversity | 9.683 | DiverseMotion (s=2)
Motion Synthesis | HumanML3D | FID | 0.072 | DiverseMotion (s=2)
Motion Synthesis | HumanML3D | Multimodality | 1.869 | DiverseMotion (s=2)
Motion Synthesis | HumanML3D | R-Precision Top-3 | 0.802 | DiverseMotion (s=2)
Motion Synthesis | KIT Motion-Language | Diversity | 10.873 | DiverseMotion
Motion Synthesis | KIT Motion-Language | FID | 0.468 | DiverseMotion
Motion Synthesis | KIT Motion-Language | Multimodality | 2.062 | DiverseMotion
Motion Synthesis | KIT Motion-Language | R-Precision Top-3 | 0.76 | DiverseMotion

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)