Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
Yin Wang, Zhiying Leng, Frederick W. B. Li, Shun-Cheng Wu, Xiaohui Liang
Text-driven human motion generation in computer vision is both significant and challenging. However, current methods are limited to producing either deterministic or imprecise motion sequences, and they fail to effectively control the temporal and spatial relationships required to conform to a given text description. In this work, we propose a fine-grained method for generating high-quality, conditional human motion sequences that supports precise text descriptions. Our approach consists of two key components: 1) a linguistics-structure-assisted module that constructs accurate and complete language features to fully exploit the text information; and 2) a context-aware progressive reasoning module that learns neighborhood and overall semantic linguistic features from shallow and deep graph neural networks to perform multi-step inference. Experiments show that our approach outperforms existing text-driven motion generation methods on the HumanML3D and KIT test sets and generates motions that conform more visibly to the text conditions.
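The shallow-vs-deep GNN idea in component 2) can be illustrated with a toy graph convolution: stacking layers widens each node's receptive field, so early-layer outputs capture neighborhood structure while later-layer outputs aggregate overall graph semantics. This is only a minimal sketch of that intuition (plain NumPy, symmetric-normalized GCN propagation), not the paper's actual architecture or feature dimensions:

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetrically normalize with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    # One graph-convolution step: aggregate neighbor features, project, ReLU
    return np.maximum(A_norm @ H @ W, 0.0)

def gcn_features(A, X, weights):
    # Stack layers and keep every intermediate output, so "shallow" (local,
    # neighborhood-level) and "deep" (global, sentence-level) features can
    # both be read off the same network.
    A_norm = normalize_adjacency(A)
    H, per_layer = X, []
    for W in weights:
        H = gcn_layer(A_norm, H, W)
        per_layer.append(H)
    return per_layer

# Toy example: a 4-word sentence graph (chain of syntactic links),
# random word features, three stacked layers.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                      # 4 nodes, 8-dim features
weights = [rng.standard_normal((8, 8)) for _ in range(3)]
feats = gcn_features(A, X, weights)                  # feats[0]: local, feats[-1]: global
```

After one layer each node has only mixed with its direct neighbors; after three layers every node's feature depends on the whole chain, which is the sense in which deep outputs carry "overall" semantics.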
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Motion Synthesis | HumanML3D | Diversity | 9.278 | Fg-T2M |
| Motion Synthesis | HumanML3D | FID | 0.243 | Fg-T2M |
| Motion Synthesis | HumanML3D | MultiModality | 1.614 | Fg-T2M |
| Motion Synthesis | HumanML3D | R-Precision (Top-3) | 0.783 | Fg-T2M |
| Motion Synthesis | KIT Motion-Language | Diversity | 10.93 | Fg-T2M |
| Motion Synthesis | KIT Motion-Language | FID | 0.571 | Fg-T2M |
| Motion Synthesis | KIT Motion-Language | MultiModality | 1.019 | Fg-T2M |
| Motion Synthesis | KIT Motion-Language | R-Precision (Top-3) | 0.745 | Fg-T2M |