Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis

Seong-Eun Hong, Soobin Lim, Juyeong Hwang, Minwook Chang, Hyeongyeop Kang

2024-11-28 · Motion Generation · Motion Synthesis

Paper · PDF

Abstract

Generating natural and expressive human motions from textual descriptions is challenging due to the complexity of coordinating full-body dynamics and capturing nuanced motion patterns over extended sequences that accurately reflect the given text. To address this, we introduce BiPO, a Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis, a novel model that enhances text-to-motion synthesis by integrating part-based generation with a bidirectional autoregressive architecture. This integration allows BiPO to consider both past and future contexts during generation while enhancing detailed control over individual body parts, without requiring ground-truth motion length. To relax the interdependency among body parts caused by this integration, we devise the Partial Occlusion technique, which probabilistically occludes certain motion-part information during training. In our comprehensive experiments, BiPO achieves state-of-the-art performance on the HumanML3D dataset, outperforming recent methods such as ParCo, MoMask, and BAMM in terms of FID scores and overall motion quality. Notably, BiPO excels not only in the text-to-motion generation task but also in motion-editing tasks that synthesize motion based on partially generated motion sequences and textual descriptions. These results demonstrate BiPO's effectiveness in advancing text-to-motion synthesis and its potential for practical applications.
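The Partial Occlusion idea described above — probabilistically hiding a body part's motion information during training so the generator does not over-rely on inter-part dependencies — can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the part layout, the `MASK` placeholder, and the occlusion probability are all assumptions.

```python
import random

# Placeholder "occluded" value; a real model would likely use a learned mask token.
MASK = None

def partially_occlude(part_sequences, p_occlude=0.3):
    """Randomly hide whole body-part tracks so the model cannot always
    condition on every other part (illustrative sketch only)."""
    occluded = []
    for seq in part_sequences:
        if random.random() < p_occlude:
            occluded.append([MASK] * len(seq))  # occlude this part's whole track
        else:
            occluded.append(seq)  # keep this part visible
    return occluded

# Toy example: 3 "body parts", each with 4 frames of (scalar) motion data
parts = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
masked = partially_occlude(parts, p_occlude=0.5)
```

Applied each training step, this kind of stochastic masking forces the per-part generators to stay useful even when context from other parts is missing, which is the stated motivation for the technique.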

Results

Task             | Dataset             | Metric          | Value  | Model
Motion Synthesis | HumanML3D           | Diversity       | 9.556  | BiPO
Motion Synthesis | HumanML3D           | FID             | 0.03   | BiPO
Motion Synthesis | HumanML3D           | Multimodality   | 1.374  | BiPO
Motion Synthesis | HumanML3D           | R-Precision Top-3 | 0.809 | BiPO
Motion Synthesis | KIT Motion-Language | Diversity       | 10.833 | BiPO
Motion Synthesis | KIT Motion-Language | FID             | 0.164  | BiPO
Motion Synthesis | KIT Motion-Language | Multimodality   | 1.098  | BiPO
Motion Synthesis | KIT Motion-Language | R-Precision Top-3 | 0.803 | BiPO
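For context on the metrics reported above: Diversity is commonly computed in text-to-motion evaluation as the average Euclidean distance between feature vectors of randomly paired generated motions. The sketch below assumes that standard definition; the feature extractor, pair count, and exact protocol vary by benchmark and are not taken from this paper.

```python
import numpy as np

def diversity(features, n_pairs=300, seed=0):
    """Average Euclidean distance between randomly paired motion feature
    vectors -- a common definition of the Diversity metric (sketch;
    protocol details such as n_pairs differ across benchmarks)."""
    rng = np.random.default_rng(seed)
    idx_a = rng.choice(len(features), n_pairs, replace=True)
    idx_b = rng.choice(len(features), n_pairs, replace=True)
    return float(np.linalg.norm(features[idx_a] - features[idx_b], axis=1).mean())

# Toy example: 100 random 512-d "motion features"
feats = np.random.default_rng(1).standard_normal((100, 512))
d = diversity(feats)
```

Under this definition, identical outputs score 0, and a generator whose outputs spread broadly in feature space scores higher — which is why Diversity is read as "closer to the real data's diversity is better" rather than "higher is better".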

Related Papers

- SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
- Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)
- Motion Generation: A Survey of Generative Approaches and Benchmarks (2025-07-07)
- DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
- A Unified Transformer-Based Framework with Pretraining For Whole Body Grasping Motion Generation (2025-07-01)
- VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions (2025-06-29)
- DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling (2025-06-23)
- PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis (2025-06-22)