DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu

2024-11-29 | Quantization, Motion Synthesis

Abstract

Human motion is inherently continuous and dynamic, posing significant challenges for generative models. While discrete generation methods are widely used, they suffer from limited expressiveness and frame-wise noise artifacts. In contrast, continuous approaches produce smoother, more natural motion but often struggle to adhere to conditioning signals due to high-dimensional complexity and limited training data. To resolve this discord between discrete and continuous representations, we introduce DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, a novel method that leverages rectified flow to decode discrete motion tokens in the continuous, raw motion space. Our core idea is to frame token decoding as a conditional generation task, ensuring that DisCoRD captures fine-grained dynamics and achieves smoother, more natural motions. Compatible with any discrete-based framework, our method enhances naturalness without compromising faithfulness to the conditioning signals across diverse settings. Extensive evaluations show that DisCoRD reaches an FID of 0.032 on HumanML3D and 0.169 on KIT Motion-Language. Our project page is available at: https://whwjdqls.github.io/discord.github.io/.
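As a rough illustration of what "decoding as conditional generation with rectified flow" can look like, the sketch below shows the standard rectified-flow ingredients: a straight-line interpolation between noise and data with a constant velocity target, and an Euler sampler that integrates a conditional velocity field from t = 0 to t = 1. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the step count, and especially `toy_velocity` (a closed-form stand-in for a trained network, with `cond` playing the role of features derived from discrete tokens) are all hypothetical.

```python
import random


def rectified_flow_target(x0, x1, t):
    """Return the point on the straight path from noise x0 to data x1 at time t,
    together with the constant velocity (x1 - x0) a model would be trained to predict."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v = [b - a for a, b in zip(x0, x1)]
    return x_t, v


def euler_decode(velocity_fn, cond, dim, num_steps=16, seed=0):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps,
    starting from Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, cond):
    """Hypothetical stand-in for a trained network: the velocity that moves x
    along a straight line so that it reaches the conditioning target at t=1."""
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


if __name__ == "__main__":
    target = [0.5, -1.0, 2.0]  # toy "motion frame" implied by the condition
    print(euler_decode(toy_velocity, target, dim=len(target)))
```

In a real system the closed-form `toy_velocity` would be replaced by a learned network conditioned on token features, and the number of Euler steps would trade sampling cost against accuracy.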

Results

Task	Dataset	Metric	Value	Model
Motion Synthesis	HumanML3D	FID	0.032	DisCoRD (+MoMask)
Motion Synthesis	HumanML3D	Multimodality	1.288	DisCoRD (+MoMask)
Motion Synthesis	HumanML3D	R Precision Top3	0.809	DisCoRD (+MoMask)
Motion Synthesis	KIT Motion-Language	FID	0.169	DisCoRD (+MoMask)
Motion Synthesis	KIT Motion-Language	Multimodality	1.266	DisCoRD (+MoMask)
Motion Synthesis	KIT Motion-Language	R Precision Top3	0.775	DisCoRD (+MoMask)
