Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu

2024-11-29 | Quantization | Motion Synthesis

Abstract

Human motion is inherently continuous and dynamic, posing significant challenges for generative models. While discrete generation methods are widely used, they suffer from limited expressiveness and frame-wise noise artifacts. In contrast, continuous approaches produce smoother, more natural motion but often struggle to adhere to conditioning signals due to high-dimensional complexity and limited training data. To resolve this discord between discrete and continuous representations, we introduce DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, a novel method that leverages rectified flow to decode discrete motion tokens in the continuous, raw motion space. Our core idea is to frame token decoding as a conditional generation task, ensuring that DisCoRD captures fine-grained dynamics and achieves smoother, more natural motions. Compatible with any discrete-based framework, our method enhances naturalness without compromising faithfulness to the conditioning signals across diverse settings. Extensive evaluations yield an FID of 0.032 on HumanML3D and 0.169 on KIT Motion-Language. Our project page is available at: https://whwjdqls.github.io/discord.github.io/.
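
The abstract frames token decoding as conditional generation with rectified flow. The sketch below illustrates that general technique only, and is not the authors' implementation: a sample is interpolated linearly between noise and data, a velocity field is regressed onto the difference between the two, and decoding integrates the field with Euler steps. Every name here (`interpolate`, `flow_matching_loss`, `euler_decode`, `TOY_CODEBOOK`, `toy_velocity`) is hypothetical, and the closed-form `toy_velocity` stands in for the learned, token-conditioned network that a real decoder would use.

```python
import random

# Hypothetical toy "codebook": each discrete token id maps to one target
# point in a 2-D continuous space. A real system would condition a neural
# network on token features instead of looking up a fixed target.
TOY_CODEBOOK = {0: [0.0, 0.0], 1: [1.0, -1.0], 2: [0.5, 2.0]}


def interpolate(x0, x1, t):
    """Straight-line path used by rectified flow: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between the predicted velocity and x1 - x0."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t, cond)
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_decode(velocity_fn, cond, dim, steps=16, seed=0):
    """Integrate dx/dt = v(x, t, cond) from t = 0 (noise) to t = 1 (sample)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, cond):
    """Oracle velocity for a single known target: (target - x) / (1 - t).

    Assumption for illustration only; it replaces a trained network.
    """
    target = TOY_CODEBOOK[cond]
    return [(b - a) / (1.0 - t) for a, b in zip(x, target)]


if __name__ == "__main__":
    decoded = euler_decode(toy_velocity, cond=1, dim=2, steps=8)
    print("decoded point for token 1:", decoded)
```

In this toy setting the oracle field points straight at the token's target, so the regression loss is zero and Euler integration lands on the target. With real motion data the field must be learned, and the number of integration steps trades speed against accuracy.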

Results

Task                      Dataset              Metric            Value  Model
Pose Tracking             HumanML3D            FID               0.032  DisCoRD (+MoMask)
Pose Tracking             HumanML3D            Multimodality     1.288  DisCoRD (+MoMask)
Pose Tracking             HumanML3D            R Precision Top3  0.809  DisCoRD (+MoMask)
Pose Tracking             KIT Motion-Language  FID               0.169  DisCoRD (+MoMask)
Pose Tracking             KIT Motion-Language  Multimodality     1.266  DisCoRD (+MoMask)
Pose Tracking             KIT Motion-Language  R Precision Top3  0.775  DisCoRD (+MoMask)
Motion Synthesis          HumanML3D            FID               0.032  DisCoRD (+MoMask)
Motion Synthesis          HumanML3D            Multimodality     1.288  DisCoRD (+MoMask)
Motion Synthesis          HumanML3D            R Precision Top3  0.809  DisCoRD (+MoMask)
Motion Synthesis          KIT Motion-Language  FID               0.169  DisCoRD (+MoMask)
Motion Synthesis          KIT Motion-Language  Multimodality     1.266  DisCoRD (+MoMask)
Motion Synthesis          KIT Motion-Language  R Precision Top3  0.775  DisCoRD (+MoMask)
10-shot image generation  HumanML3D            FID               0.032  DisCoRD (+MoMask)
10-shot image generation  HumanML3D            Multimodality     1.288  DisCoRD (+MoMask)
10-shot image generation  HumanML3D            R Precision Top3  0.809  DisCoRD (+MoMask)
10-shot image generation  KIT Motion-Language  FID               0.169  DisCoRD (+MoMask)
10-shot image generation  KIT Motion-Language  Multimodality     1.266  DisCoRD (+MoMask)
10-shot image generation  KIT Motion-Language  R Precision Top3  0.775  DisCoRD (+MoMask)
3D Human Pose Tracking    HumanML3D            FID               0.032  DisCoRD (+MoMask)
3D Human Pose Tracking    HumanML3D            Multimodality     1.288  DisCoRD (+MoMask)
3D Human Pose Tracking    HumanML3D            R Precision Top3  0.809  DisCoRD (+MoMask)
3D Human Pose Tracking    KIT Motion-Language  FID               0.169  DisCoRD (+MoMask)
3D Human Pose Tracking    KIT Motion-Language  Multimodality     1.266  DisCoRD (+MoMask)
3D Human Pose Tracking    KIT Motion-Language  R Precision Top3  0.775  DisCoRD (+MoMask)
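
The page does not define the "R Precision Top3" metric. As a rough illustration of how a top-k retrieval precision of this kind is commonly computed, the sketch below ranks candidate motion embeddings by Euclidean distance to each text embedding and counts a hit when the matching motion is among the k nearest. The function names, the toy embeddings, and the pairing of text `i` with motion `i` are assumptions for illustration; the evaluator features and candidate pool size behind the reported numbers are not stated here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def r_precision_top_k(text_embs, motion_embs, k=3):
    """Fraction of texts whose paired motion (same index) is in the k nearest.

    The rank of the paired motion is the number of candidates that are
    strictly closer to the text embedding than the paired motion is.
    """
    hits = 0
    for i, text in enumerate(text_embs):
        dists = [euclidean(text, m) for m in motion_embs]
        rank = sum(1 for d in dists if d < dists[i])
        if rank < k:
            hits += 1
    return hits / len(text_embs)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; the last two pairs are deliberately swapped.
    texts = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [9.0, 9.0]]
    motions = [[0.0, 0.1], [1.0, 0.1], [9.0, 9.0], [5.0, 5.0]]
    for k in (1, 2, 3):
        print(f"top-{k}:", r_precision_top_k(texts, motions, k=k))
```

Higher values mean generated motions are retrieved more reliably from their text descriptions, which is why the metric is read as faithfulness to the conditioning signal.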
