Papers With Code 2

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TEMOS: Generating diverse human motions from textual descriptions

Mathis Petrovich, Michael J. Black, Gül Varol

2022-04-25 · Motion Synthesis
Paper · PDF · Code (official)

Abstract

We address the problem of generating diverse 3D human motions from textual descriptions. This challenging task requires joint modeling of both modalities: understanding and extracting useful human-centric information from the text, and then generating plausible and realistic sequences of human poses. In contrast to most previous work, which focuses on generating a single deterministic motion from a textual description, we design a variational approach that can produce multiple diverse human motions. We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data, in combination with a text encoder that produces distribution parameters compatible with the VAE latent space. We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions. We evaluate our approach on the KIT Motion-Language benchmark and, despite being relatively straightforward, demonstrate significant improvements over the state of the art. Code and models are available on our webpage.
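The central idea in the abstract, a text encoder whose output is a Gaussian in the same latent space as a motion VAE, can be illustrated in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the official TEMOS code: the 2-D latent, the hard-coded "encoder outputs", and the hypothetical `latent_alignment_loss` helper (KL between the text and motion distributions in both directions, plus KL of each to a standard normal prior) are illustrative choices, and the decoder that would turn each latent into a pose sequence is omitted.

```python
import math
import random
from typing import List, Tuple

# A diagonal Gaussian over the latent space: (mean vector, log-variance vector).
Gaussian = Tuple[List[float], List[float]]


def sample(dist: Gaussian, rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    mu, logvar = dist
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_diag_gaussians(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mp, lvp, mq, lvq in zip(p[0], p[1], q[0], q[1]):
        total += 0.5 * (lvq - lvp + (math.exp(lvp) + (mp - mq) ** 2) / math.exp(lvq) - 1.0)
    return total


def latent_alignment_loss(text_dist: Gaussian, motion_dist: Gaussian) -> float:
    """Hypothetical alignment term (an assumption, not the paper's exact loss):
    pull the text and motion distributions toward each other in both KL
    directions and keep each one close to the N(0, I) prior."""
    dim = len(text_dist[0])
    prior: Gaussian = ([0.0] * dim, [0.0] * dim)
    return (
        kl_diag_gaussians(text_dist, motion_dist)
        + kl_diag_gaussians(motion_dist, text_dist)
        + kl_diag_gaussians(text_dist, prior)
        + kl_diag_gaussians(motion_dist, prior)
    )


# Hypothetical encoder outputs for one (description, motion) training pair.
text_dist: Gaussian = ([0.5, -0.2], [-1.0, -1.0])
motion_dist: Gaussian = ([0.4, -0.1], [-1.2, -0.8])
print("alignment loss:", latent_alignment_loss(text_dist, motion_dist))

# At inference only the text branch is needed: several draws from the same
# text distribution give several latents, i.e. several candidate motions
# once each latent is passed through a motion decoder (omitted here).
rng = random.Random(0)
latents = [sample(text_dist, rng) for _ in range(3)]
for z in latents:
    print([round(v, 3) for v in z])
```

Sampling several latents from one text-conditioned distribution is what lets a variational model return different motions for the same description, in contrast to a deterministic text-to-motion mapping.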

Results

Model: TEMOS

Metric               Inter-X   InterHuman
FID                  29.258    17.375
MMDist               6.867     6.342
MModality            0.672     0.535
R-Precision Top-3    0.238     0.45

The archive lists these same scores under four task labels: Pose Tracking, Motion Synthesis, 10-shot image generation, and 3D Human Pose Tracking.
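As a reading aid for two of the reported metric names, here is a minimal sketch of how R-Precision and MMDist are typically computed in text-to-motion benchmarks, assuming pretrained text and motion feature extractors have already produced one embedding per sample. The function names and toy embeddings are hypothetical; benchmark evaluators use their own learned features and commonly rank each motion against a batch of 32 descriptions, so this sketch does not reproduce the scores reported for TEMOS.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mm_dist(text_emb: List[Vector], motion_emb: List[Vector]) -> float:
    """MMDist: mean distance between each motion embedding and its paired text embedding."""
    return sum(euclidean(t, m) for t, m in zip(text_emb, motion_emb)) / len(motion_emb)


def r_precision(text_emb: List[Vector], motion_emb: List[Vector], top_k: int = 3) -> float:
    """R-Precision: fraction of motions whose paired text is among the top_k
    nearest text embeddings within the same batch."""
    hits = 0
    for i, m in enumerate(motion_emb):
        # Distance from this motion to every description in the batch.
        dists = [euclidean(m, t) for t in text_emb]
        ranked = sorted(range(len(dists)), key=dists.__getitem__)
        if i in ranked[:top_k]:
            hits += 1
    return hits / len(motion_emb)


# Toy embeddings (hypothetical); row i of each list belongs to the same sample.
text_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
motion_emb = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9], [4.0, 4.0]]

print("MMDist:", round(mm_dist(text_emb, motion_emb), 4))  # MMDist: 0.4286
print("R-Precision Top-1:", r_precision(text_emb, motion_emb, top_k=1))  # 1.0
print("R-Precision Top-3:", r_precision(text_emb, motion_emb, top_k=3))  # 1.0
```

Because the ranking happens within a batch, R-Precision depends on the batch size and on how separable the descriptions are, so scores are only comparable when computed with the same evaluator and protocol.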

Related Papers

DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions (2025-06-29)
DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling (2025-06-23)
PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis (2025-06-22)
Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation (2025-06-12)
DanceChat: Large Language Model-Guided Music-to-Dance Generation (2025-06-12)
MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation (2025-06-03)
MotionPro: A Precise Motion Controller for Image-to-Video Generation (2025-05-26)