Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings

Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, Libin Liu

2022-10-04 · Gesture Generation · Rhythm
Paper · PDF · Code (official)

Abstract

Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in artificial embodied agent creation. Previous systems mainly focus on generating gestures in an end-to-end manner, which leads to difficulties in mining the clear rhythm and semantics due to the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results both on the rhythm and semantics. For the rhythm, our system contains a robust rhythm-based segmentation pipeline to ensure the temporal coherence between the vocalization and gestures explicitly. For the gesture semantics, we devise a mechanism to effectively disentangle both low- and high-level neural embeddings of speech and motion based on linguistic theory. The high-level embedding corresponds to semantics, while the low-level embedding relates to subtle variations. Lastly, we build correspondence between the hierarchical embeddings of the speech and the motion, resulting in rhythm- and semantics-aware gesture synthesis. Evaluations with existing objective metrics, a newly proposed rhythmic metric, and human feedback show that our method outperforms state-of-the-art systems by a clear margin.
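The rhythm-based segmentation the abstract describes splits speech at audible beats so that gesture blocks can be aligned with vocalization explicitly. A minimal sketch of that idea, assuming a simple short-time-energy onset detector; the paper's actual pipeline is more robust, and `rhythm_segments`, the frame/hop sizes, and the threshold factor `k` are illustrative choices, not taken from the paper:

```python
import numpy as np

def rhythm_segments(audio, sr, frame=512, hop=256, k=1.5):
    """Return candidate segment boundaries (in samples) at rhythmic
    onsets, found as peaks of positive short-time-energy change.
    Illustrative stand-in for a beat-based speech segmenter."""
    n = 1 + (len(audio) - frame) // hop
    # short-time energy per frame
    energy = np.array([np.sum(audio[i*hop : i*hop + frame] ** 2)
                       for i in range(n)])
    # positive energy increase acts as onset strength ("spectral flux" analogue)
    flux = np.maximum(0.0, np.diff(energy, prepend=energy[0]))
    # local maxima above k * mean flux mark rhythmic onsets
    thresh = k * flux.mean()
    peaks = [i for i in range(1, n - 1)
             if flux[i] > thresh
             and flux[i] >= flux[i - 1]
             and flux[i] > flux[i + 1]]
    return [p * hop for p in peaks]
```

Each returned boundary would then delimit a gesture block to be synthesized in time with the detected beat; a real system would add normalization and smoothing before peak picking.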

Results

Task                 | Dataset             | Metric | Value | Model
3D                   | TED Gesture Dataset | FGD    | 2.04  | Rhythmic Gesticulator
3D Shape Generation  | TED Gesture Dataset | FGD    | 2.04  | Rhythmic Gesticulator

Related Papers

DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
Exploring Adapter Design Tradeoffs for Low Resource Music Generation (2025-06-26)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
Let Your Video Listen to Your Music! (2025-06-23)
From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training (2025-06-20)
DanceChat: Large Language Model-Guided Music-to-Dance Generation (2025-06-12)
Rhythm Features for Speaker Identification (2025-06-07)
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark (2025-06-05)