Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models

Zunnan Xu, Yukang Lin, Haonan Han, Sicheng Yang, Ronghui Li, Yachao Zhang, Xiu Li

2024-03-14
Tasks: 3D Face Animation, Gesture Generation, Rhythm

Abstract

Gesture synthesis is an important area of human-computer interaction, with applications in fields such as film, robotics, and virtual reality. Recent work has used diffusion models and attention mechanisms to improve gesture synthesis. However, due to the high computational complexity of these techniques, generating long and diverse sequences with low latency remains a challenge. We explore the potential of state space models (SSMs) to address this challenge, implementing a two-stage modeling strategy with discrete motion priors to enhance the quality of gestures. Leveraging the foundational Mamba block, we introduce MambaTalk, enhancing gesture diversity and rhythm through multimodal integration. Extensive experiments demonstrate that our method matches or exceeds the performance of state-of-the-art models. Our project is publicly available at https://kkakkkka.github.io/MambaTalk
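
As a rough illustration of why state space models are attractive for long sequences, the sketch below runs a plain discrete linear SSM recurrence with a diagonal state matrix. It is not the MambaTalk or Mamba implementation: the function name `ssm_scan` and the parameters `a`, `b`, `c` are assumptions made for this example, and the parameters are fixed here, whereas a selective SSM makes them depend on the input. The point is only that each step updates a fixed-size state, so the cost grows linearly with sequence length.

```python
def ssm_scan(xs, a, b, c):
    """Run a discrete linear state space recurrence over a scalar sequence.

    Hypothetical, simplified form with a diagonal state matrix:
        h_t = a * h_{t-1} + b * x_t   (elementwise over state dimensions)
        y_t = sum(c * h_t)
    Parameters a, b, c are fixed lists of equal length (not input-dependent).
    """
    h = [0.0] * len(a)  # hidden state, one entry per state dimension
    ys = []
    for x in xs:
        # Update every state dimension from the previous state and the input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Read out a scalar output from the current state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# An impulse input decays geometrically through a one-dimensional state.
outputs = ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])
```

The loop touches each input once and keeps only the state `h`, in contrast to attention, which compares every position with every other position.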

Results

Task                                 Dataset   Metric   Value    Model
3D Human Pose Estimation             BEAT2     MSE      6.289    MambaTalk
Pose Estimation                      BEAT2     MSE      6.289    MambaTalk
3D                                   BEAT2     MSE      6.289    MambaTalk
3D                                   BEAT2     FGD      0.5366   MambaTalk
3D Face Animation                    BEAT2     MSE      6.289    MambaTalk
3D Shape Generation                  BEAT2     FGD      0.5366   MambaTalk
2D Human Pose Estimation             BEAT2     MSE      6.289    MambaTalk
3D Absolute Human Pose Estimation    BEAT2     MSE      6.289    MambaTalk
1 Image, 2*2 Stitchi                 BEAT2     MSE      6.289    MambaTalk

Related Papers

DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
Exploring Adapter Design Tradeoffs for Low Resource Music Generation (2025-06-26)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
Let Your Video Listen to Your Music! (2025-06-23)
From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training (2025-06-20)
DanceChat: Large Language Model-Guided Music-to-Dance Generation (2025-06-12)
Rhythm Features for Speaker Identification (2025-06-07)
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark (2025-06-05)