Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


in2IN: Leveraging individual Information to Generate Human INteractions

Pablo Ruiz Ponce, German Barquero, Cristina Palmero, Sergio Escalera, Jose Garcia-Rodriguez

2024-04-15 · Large Language Model · Motion Generation · Motion Synthesis · Language Modelling
Paper · PDF · Code (official)

Abstract

Generating human-human motion interactions conditioned on textual descriptions is useful in many areas, such as robotics, gaming, animation, and the metaverse. Alongside this utility, however, comes the difficulty of modeling the highly dimensional inter-personal dynamics; properly capturing the intra-personal diversity of interactions is similarly challenging. Current methods generate interactions with limited diversity of intra-person dynamics due to the limitations of the available datasets and conditioning strategies. To address this, we introduce in2IN, a novel diffusion model for human-human motion generation that is conditioned not only on the textual description of the overall interaction but also on individual descriptions of the actions performed by each person involved. To train this model, we use a large language model to extend the InterHuman dataset with individual descriptions. As a result, in2IN achieves state-of-the-art performance on the InterHuman dataset. Furthermore, to increase intra-personal diversity on existing interaction datasets, we propose DualMDM, a model composition technique that combines the motions generated by in2IN with those generated by a single-person motion prior pre-trained on HumanML3D. DualMDM generates motions with higher individual diversity and improves control over intra-person dynamics while maintaining inter-personal coherence.
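The DualMDM idea above, composing an interaction-conditioned model with a single-person motion prior, can be sketched as blending the two models' noise predictions at each diffusion step. The following is a minimal, hypothetical illustration under assumed shapes (two people, 22 joints, xyz coordinates) with dummy stand-in denoisers; it is not the authors' exact composition scheme.

```python
import numpy as np

def compose_denoisers(eps_interaction, eps_individual, w):
    """Blend two noise predictions.

    w = 0.0 -> pure interaction model (in2IN-like),
    w = 1.0 -> pure single-person prior.
    Hypothetical DualMDM-style composition, for illustration only.
    """
    return (1.0 - w) * eps_interaction + w * eps_individual

# Dummy denoisers standing in for the two pre-trained models.
def eps_in2in(x, t):   # interaction-conditioned prediction (toy)
    return 0.1 * x

def eps_prior(x, t):   # single-person prior prediction (toy)
    return -0.05 * x

x = np.ones((2, 22, 3))  # assumed layout: two people, 22 joints, xyz
t = 10
eps = compose_denoisers(eps_in2in(x, t), eps_prior(x, t), w=0.3)
print(eps.shape)  # (2, 22, 3)
```

In a real sampler, a blended prediction like `eps` would replace the single model's output inside the denoising loop; the weight `w` trades inter-personal coherence against individual diversity.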

Results

Motion Synthesis on InterHuman (model: in2IN)

Metric               Value
FID                  5.177
MMDist               3.79
MModality            1.061
R-Precision Top 3    0.687

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits (2025-07-18)
GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)