Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions

Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, Lan Xu

2023-04-12 · Denoising · Motion Generation · Motion Synthesis
Paper · PDF · Code (official)

Abstract

Diffusion models have recently driven tremendous progress in generating realistic human motions, yet they largely disregard multi-human interactions. In this paper, we present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process, enabling non-expert users to customize high-quality two-person interaction motions with only text guidance. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames of diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions. On the algorithm side, we carefully tailor the motion diffusion model to our two-person interaction setting. To handle the symmetry of human identities during interactions, we propose two cooperative transformer-based denoisers that explicitly share weights, with a mutual attention mechanism that further connects the two denoising processes. We then propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame. We further introduce two novel regularization terms to encode spatial relations, equipped with a corresponding damping scheme during the training of our interaction diffusion model. Extensive experiments validate the effectiveness and generalizability of InterGen. Notably, it generates more diverse and compelling two-person motions than previous methods and enables various downstream applications for human interactions.
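The cooperative-denoiser idea in the abstract — two denoisers with explicitly shared weights, coupled by mutual attention so each person's denoising attends to the other's motion — can be sketched as follows. This is a minimal illustrative sketch, not the official InterGen code; all layer sizes, shapes, and the single-head attention form are assumptions.

```python
# Minimal sketch (assumption, not the official InterGen implementation) of
# shared-weight cooperative denoisers connected by mutual attention.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SharedDenoiser:
    """One set of weights applied to both performers (identity symmetry)."""
    def __init__(self, dim):
        s = 1.0 / np.sqrt(dim)
        self.wq = rng.normal(0, s, (dim, dim))
        self.wk = rng.normal(0, s, (dim, dim))
        self.wv = rng.normal(0, s, (dim, dim))
        self.out = rng.normal(0, s, (dim, dim))

    def mutual_attention(self, x_self, x_other):
        # Queries come from one performer, keys/values from the other,
        # so the two denoising processes are explicitly connected.
        q = x_self @ self.wq
        k = x_other @ self.wk
        v = x_other @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        return x_self + attn @ v @ self.out  # residual update

T, D = 16, 32                    # frames, per-frame feature dim (illustrative)
xa = rng.normal(size=(T, D))     # noisy motion features, person A
xb = rng.normal(size=(T, D))     # noisy motion features, person B

denoiser = SharedDenoiser(D)     # the same weights serve both performers
ya = denoiser.mutual_attention(xa, xb)
yb = denoiser.mutual_attention(xb, xa)
print(ya.shape, yb.shape)        # (16, 32) (16, 32)
```

Because the two calls use the same `SharedDenoiser` instance, swapping the two performers swaps the outputs correspondingly, which is the symmetry the paper's design targets.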

Results

Task | Dataset | Metric | Value | Model
Motion Synthesis | Inter-X | FID | 5.207 | InterGen
Motion Synthesis | Inter-X | MMDist | 9.58 | InterGen
Motion Synthesis | Inter-X | MModality | 3.686 | InterGen
Motion Synthesis | Inter-X | R-Precision Top 3 | 0.429 | InterGen
Motion Synthesis | InterHuman | FID | 5.918 | InterGen
Motion Synthesis | InterHuman | MMDist | 5.108 | InterGen
Motion Synthesis | InterHuman | MModality | 2.141 | InterGen
Motion Synthesis | InterHuman | R-Precision Top 3 | 0.624 | InterGen
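The R-Precision Top-3 metric in the table can be sketched under the standard text-to-motion evaluation protocol (an assumption about the benchmark, not taken from this page): for each motion, rank a candidate pool of 32 captions (the true one plus 31 mismatched ones) by feature distance, and count a hit if the true caption lands in the top 3.

```python
# Hedged sketch of R-Precision Top-3 as commonly defined for text-to-motion
# benchmarks (assumed protocol: pool of 32 = 1 true caption + 31 negatives).
import numpy as np

def r_precision_top3(motion_feats, text_feats, pool_size=32, seed=0):
    rng = np.random.default_rng(seed)
    n = len(motion_feats)
    hits = 0
    for i in range(n):
        # Candidate pool: the ground-truth caption plus random negatives.
        negatives = rng.choice([j for j in range(n) if j != i],
                               size=pool_size - 1, replace=False)
        cand = np.concatenate(([i], negatives))
        dists = np.linalg.norm(text_feats[cand] - motion_feats[i], axis=1)
        if 0 in np.argsort(dists)[:3]:   # position 0 is the true caption
            hits += 1
    return hits / n

# Toy paired features: each motion is a slightly noised copy of its caption
# feature, so the metric should be near 1.0 on this easy synthetic data.
rng = np.random.default_rng(1)
text = rng.normal(size=(64, 8))
motion = text + 0.01 * rng.normal(size=(64, 8))
score = r_precision_top3(motion, text)
print(score)
```

Higher is better; FID and MMDist, by contrast, are distances where lower is better, and MModality rewards diversity among generations for the same caption.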

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
SnapMoGen: Human Motion Generation from Expressive Texts (2025-07-12)
A statistical physics framework for optimal learning (2025-07-10)
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data (2025-07-09)