Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez, Thomas Lucas

2024-02-22
3D Human Pose Estimation, Human Mesh Recovery, 3D Human Reconstruction, 3D Multi-Person Pose Estimation, 3D Multi-Person Mesh Recovery

Abstract

We present Multi-HMR, a strong single-shot model for multi-person 3D human mesh recovery from a single RGB image. Predictions encompass the whole body, i.e., including hands and facial expressions, using the SMPL-X parametric model and 3D location in the camera coordinate system. Our model detects people by predicting coarse 2D heatmaps of person locations, using features produced by a standard Vision Transformer (ViT) backbone. It then predicts their whole-body pose, shape and 3D location using a new cross-attention module called the Human Prediction Head (HPH), with one query attending to the entire set of features for each detected person. As direct prediction of fine-grained hands and facial poses in a single shot, i.e., without relying on explicit crops around body parts, is hard to learn from existing data, we introduce CUFFS, the Close-Up Frames of Full-Body Subjects dataset, containing humans close to the camera with diverse hand poses. We show that incorporating it into the training data further enhances predictions, particularly for hands. Multi-HMR also optionally accounts for camera intrinsics, if available, by encoding camera ray directions for each image token. This simple design achieves strong performance on whole-body and body-only benchmarks simultaneously: a ViT-S backbone on $448{\times}448$ images already yields a fast and competitive model, while larger models and higher resolutions obtain state-of-the-art results.
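The abstract mentions encoding a camera ray direction for each image token when intrinsics are available. The following is a minimal sketch of how such per-token ray directions could be computed under a pinhole camera model. It is not the authors' implementation: the function name `patch_ray_directions`, the patch size of 14 pixels, and the example intrinsics are assumptions made for illustration, and how the rays are subsequently embedded and fed to the model is not covered here.

```python
import numpy as np

def patch_ray_directions(K, image_size, patch_size):
    """Unit ray direction through the center of each square patch (token).

    K is a 3x3 pinhole intrinsics matrix in pixel units (assumed form).
    Returns an array of shape (n, n, 3), indexed as [row, column].
    """
    n = image_size // patch_size
    # Pixel coordinates of the patch centers along one axis.
    centers = (np.arange(n) + 0.5) * patch_size
    u, v = np.meshgrid(centers, centers, indexing="xy")
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (n, n, 3)
    # Back-project each homogeneous pixel: d = K^{-1} [u, v, 1]^T.
    rays = pixels @ np.linalg.inv(K).T
    # Normalize so that only the direction is kept.
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)

# Hypothetical intrinsics for a 448x448 image (values chosen for illustration).
K = np.array([[500.0, 0.0, 224.0],
              [0.0, 500.0, 224.0],
              [0.0, 0.0, 1.0]])
rays = patch_ray_directions(K, image_size=448, patch_size=14)
print(rays.shape)
```

With these assumed values the image is split into a 32 by 32 grid of tokens, and each token receives one unit vector in camera coordinates. Tokens on opposite sides of the principal point get mirrored x and y components.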

Results

Task | Dataset | Metric | Value | Model
Reconstruction | EHF | MPVPE | 44.2 | Multi-HMR
Reconstruction | EHF | PA V2V (mm), face | 5.5 | Multi-HMR
Reconstruction | EHF | PA V2V (mm), whole body | 32.7 | Multi-HMR
3D Human Pose Estimation | UBody | PA-PVE-All | 23.6 | Multi-HMR
3D Human Pose Estimation | UBody | PA-PVE-Face | 1.8 | Multi-HMR
3D Human Pose Estimation | UBody | PA-PVE-Hands | 7 | Multi-HMR
3D Human Pose Estimation | UBody | PVE-All | 56.4 | Multi-HMR
3D Human Pose Estimation | UBody | PVE-Face | 19.3 | Multi-HMR
3D Human Pose Estimation | UBody | PVE-Hands | 24.9 | Multi-HMR
3D Human Pose Estimation | MuPoTS-3D | 3DPCK | 89.5 | Multi-HMR
3D Human Pose Estimation | AGORA | FB-MVE | 95.9 | Multi-HMR
3D Human Pose Estimation | AGORA | FB-NMVE | 102 | Multi-HMR
Pose Estimation | UBody | PA-PVE-All | 23.6 | Multi-HMR
Pose Estimation | UBody | PA-PVE-Face | 1.8 | Multi-HMR
Pose Estimation | UBody | PA-PVE-Hands | 7 | Multi-HMR
Pose Estimation | UBody | PVE-All | 56.4 | Multi-HMR
Pose Estimation | UBody | PVE-Face | 19.3 | Multi-HMR
Pose Estimation | UBody | PVE-Hands | 24.9 | Multi-HMR
Pose Estimation | MuPoTS-3D | 3DPCK | 89.5 | Multi-HMR
Pose Estimation | AGORA | FB-MVE | 95.9 | Multi-HMR
Pose Estimation | AGORA | FB-NMVE | 102 | Multi-HMR
3D | UBody | PA-PVE-All | 23.6 | Multi-HMR
3D | UBody | PA-PVE-Face | 1.8 | Multi-HMR
3D | UBody | PA-PVE-Hands | 7 | Multi-HMR
3D | UBody | PVE-All | 56.4 | Multi-HMR
3D | UBody | PVE-Face | 19.3 | Multi-HMR
3D | UBody | PVE-Hands | 24.9 | Multi-HMR
3D | MuPoTS-3D | 3DPCK | 89.5 | Multi-HMR
3D | AGORA | FB-MVE | 95.9 | Multi-HMR
3D | AGORA | FB-NMVE | 102 | Multi-HMR
3D Multi-Person Pose Estimation | MuPoTS-3D | 3DPCK | 89.5 | Multi-HMR
3D Multi-Person Pose Estimation | AGORA | FB-MVE | 95.9 | Multi-HMR
3D Multi-Person Pose Estimation | AGORA | FB-NMVE | 102 | Multi-HMR
Human Mesh Recovery | BEDLAM | PVE-All | 76.8 | Multi-HMR
1 Image, 2*2 Stitchi | UBody | PA-PVE-All | 23.6 | Multi-HMR
1 Image, 2*2 Stitchi | UBody | PA-PVE-Face | 1.8 | Multi-HMR
1 Image, 2*2 Stitchi | UBody | PA-PVE-Hands | 7 | Multi-HMR
1 Image, 2*2 Stitchi | UBody | PVE-All | 56.4 | Multi-HMR
1 Image, 2*2 Stitchi | UBody | PVE-Face | 19.3 | Multi-HMR
1 Image, 2*2 Stitchi | UBody | PVE-Hands | 24.9 | Multi-HMR
1 Image, 2*2 Stitchi | MuPoTS-3D | 3DPCK | 89.5 | Multi-HMR
1 Image, 2*2 Stitchi | AGORA | FB-MVE | 95.9 | Multi-HMR
1 Image, 2*2 Stitchi | AGORA | FB-NMVE | 102 | Multi-HMR
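
Several metrics in the results are per-vertex errors (PVE), some of them reported after Procrustes alignment (the PA- prefix). As a rough guide to reading these numbers, the sketch below computes both on two meshes with matching vertex order. It is an illustrative implementation of the generic definitions, not the evaluation code of any benchmark listed here: the function names, the synthetic data, and the assumption that vertices are given as (N, 3) arrays in the same units are all introduced for this example.

```python
import numpy as np

def pve(pred, gt):
    """Mean Euclidean distance between corresponding vertices."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def procrustes_align(pred, gt):
    """Apply the best similarity transform (scale, rotation, translation)
    that maps pred onto gt in the least-squares sense."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R + mu_g

def pa_pve(pred, gt):
    """Per-vertex error after Procrustes alignment of pred to gt."""
    return pve(procrustes_align(pred, gt), gt)

# Synthetic check: pred is gt rotated about z, scaled, and shifted.
rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 3))
c, s = np.cos(0.4), np.sin(0.4)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
pred = 1.3 * gt @ Rz + np.array([0.1, -0.2, 0.5])
print(pve(pred, gt), pa_pve(pred, gt))
```

In this synthetic case the aligned error is numerically zero while the unaligned error is not, which illustrates why PA- variants in the table are much lower than their unaligned counterparts: alignment removes global scale, rotation, and translation errors and leaves only differences in articulated pose and shape.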

Related Papers

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images (2025-06-24)
ExtPose: Robust and Coherent Pose Estimation by Extending ViTs (2025-06-18)
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation (2025-06-17)
PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images (2025-06-16)
SMPL Normal Map Is All You Need for Single-view Textured Human Reconstruction (2025-06-15)
MetricHMR: Metric Human Mesh Recovery from Monocular Images (2025-06-11)
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation (2025-06-03)
HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers (2025-06-03)