Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer

Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li

2023-03-28, CVPR 2023
Tasks: 3D Human Pose Estimation, 3D Human Reconstruction, 3D Multi-Person Mesh Recovery

Abstract

Whole-body mesh recovery aims to estimate the 3D human body, face, and hand parameters from a single image. It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands usually occupy extremely small regions of the image. Existing works usually detect the hands and face, enlarge them to a higher resolution, feed them into part-specific networks to predict the parameters, and finally fuse the results. While this copy-paste pipeline can capture the fine-grained details of the face and hands, the connections between different parts cannot be easily recovered in late fusion, leading to implausible 3D rotations and unnatural poses. In this work, we propose a one-stage pipeline for expressive whole-body mesh recovery, named OSX, without separate networks for each part. Specifically, we design a Component Aware Transformer (CAT) composed of a global body encoder and a local face/hand decoder. The encoder predicts the body parameters and provides a high-quality feature map for the decoder, which performs a feature-level upsample-crop scheme to extract high-resolution part-specific features and adopts keypoint-guided deformable attention to estimate the hand and face parameters precisely. The whole pipeline is simple yet effective, requires no manual post-processing, and naturally avoids implausible predictions. Comprehensive experiments demonstrate the effectiveness of OSX. Lastly, we build a large-scale Upper-Body dataset (UBody) with high-quality 2D and 3D whole-body annotations. It contains people with partially visible bodies in diverse real-life scenarios to bridge the gap between the basic task and downstream applications.
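The feature-level upsample-crop idea from the abstract can be illustrated in a few lines. This is a minimal, hypothetical sketch, not OSX's code. It assumes a single-channel feature map stored as nested lists, hand keypoints given in normalized [0, 1] image coordinates, bilinear upsampling by an integer factor, and a padded keypoint bounding box. The names `part_box`, `upsample_bilinear`, and `upsample_crop` and the `scale` and `pad` values are all assumptions. What the sketch shows is that the part region is cropped from an upsampled copy of the shared feature map, rather than from a re-cropped image fed to a separate network.

```python
from typing import List, Tuple

Grid = List[List[float]]  # hypothetical single-channel feature map, indexed [row][col]


def part_box(keypoints: List[Tuple[float, float]], pad: float = 0.1) -> Tuple[float, float, float, float]:
    """Box (x0, y0, x1, y1) around normalized keypoints, padded and clipped to [0, 1]."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (max(min(xs) - pad, 0.0), max(min(ys) - pad, 0.0),
            min(max(xs) + pad, 1.0), min(max(ys) + pad, 1.0))


def upsample_bilinear(feat: Grid, scale: int) -> Grid:
    """Bilinear upsampling by an integer factor (corner pixels map to corner pixels)."""
    h, w = len(feat), len(feat[0])
    H, W = h * scale, w * scale
    out = []
    for i in range(H):
        # Map the output row index to a fractional input row coordinate.
        y = i * (h - 1) / (H - 1) if H > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(W):
            x = j * (w - 1) / (W - 1) if W > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
            bot = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
            row.append(top * (1 - dy) + bot * dy)
        out.append(row)
    return out


def upsample_crop(feat: Grid, keypoints: List[Tuple[float, float]],
                  scale: int = 4, pad: float = 0.1) -> Grid:
    """Upsample the whole feature map, then crop the part region around the keypoints."""
    up = upsample_bilinear(feat, scale)
    H, W = len(up), len(up[0])
    x0, y0, x1, y1 = part_box(keypoints, pad)
    # Convert the normalized box to row/column indices, keeping at least one cell.
    r0 = min(int(y0 * H), H - 1)
    r1 = max(r0 + 1, min(H, round(y1 * H)))
    c0 = min(int(x0 * W), W - 1)
    c1 = max(c0 + 1, min(W, round(x1 * W)))
    return [row[c0:c1] for row in up[r0:r1]]
```

For example, with an 8x8 map and hand keypoints `[(0.6, 0.6), (0.7, 0.65), (0.75, 0.8)]` with `pad=0.05`, the hand box covers only about 2x2 cells of the original map. On the 4x upsampled map, `upsample_crop` returns a 10x9 crop. A real model would apply this to multi-channel features, and its upsampling and cropping operators may differ from the bilinear and box-slicing choices assumed here.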

Results

All results are for OSX. All metrics are errors (lower is better).

Dataset | Metric | Value
EHF | MPVPE | 70.8
EHF | PA V2V (mm), face | 6
EHF | PA V2V (mm), whole body | 48.7
3DPW | MPJPE | 74.7
3DPW | PA-MPJPE | 45.1
UBody | PA-PVE-All | 42.2
UBody | PA-PVE-Face | 2
UBody | PA-PVE-Hands | 8.6
UBody | PVE-All | 81.9
UBody | PVE-Face | 21.2
UBody | PVE-Hands | 41.5
AGORA | B-NMVE | 85.3
AGORA | F-MVE | 36.2
AGORA | FB-MVE | 122.8
AGORA | FB-NMVE | 130.6
AGORA | LH/RH-MVE | 45.7

Task listings: the EHF results are listed under Reconstruction. The 3DPW, UBody, and AGORA results are each listed under 3D Human Pose Estimation, Pose Estimation, 3D, and "1 Image, 2*2 Stitchi". The AGORA results are also listed under 3D Multi-Person Pose Estimation.
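Most entries above are point-wise 3D errors. The following assumes the usual definitions rather than each benchmark's exact protocol. MPJPE, and its vertex counterparts MPVPE, PVE, and V2V, is the mean Euclidean distance between corresponding predicted and ground-truth 3D points. The PA- variants first align the prediction to the ground truth with a similarity Procrustes transform (scale, rotation, translation). The NumPy sketch below assumes (N, 3) arrays of corresponding points, and its function names are illustrative. Dataset-specific steps such as root alignment, joint subsets, per-part splits, and AGORA's normalized (NMVE) metrics are not modeled.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding points; arrays of shape (N, 3).

    Applied to mesh vertices instead of joints, the same formula gives a per-vertex error.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Apply the least-squares similarity transform (scale, rotation, translation) mapping pred onto gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance matrix (Kabsch/Umeyama-style solution).
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    D = np.diag([1.0, 1.0, d])
    r = u @ D @ vt  # rotation applied to row vectors as p @ r
    scale = (s * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ r + mu_g


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

As a sanity check, a prediction that differs from the ground truth only by a rotation, uniform scale, and translation has a large MPJPE but a PA-MPJPE near zero. This gap is why PA- numbers in the table are much lower than their unaligned counterparts.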

Related Papers

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images (2025-06-24)
ExtPose: Robust and Coherent Pose Estimation by Extending ViTs (2025-06-18)
PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation (2025-06-17)
PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images (2025-06-16)
SMPL Normal Map Is All You Need for Single-view Textured Human Reconstruction (2025-06-15)
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation (2025-06-03)
HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers (2025-06-03)
UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction (2025-05-20)