Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Dual networks based 3D Multi-Person Pose Estimation from Monocular Video

Yu Cheng, Bo Wang, Robby T. Tan

Published: 2022-05-02
Tasks: 3D Human Pose Estimation · Human Detection · Monocular 3D Human Pose Estimation · Pose Estimation · Multi-Person Pose Estimation · 3D Multi-Person Pose Estimation (root-relative) · 3D Multi-Person Pose Estimation (absolute) · 3D Pose Estimation · 3D Multi-Person Pose Estimation
Links: Paper · PDF · Code (official)

Abstract

Monocular 3D human pose estimation has made progress in recent years. Most methods focus on single persons and estimate poses in person-centric coordinates, i.e., coordinates based on the center of the target person. Hence, these methods are inapplicable to multi-person 3D pose estimation, where absolute coordinates (e.g., camera coordinates) are required. Moreover, multi-person pose estimation is more challenging than single-person pose estimation due to inter-person occlusion and close human interactions. Existing top-down multi-person methods rely on human detection, and thus suffer from detection errors and cannot produce reliable pose estimates in multi-person scenes. Meanwhile, existing bottom-up methods that do not use human detection are unaffected by detection errors, but since they process all persons in a scene at once, they are prone to errors, particularly for persons at small scales. To address all these challenges, we propose integrating the top-down and bottom-up approaches to exploit their strengths. Our top-down network estimates human joints for all persons in an image patch rather than for a single person, making it robust to erroneous bounding boxes. Our bottom-up network incorporates human-detection-based normalized heatmaps, making it more robust in handling scale variations. Finally, the estimated 3D poses from the top-down and bottom-up networks are fed into our integration network to produce the final 3D poses. To address the common gaps between training and testing data, we perform test-time optimization, refining the estimated 3D human poses using a high-order temporal constraint, a re-projection loss, and bone-length regularization. Our evaluations demonstrate the effectiveness of the proposed method. Code and models are available: https://github.com/3dpose/3D-Multi-Person-Pose.

Results

Task                                            | Dataset   | Metric             | Value | Model
3D Multi-Person Pose Estimation (root-relative) | MuPoTS-3D | 3DPCK              | 89.6  | Dual network
3D Human Pose Estimation                        | 3DPW      | PA-MPJPE           | 61.7  | Dual network
3D Human Pose Estimation                        | Human3.6M | Average MPJPE (mm) | 49.31 | Dual network
3D Human Pose Estimation                        | JTA       | F1 (t=0.4m)        | 58.15 | Dual network
3D Human Pose Estimation                        | JTA       | F1 (t=0.8m)        | 69.32 | Dual network
3D Human Pose Estimation                        | JTA       | F1 (t=1.2m)        | 74.19 | Dual network
3D Human Pose Estimation                        | MuPoTS-3D | 3DPCK              | 89.6  | Dual network

Related Papers

- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)