Papers With Code 2


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Deep Monocular 3D Human Pose Estimation via Cascaded Dimension-Lifting

Changgong Zhang, Fangneng Zhan, Yuan Chang

2021-04-08 · 3D Human Pose Estimation · Monocular 3D Human Pose Estimation · Pose Estimation · Depth Estimation · 3D Multi-Person Pose Estimation (root-relative) · 3D Multi-Person Pose Estimation (absolute) · 3D Pose Estimation

Abstract

3D pose estimation from a single image is a challenging problem due to depth ambiguity. One line of previous methods lifts 2D joints, obtained from external 2D pose detectors, to the 3D space. However, such approaches discard the contextual information of images, which provides strong cues for 3D pose estimation. Meanwhile, other methods predict the joints directly from monocular images but adopt a 2.5D output representation $P^{2.5D} = (u,v,z^{r})$, where $u$ and $v$ are in the image space while $z^{r}$ is in root-relative 3D space. Thus, ground-truth information (e.g., the depth of the root joint from the camera) is normally required to transform the 2.5D output to the 3D space, which limits the applicability in practice. In this work, we propose a novel end-to-end framework that not only exploits the contextual information but also produces the output directly in the 3D space via cascaded dimension-lifting. Specifically, we decompose the task of lifting pose from 2D image space to 3D spatial space into several sequential sub-tasks: 1) kinematic skeleton and individual joint estimation in 2D space, 2) root-relative depth estimation, and 3) lifting to the 3D space, each of which employs direct supervision and contextual image features to guide the learning process. Extensive experiments show that the proposed framework achieves state-of-the-art performance on two widely used 3D human pose datasets (Human3.6M, MuPoTS-3D).
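The limitation the abstract describes — that the 2.5D representation $(u, v, z^{r})$ needs the absolute root depth before it can be mapped to 3D — can be illustrated with a standard pinhole back-projection. This is a minimal sketch, not the paper's code: the function name and the toy intrinsics (`fx`, `fy`, `cx`, `cy`) and root depth `z_root` are illustrative assumptions.

```python
import numpy as np

def lift_25d_to_3d(u, v, z_rel, z_root, fx, fy, cx, cy):
    """Back-project one 2.5D joint into absolute 3D camera space.

    u, v   : joint position in pixel coordinates
    z_rel  : depth relative to the root joint (z^r in the paper)
    z_root : absolute depth of the root joint from the camera --
             this is exactly the ground-truth quantity that
             2.5D methods must be given at test time.
    """
    z_abs = z_root + z_rel            # recover absolute depth
    x = (u - cx) * z_abs / fx         # pinhole back-projection
    y = (v - cy) * z_abs / fy
    return np.array([x, y, z_abs])

# Toy example: a joint at the principal point of the image.
joint_3d = lift_25d_to_3d(u=640.0, v=360.0, z_rel=0.1,
                          z_root=3.0, fx=1000.0, fy=1000.0,
                          cx=640.0, cy=360.0)
print(joint_3d)  # -> [0.  0.  3.1]
```

Without `z_root`, the `x` and `y` coordinates cannot be recovered at metric scale, which is why the proposed framework instead predicts the output directly in 3D space.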

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
3D Multi-Person Pose Estimation (root-relative) | MuPoTS-3D | 3DPCK | 83.3 | Cascaded Dimension-Lifting
3D Human Pose Estimation | MuPoTS-3D | 3DPCK | 83.3 | Cascaded Dimension-Lifting
Pose Estimation | MuPoTS-3D | 3DPCK | 83.3 | Cascaded Dimension-Lifting
3D | MuPoTS-3D | 3DPCK | 83.3 | Cascaded Dimension-Lifting
3D Multi-Person Pose Estimation | MuPoTS-3D | 3DPCK | 83.3 | Cascaded Dimension-Lifting
1 Image, 2*2 Stitchi | MuPoTS-3D | 3DPCK | 83.3 | Cascaded Dimension-Lifting

Related Papers

- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
- $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)