Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Dynamic Graph Reasoning for Multi-person 3D Pose Estimation

Zhongwei Qiu, Qiansheng Yang, Jian Wang, Dongmei Fu

2022-07-22
Tasks: 3D Human Pose Estimation, Pose Estimation, 3D Multi-Person Pose Estimation (root-relative), 3D Multi-Person Pose Estimation (absolute), 3D Pose Estimation, 3D Multi-Person Pose Estimation

Abstract

Multi-person 3D pose estimation is challenging because of occlusion and depth ambiguity, especially in crowded scenes. To address these problems, most existing methods model body context cues, either by enhancing the feature representation with graph neural networks or by adding structural constraints. However, these methods are not robust because of their single-root formulation, which decodes 3D poses from one root node along a pre-defined graph. In this paper, we propose GR-M3D, which models Multi-person 3D pose estimation with dynamic Graph Reasoning. The decoding graph in GR-M3D is predicted rather than pre-defined. Specifically, GR-M3D first generates several data maps and enhances them with a scale- and depth-aware refinement module (SDAR). Multiple root keypoints and dense decoding paths for each person are then estimated from these data maps. Based on them, dynamic decoding graphs are built by assigning path weights to the decoding paths, where the path weights are inferred from the enhanced data maps; this process is called dynamic graph reasoning (DGR). Finally, the 3D pose of each detected person is decoded according to its dynamic decoding graph. Because it adopts soft path weights conditioned on the input, GR-M3D can implicitly adjust the structure of the decoding graph, which lets the decoding graphs adapt to each input person as far as possible and makes them more capable of handling occlusion and depth ambiguity than previous methods. We empirically show that the proposed bottom-up approach even outperforms top-down methods and achieves state-of-the-art results on three 3D pose datasets.
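The abstract contrasts decoding a pose from a single root along a pre-defined graph with decoding each joint from multiple roots along dense paths whose soft weights are inferred from the input. The Python sketch below illustrates only that contrast; it is not the GR-M3D implementation, which the abstract does not specify. Everything concrete in it is an assumption made for illustration: a decoding path is reduced to a (root name, predicted 3D displacement) pair, path weights arrive as unnormalised scores that a softmax turns into soft weights, coordinates are in millimetres, and `decode_single_root` and `decode_joint_dynamic` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def decode_single_root(root_name: str, root_pos: Vec3,
                       parents: Dict[str, str],
                       bones: Dict[str, Vec3]) -> Dict[str, Vec3]:
    """Baseline: fixed tree, each joint = parent position + bone vector.

    An error in any bone vector shifts every joint below it in the tree.
    """
    positions: Dict[str, Vec3] = {root_name: root_pos}

    def resolve(joint: str) -> Vec3:
        if joint not in positions:
            positions[joint] = add(resolve(parents[joint]), bones[joint])
        return positions[joint]

    for joint in parents:
        resolve(joint)
    return positions


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_joint_dynamic(root_positions: Dict[str, Vec3],
                         paths: Sequence[Tuple[str, Vec3]],
                         path_scores: Sequence[float]) -> Vec3:
    """Fuse several root-to-joint estimates of one joint with soft weights.

    Each path is (root name, predicted displacement from that root to the
    joint); path_scores holds one unnormalised weight per path.
    """
    if not paths or len(paths) != len(path_scores):
        raise ValueError("need at least one path and one score per path")
    weights = softmax(path_scores)
    fused = [0.0, 0.0, 0.0]
    for (root, offset), w in zip(paths, weights):
        estimate = add(root_positions[root], offset)  # this path's guess
        for k in range(3):
            fused[k] += w * estimate[k]
    return (fused[0], fused[1], fused[2])


if __name__ == "__main__":
    # Single root: head is reached only through pelvis -> neck -> head.
    tree = decode_single_root("pelvis", (0.0, 0.0, 3000.0),
                              {"neck": "pelvis", "head": "neck"},
                              {"neck": (0.0, -500.0, 0.0),
                               "head": (0.0, -200.0, 0.0)})
    print(tree["head"])  # (0.0, -700.0, 3000.0)

    # Multiple roots: two paths to the head, equally weighted.
    roots = {"pelvis": (0.0, 0.0, 3000.0), "neck": (0.0, -500.0, 3000.0)}
    head_paths = [("pelvis", (0.0, -700.0, 0.0)),
                  ("neck", (0.0, -200.0, 10.0))]
    print(decode_joint_dynamic(roots, head_paths, [0.0, 0.0]))  # (0.0, -700.0, 3005.0)
```

With equal scores the two path estimates are averaged; raising one path's score pulls the fused joint toward that path's estimate. This is one simple way a decoding graph can change its effective structure per input without a hard choice of root.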

Results

Dataset | Metric | Value | Model | Listed under tasks
Panoptic | Average MPJPE (mm) | 57.9 | GR-M3D | 3D Human Pose Estimation; Pose Estimation; 3D; 3D Multi-Person Pose Estimation; 1 Image, 2*2 Stitchi
MuPoTS-3D | 3DPCK | 41.2 | GR-M3D | 3D Multi-Person Pose Estimation (absolute); 3D Human Pose Estimation; Pose Estimation; 3D; 3D Multi-Person Pose Estimation; 1 Image, 2*2 Stitchi
MuPoTS-3D | 3DPCK | 84.6 | GR-M3D | 3D Multi-Person Pose Estimation (root-relative); 3D Human Pose Estimation; Pose Estimation; 3D; 3D Multi-Person Pose Estimation; 1 Image, 2*2 Stitchi
MuPoTS-3D | AUC | 44.1 | GR-M3D | 3D Multi-Person Pose Estimation (root-relative); 3D Human Pose Estimation; Pose Estimation; 3D; 3D Multi-Person Pose Estimation; 1 Image, 2*2 Stitchi
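The results use MPJPE (mean per-joint position error, in mm), 3DPCK and AUC. The Python sketch below computes these metrics for a single person. It assumes the commonly used MPI-INF-3DHP/MuPoTS-3D conventions: 3DPCK is the percentage of joints whose error is below 150 mm, and AUC averages PCK over thresholds from 0 to 150 mm in 5 mm steps. The strict `<` comparison, the root joint at index 0 in the hypothetical `to_root_relative` helper, and the omission of person matching and missed-detection handling are simplifications made here; the official benchmark evaluation code remains authoritative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def joint_errors(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> List[float]:
    """Euclidean distance between predicted and ground-truth joints (mm)."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally long")
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def to_root_relative(pose: Sequence[Vec3], root_index: int = 0) -> List[Vec3]:
    """Translate a pose so its (assumed) root joint sits at the origin."""
    r = pose[root_index]
    return [(p[0] - r[0], p[1] - r[1], p[2] - r[2]) for p in pose]


def mpjpe(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    errs = joint_errors(pred, gt)
    return sum(errs) / len(errs)


def pck(pred: Sequence[Vec3], gt: Sequence[Vec3],
        threshold_mm: float = 150.0) -> float:
    """Percentage of joints with error strictly below the threshold."""
    errs = joint_errors(pred, gt)
    return 100.0 * sum(e < threshold_mm for e in errs) / len(errs)


def auc(pred: Sequence[Vec3], gt: Sequence[Vec3],
        max_threshold_mm: float = 150.0, step_mm: float = 5.0) -> float:
    """Mean PCK over thresholds 0, step, ..., max_threshold."""
    n_steps = int(round(max_threshold_mm / step_mm))
    thresholds = [i * step_mm for i in range(n_steps + 1)]
    return sum(pck(pred, gt, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0)] * 4
    pred = [(0.0, 0.0, 0.0), (30.0, 40.0, 0.0),
            (0.0, 0.0, 100.0), (0.0, 0.0, 200.0)]  # errors: 0, 50, 100, 200
    print(mpjpe(pred, gt))            # 87.5
    print(pck(pred, gt))              # 75.0
    print(round(auc(pred, gt), 2))    # 48.39
```

Comparing `pck` on raw poses with `pck` after `to_root_relative` mirrors the absolute versus root-relative split in the table: a pose that is correct up to a global depth offset scores poorly on the first and well on the second.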
