Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens

Sen Yang, Wen Heng, Gang Liu, Guozhong Luo, Wankou Yang, Gang Yu

2023-03-01 · 3D Human Pose Estimation · Pose Estimation · 3D Human Pose and Shape Estimation
Paper · PDF · Code (official)

Abstract

In this paper we present a novel method to estimate 3D human pose and shape from monocular videos. This task requires directly recovering pixel-aligned 3D human pose and body shape from monocular images or videos, which is challenging due to its inherent ambiguity. To improve precision, existing methods rely heavily on an initialized mean pose and shape as prior estimates and regress parameters in an iterative error-feedback manner. In addition, video-based approaches model the overall change over image-level features to temporally enhance the single-frame feature, but fail to capture rotational motion at the joint level and cannot guarantee local temporal consistency. To address these issues, we propose a novel Transformer-based model with a design of independent tokens. First, we introduce three types of tokens independent of the image feature: joint rotation tokens, a shape token, and a camera token. By progressively interacting with image features through Transformer layers, these tokens learn to encode prior knowledge of human 3D joint rotations, body shape, and position from large-scale data, and are updated to estimate SMPL parameters conditioned on a given image. Second, benefiting from the proposed token-based representation, we further use a temporal model that focuses on capturing the rotational temporal information of each joint, which is empirically conducive to preventing large jitters in local parts. Despite being conceptually simple, the proposed method attains superior performance on the 3DPW and Human3.6M datasets. Using ResNet-50 and Transformer architectures, it obtains 42.0 mm PA-MPJPE on the challenging 3DPW benchmark, outperforming state-of-the-art counterparts by a large margin. Code will be publicly available at https://github.com/yangsenius/INT_HMR_Model.
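The independent-token design described in the abstract can be illustrated schematically. The NumPy sketch below shows learnable, image-agnostic tokens (per-joint rotation tokens plus one shape and one camera token) progressively updated by cross-attending to a flattened image feature map. All dimensions, the layer count, and the single-head attention are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(tokens, feats, Wq, Wk, Wv):
    """One cross-attention step: tokens query the image feature map."""
    Q, K, V = tokens @ Wq, feats @ Wk, feats @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (tokens, locations)
    return tokens + A @ V                          # residual update

d = 64
n_joints = 24                       # SMPL has 24 body joints
# Independent, image-agnostic tokens: per-joint rotation tokens + shape + camera
joint_tokens = rng.normal(size=(n_joints, d))
shape_token  = rng.normal(size=(1, d))
camera_token = rng.normal(size=(1, d))
tokens = np.concatenate([joint_tokens, shape_token, camera_token])  # (26, d)

feats = rng.normal(size=(49, d))    # e.g. a flattened 7x7 ResNet-50 feature map
Wq, Wk, Wv = (rng.normal(size=(d, d)) * d ** -0.5 for _ in range(3))

for _ in range(3):                  # progressive interaction over layers
    tokens = cross_attend(tokens, feats, Wq, Wk, Wv)

# The final tokens would then be decoded by small regression heads into SMPL
# parameters: 24 joint rotations, 10 shape coefficients, and a camera.
print(tokens.shape)
```

The point of the design is that the tokens carry the pose/shape prior themselves, so no initialized mean SMPL parameters are needed as a starting estimate.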

Results

Task                     | Dataset | Metric             | Value | Model
3D Human Pose Estimation | 3DPW    | Acceleration Error | 16.5  | INT-2 (ResNet-50)
3D Human Pose Estimation | 3DPW    | MPJPE              | 75.6  | INT-2 (ResNet-50)
3D Human Pose Estimation | 3DPW    | MPVPE              | 87.9  | INT-2 (ResNet-50)
3D Human Pose Estimation | 3DPW    | PA-MPJPE           | 42.0  | INT-2 (ResNet-50)
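The table's metrics can be stated precisely. Below is a minimal NumPy sketch of the two less standard ones, under the usual conventions (PA-MPJPE as MPJPE after a Umeyama similarity alignment; acceleration error as the mean difference of second finite differences over a sequence). Function names here are my own, not from the paper's code:

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE (mm): similarity-align pred to gt
    (Umeyama alignment), then average per-joint Euclidean error.
    pred, gt: (J, 3) arrays of 3D joint positions."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    Xp, Xg = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix
    U, S, Vt = np.linalg.svd(Xp.T @ Xg)
    if np.linalg.det(Vt.T @ U.T) < 0:   # guard against reflections
        Vt[-1] *= -1
        S[-1] *= -1
    R = Vt.T @ U.T
    scale = S.sum() / (Xp ** 2).sum()   # optimal isotropic scale
    aligned = scale * Xp @ R.T + mu_g
    return np.linalg.norm(aligned - gt, axis=1).mean()

def accel_error(pred_seq, gt_seq):
    """Acceleration error (mm/frame^2): mean norm of the difference of
    second finite differences. pred_seq, gt_seq: (T, J, 3) sequences."""
    a_p = pred_seq[2:] - 2 * pred_seq[1:-1] + pred_seq[:-2]
    a_g = gt_seq[2:] - 2 * gt_seq[1:-1] + gt_seq[:-2]
    return np.linalg.norm(a_p - a_g, axis=-1).mean()
```

MPJPE is the same mean joint error without the alignment step, and MPVPE is the analogous mean error computed over mesh vertices rather than joints.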

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)