
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos

Tao Tang, Hong Liu, Yingxuan You, Ti Wang, Wenhao Li

2024-10-21 | 3D Human Pose Estimation | Disentanglement | Pose Estimation | Human Mesh Recovery

Abstract

Although existing video-based 3D human mesh recovery methods have made significant progress, simultaneously estimating human pose and shape from low-resolution image features limits their performance. These image features lack sufficient spatial information about the human body and contain various kinds of noise (e.g., background, lighting, and clothing), which often results in inaccurate pose and inconsistent motion. Inspired by the rapid advances in human pose estimation, we observe that, compared to image features, skeletons inherently contain accurate human pose and motion. Therefore, we propose a novel semi-Analytical Regressor using disenTangled Skeletal representations for human mesh recovery from videos, called ARTS. Specifically, a skeleton estimation and disentanglement module is proposed to estimate the 3D skeletons from a video and decouple them into disentangled skeletal representations (i.e., joint position, bone length, and human motion). Then, to fully utilize these representations, we introduce a semi-analytical regressor to estimate the parameters of the human mesh model. The regressor consists of three modules: Temporal Inverse Kinematics (TIK), Bone-guided Shape Fitting (BSF), and Motion-Centric Refinement (MCR). TIK uses joint position to estimate initial pose parameters, and BSF uses bone length to regress bone-aligned shape parameters. Finally, MCR combines the human motion representation with image features to refine the initial human model parameters. Extensive experiments demonstrate that ARTS surpasses existing state-of-the-art video-based methods in both per-frame accuracy and temporal consistency on popular benchmarks: 3DPW, MPI-INF-3DHP, and Human3.6M. Code is available at https://github.com/TangTao-PKU/ARTS.
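
To make the disentanglement step more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a skeleton sequence given as per-frame lists of 3D joints and a hypothetical `parents` list describing the kinematic tree. The function name `disentangle`, the root-relative joint positions, the averaging of bone lengths over frames, and the use of frame-to-frame differences as "motion" are all illustrative assumptions, since the abstract does not specify these details.

```python
import math


def disentangle(seq, parents):
    """Split a 3D skeleton sequence into joint position, bone length, and motion.

    seq:     list of T frames, each a list of J (x, y, z) tuples.
    parents: hypothetical kinematic tree, one parent index per joint
             (-1 marks the root joint).
    """
    root = parents.index(-1)

    # Joint position: coordinates relative to the root joint in each frame
    # (an assumption; other normalizations are possible).
    positions = [
        [tuple(c - r for c, r in zip(joint, frame[root])) for joint in frame]
        for frame in seq
    ]

    # Bone length: child-to-parent distance, averaged over all frames,
    # since a person's bone lengths should not change over time.
    bones = [(j, p) for j, p in enumerate(parents) if p >= 0]
    bone_length = [
        sum(math.dist(frame[j], frame[p]) for frame in seq) / len(seq)
        for j, p in bones
    ]

    # Motion: per-joint displacement between consecutive frames.
    motion = [
        [tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(positions, positions[1:])
    ]

    return {
        "joint_position": positions,
        "bone_length": bone_length,
        "motion": motion,
    }
```

In this sketch, a sequence of T frames yields T sets of joint positions, one length per bone, and T-1 motion frames. In the paper's pipeline these three representations feed TIK, BSF, and MCR respectively.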

Results

Task                      Dataset       Metric              Value  Model
3D Human Pose Estimation  MPI-INF-3DHP  Acceleration Error  7.4    ARTS (Resnet50 L=16)
3D Human Pose Estimation  MPI-INF-3DHP  MPJPE               71.8   ARTS (Resnet50 L=16)
3D Human Pose Estimation  MPI-INF-3DHP  PA-MPJPE            53     ARTS (Resnet50 L=16)
3D Human Pose Estimation  3DPW          Acceleration Error  6.5    ARTS (Resnet50 L=16)
3D Human Pose Estimation  3DPW          MPJPE               67.7   ARTS (Resnet50 L=16)
3D Human Pose Estimation  3DPW          MPVPE               81.4   ARTS (Resnet50 L=16)
3D Human Pose Estimation  3DPW          PA-MPJPE            46.5   ARTS (Resnet50 L=16)
Pose Estimation           MPI-INF-3DHP  Acceleration Error  7.4    ARTS (Resnet50 L=16)
Pose Estimation           MPI-INF-3DHP  MPJPE               71.8   ARTS (Resnet50 L=16)
Pose Estimation           MPI-INF-3DHP  PA-MPJPE            53     ARTS (Resnet50 L=16)
Pose Estimation           3DPW          Acceleration Error  6.5    ARTS (Resnet50 L=16)
Pose Estimation           3DPW          MPJPE               67.7   ARTS (Resnet50 L=16)
Pose Estimation           3DPW          MPVPE               81.4   ARTS (Resnet50 L=16)
Pose Estimation           3DPW          PA-MPJPE            46.5   ARTS (Resnet50 L=16)
3D                        MPI-INF-3DHP  Acceleration Error  7.4    ARTS (Resnet50 L=16)
3D                        MPI-INF-3DHP  MPJPE               71.8   ARTS (Resnet50 L=16)
3D                        MPI-INF-3DHP  PA-MPJPE            53     ARTS (Resnet50 L=16)
3D                        3DPW          Acceleration Error  6.5    ARTS (Resnet50 L=16)
3D                        3DPW          MPJPE               67.7   ARTS (Resnet50 L=16)
3D                        3DPW          MPVPE               81.4   ARTS (Resnet50 L=16)
3D                        3DPW          PA-MPJPE            46.5   ARTS (Resnet50 L=16)
1 Image, 2*2 Stitchi      MPI-INF-3DHP  Acceleration Error  7.4    ARTS (Resnet50 L=16)
1 Image, 2*2 Stitchi      MPI-INF-3DHP  MPJPE               71.8   ARTS (Resnet50 L=16)
1 Image, 2*2 Stitchi      MPI-INF-3DHP  PA-MPJPE            53     ARTS (Resnet50 L=16)
1 Image, 2*2 Stitchi      3DPW          Acceleration Error  6.5    ARTS (Resnet50 L=16)
1 Image, 2*2 Stitchi      3DPW          MPJPE               67.7   ARTS (Resnet50 L=16)
1 Image, 2*2 Stitchi      3DPW          MPVPE               81.4   ARTS (Resnet50 L=16)
1 Image, 2*2 Stitchi      3DPW          PA-MPJPE            46.5   ARTS (Resnet50 L=16)
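
For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE (per-frame accuracy) and acceleration error (temporal consistency) are commonly computed. It is a simplified illustration under stated assumptions, not the evaluation code behind the reported numbers. The function names `mpjpe` and `accel_error` are hypothetical, inputs are assumed to be already root-aligned, and acceleration is approximated by a second-order finite difference over consecutive frames. PA-MPJPE additionally applies a Procrustes (rigid plus scale) alignment before measuring the error, and MPVPE applies the same distance to mesh vertices instead of joints; neither alignment nor mesh handling is included here.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error over all frames and joints.

    pred, gt: lists of T frames, each a list of J (x, y, z) tuples.
    """
    dists = [
        math.dist(p, g)
        for frame_pred, frame_gt in zip(pred, gt)
        for p, g in zip(frame_pred, frame_gt)
    ]
    return sum(dists) / len(dists)


def _acceleration(seq):
    # Second-order finite difference per joint: x[t-1] - 2*x[t] + x[t+1].
    return [
        [
            tuple(a - 2 * b + c for a, b, c in zip(j0, j1, j2))
            for j0, j1, j2 in zip(f0, f1, f2)
        ]
        for f0, f1, f2 in zip(seq, seq[1:], seq[2:])
    ]


def accel_error(pred, gt):
    """Mean distance between predicted and ground-truth joint accelerations."""
    if len(pred) < 3 or len(gt) < 3:
        raise ValueError("acceleration error needs at least 3 frames")
    return mpjpe(_acceleration(pred), _acceleration(gt))
```

Both functions return values in whatever units the input coordinates use. Lower is better for every metric in the table.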

Related Papers

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models (2025-07-18)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)