Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video

Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, Hong-Yuan Mark Liao

2022-03-16 | CVPR 2022 | 3D Human Pose Estimation | 3D human pose and shape estimation

Abstract

Learning to capture human motion is essential to 3D human pose and shape estimation from monocular video. However, existing methods mainly rely on recurrent or convolutional operations to model such temporal information, which limits their ability to capture non-local context relations of human motion. To address this problem, we propose a motion pose and shape network (MPS-Net) to effectively capture humans in motion and estimate accurate and temporally coherent 3D human pose and shape from a video. Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in the sequence, to better capture the motion continuity dependencies. Then, we develop a hierarchical attentive feature integration (HAFI) module to effectively combine adjacent past and future feature representations to strengthen temporal correlation and refine the feature representation of the current frame. By coupling the MoCA and HAFI modules, the proposed MPS-Net excels in estimating 3D human pose and shape in video. Though conceptually simple, our MPS-Net not only outperforms the state-of-the-art methods on the 3DPW, MPI-INF-3DHP, and Human3.6M benchmark datasets, but also uses fewer network parameters. The video demos can be found at https://mps-net.github.io/MPS-Net/.
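To make the "non-local" point in the abstract concrete, here is a minimal sketch of plain scaled dot-product self-attention over a sequence of per-frame feature vectors. It is not the MoCA or HAFI module from the paper; the function name `temporal_self_attention` and the use of raw features as queries, keys, and values (no learned projections) are assumptions made for illustration only. It shows that each output frame is a weighted mix of all frames in the sequence, not only of its immediate neighbours.

```python
import math


def temporal_self_attention(frames):
    """Refine each frame feature as a softmax-weighted mix of all frames.

    frames: list of T feature vectors (lists of floats), one per video frame.
    Hypothetical helper: raw features act as queries, keys, and values.
    """
    d = len(frames[0])
    refined = []
    for query in frames:
        # Similarity of this frame to every frame in the sequence (non-local).
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
            for key in frames
        ]
        # Numerically stable softmax over the temporal axis.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all frame features.
        refined.append(
            [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(d)]
        )
    return refined


if __name__ == "__main__":
    sequence = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    for vec in temporal_self_attention(sequence):
        print([round(x, 3) for x in vec])
```

A recurrent or convolutional layer would only reach distant frames by stacking steps or widening kernels. Here every pair of frames interacts in a single step, at a cost quadratic in the sequence length.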

Results

Task                      Dataset       Metric                    Value  Model
3D Human Pose Estimation  MPI-INF-3DHP  Acceleration Error        9.6    MPS-Net (T=16)
3D Human Pose Estimation  MPI-INF-3DHP  MPJPE                     96.7   MPS-Net (T=16)
3D Human Pose Estimation  MPI-INF-3DHP  PA-MPJPE                  62.8   MPS-Net (T=16)
3D Human Pose Estimation  3DPW          Acceleration Error        7.4    MPS-Net (T=16)
3D Human Pose Estimation  3DPW          FLOPs (G)                 4.45   MPS-Net (T=16)
3D Human Pose Estimation  3DPW          MPJPE                     84.3   MPS-Net (T=16)
3D Human Pose Estimation  3DPW          MPVPE                     99.7   MPS-Net (T=16)
3D Human Pose Estimation  3DPW          Number of parameters (M)  39.63  MPS-Net (T=16)
3D Human Pose Estimation  3DPW          PA-MPJPE                  52.1   MPS-Net (T=16)
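
For readers unfamiliar with the metrics in the results table, the following is a minimal sketch of how MPJPE and acceleration error are commonly computed. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: the function names are hypothetical, acceleration is taken as a second-order finite difference in per-frame units, and no root alignment or unit conversion is applied. The table does not state units; these benchmarks are conventionally reported in millimetres. PA-MPJPE is MPJPE measured after a rigid Procrustes alignment of the prediction to the ground truth, and is omitted here.

```python
import math


def _dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mpjpe(pred, gt):
    """Mean per-joint position error over all frames and joints.

    pred, gt: list of frames; each frame is a list of (x, y, z) joints.
    """
    dists = [_dist(p, g) for pf, gf in zip(pred, gt) for p, g in zip(pf, gf)]
    return sum(dists) / len(dists)


def _acceleration(seq):
    """Per-joint second-order finite difference: x[t-1] - 2*x[t] + x[t+1]."""
    return [
        [
            tuple(a - 2 * b + c for a, b, c in zip(j0, j1, j2))
            for j0, j1, j2 in zip(f0, f1, f2)
        ]
        for f0, f1, f2 in zip(seq, seq[1:], seq[2:])
    ]


def accel_error(pred, gt):
    """Mean distance between predicted and ground-truth joint accelerations."""
    acc_pred, acc_gt = _acceleration(pred), _acceleration(gt)
    errs = [
        _dist(p, g) for pf, gf in zip(acc_pred, acc_gt) for p, g in zip(pf, gf)
    ]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    # One joint moving at constant velocity along x for four frames.
    gt = [[(float(t), 0.0, 0.0)] for t in range(4)]
    # Constant offset: large position error, but perfectly smooth.
    shifted = [[(x, y + 3.0, z + 4.0) for x, y, z in frame] for frame in gt]
    # Single-frame jitter: small position error, but temporally unstable.
    jitter = [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
    print(mpjpe(shifted, gt), accel_error(shifted, gt))  # 5.0 0.0
    print(mpjpe(jitter, gt), accel_error(jitter, gt))    # 0.25 1.5
```

The two toy predictions show why both metrics are reported: a constant offset hurts MPJPE but not acceleration error, while frame-to-frame jitter can leave MPJPE low and still produce a high acceleration error.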
