Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos

Kyung-Min Jin, Byoung-Sung Lim, Gun-Hee Lee, Tae-Kyung Kang, Seong-Whan Lee

2022-11-29
Tasks: 3D Human Pose Estimation · Pose Estimation · 3D Pose Estimation · 2D Pose Estimation
Paper · PDF · Code (official)

Abstract

Previous video-based human pose estimation methods have shown promising results by leveraging aggregated features of consecutive frames. However, most approaches compromise accuracy to mitigate jitter or do not sufficiently comprehend the temporal aspects of human motion. Furthermore, occlusion increases uncertainty between consecutive frames, which results in unsmooth results. To address these issues, we design an architecture that exploits the keypoint kinematic features with the following components. First, we effectively capture the temporal features by leveraging individual keypoint's velocity and acceleration. Second, the proposed hierarchical transformer encoder aggregates spatio-temporal dependencies and refines the 2D or 3D input pose estimated from existing estimators. Finally, we provide an online cross-supervision between the refined input pose generated from the encoder and the final pose from our decoder to enable joint optimization. We demonstrate comprehensive results and validate the effectiveness of our model in various tasks: 2D pose estimation, 3D pose estimation, body mesh recovery, and sparsely annotated multi-human pose estimation. Our code is available at https://github.com/KyungMinJin/HANet.
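The kinematic features the abstract describes, each keypoint's velocity and acceleration over consecutive frames, can be sketched with simple finite differences. This is an illustrative reconstruction, not the paper's implementation; the function name, array layout, and zero-padding at the sequence boundary are assumptions.

```python
import numpy as np

def kinematic_features(poses):
    """Per-keypoint velocity and acceleration via finite differences.

    poses: (T, J, D) array of 2D or 3D keypoint coordinates over T frames.
    Returns (velocity, acceleration), each (T, J, D), zero-padded at the
    start of the sequence so shapes match the input (a boundary-handling
    assumption, not the paper's stated choice).
    """
    velocity = np.zeros_like(poses)
    velocity[1:] = poses[1:] - poses[:-1]            # v_t = p_t - p_{t-1}
    acceleration = np.zeros_like(poses)
    acceleration[1:] = velocity[1:] - velocity[:-1]  # a_t = v_t - v_{t-1}
    return velocity, acceleration
```

For a sequence moving at constant speed, this yields constant velocity and (away from the padded boundary) zero acceleration, which is the kind of temporal signal such features expose to the encoder.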

Results

| Task | Dataset | Metric | Value | Model |
|------|---------|--------|-------|-------|
| 3D Human Pose Estimation | 3DPW | Acceleration Error | 8 | PARE + HANet (T=51) |
| 3D Human Pose Estimation | 3DPW | MPJPE | 74.6 | PARE + HANet (T=51) |
| 3D Human Pose Estimation | 3DPW | Acceleration Error | 6.8 | PARE + HANet (T=101) |
| 3D Human Pose Estimation | 3DPW | MPJPE | 77.1 | PARE + HANet (T=101) |
| 3D Human Pose Estimation | AIST++ | Acceleration Error | 6.4 | SPIN + HANet (T=51) |
| 3D Human Pose Estimation | AIST++ | MPJPE | 64.3 | SPIN + HANet (T=51) |
| 3D Human Pose Estimation | AIST++ | Acceleration Error | 5.4 | SPIN + HANet (T=101) |
| 3D Human Pose Estimation | AIST++ | MPJPE | 69.2 | SPIN + HANet (T=101) |
| Pose Estimation | J-HMDB | Mean PCK@0.05 | 91.9 | SimpleBaseline + HANet |
| Pose Estimation | J-HMDB | Mean PCK@0.1 | 98.3 | SimpleBaseline + HANet |
| Pose Estimation | J-HMDB | Mean PCK@0.2 | 99.6 | SimpleBaseline + HANet |
| Pose Estimation | 3DPW | Acceleration Error | 8 | PARE + HANet (T=51) |
| Pose Estimation | 3DPW | MPJPE | 74.6 | PARE + HANet (T=51) |
| Pose Estimation | 3DPW | Acceleration Error | 6.8 | PARE + HANet (T=101) |
| Pose Estimation | 3DPW | MPJPE | 77.1 | PARE + HANet (T=101) |
| Pose Estimation | AIST++ | Acceleration Error | 6.4 | SPIN + HANet (T=51) |
| Pose Estimation | AIST++ | MPJPE | 64.3 | SPIN + HANet (T=51) |
| Pose Estimation | AIST++ | Acceleration Error | 5.4 | SPIN + HANet (T=101) |
| Pose Estimation | AIST++ | MPJPE | 69.2 | SPIN + HANet (T=101) |
| 3D Pose Estimation | J-HMDB | Mean PCK@0.05 | 91.9 | SimpleBaseline + HANet |
| 3D Pose Estimation | J-HMDB | Mean PCK@0.1 | 98.3 | SimpleBaseline + HANet |
| 3D Pose Estimation | J-HMDB | Mean PCK@0.2 | 99.6 | SimpleBaseline + HANet |
| 3D Pose Estimation | 3DPW | Acceleration Error | 8 | PARE + HANet (T=51) |
| 3D Pose Estimation | 3DPW | MPJPE | 74.6 | PARE + HANet (T=51) |
| 3D Pose Estimation | 3DPW | Acceleration Error | 6.8 | PARE + HANet (T=101) |
| 3D Pose Estimation | 3DPW | MPJPE | 77.1 | PARE + HANet (T=101) |
| 3D Pose Estimation | AIST++ | Acceleration Error | 6.4 | SPIN + HANet (T=51) |
| 3D Pose Estimation | AIST++ | MPJPE | 64.3 | SPIN + HANet (T=51) |
| 3D Pose Estimation | AIST++ | Acceleration Error | 5.4 | SPIN + HANet (T=101) |
| 3D Pose Estimation | AIST++ | MPJPE | 69.2 | SPIN + HANet (T=101) |
| 2D Pose Estimation | J-HMDB | Mean PCK@0.05 | 91.9 | SimpleBaseline + HANet |
| 2D Pose Estimation | J-HMDB | Mean PCK@0.1 | 98.3 | SimpleBaseline + HANet |
| 2D Pose Estimation | J-HMDB | Mean PCK@0.2 | 99.6 | SimpleBaseline + HANet |
| 2D Pose Estimation | 3DPW | Acceleration Error | 8 | PARE + HANet (T=51) |
| 2D Pose Estimation | 3DPW | MPJPE | 74.6 | PARE + HANet (T=51) |
| 2D Pose Estimation | 3DPW | Acceleration Error | 6.8 | PARE + HANet (T=101) |
| 2D Pose Estimation | 3DPW | MPJPE | 77.1 | PARE + HANet (T=101) |
| 2D Pose Estimation | AIST++ | Acceleration Error | 6.4 | SPIN + HANet (T=51) |
| 2D Pose Estimation | AIST++ | MPJPE | 64.3 | SPIN + HANet (T=51) |
| 2D Pose Estimation | AIST++ | Acceleration Error | 5.4 | SPIN + HANet (T=101) |
| 2D Pose Estimation | AIST++ | MPJPE | 69.2 | SPIN + HANet (T=101) |
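The metrics in the table above follow standard definitions: MPJPE is the mean Euclidean distance between predicted and ground-truth joints (typically in mm), acceleration error compares second finite differences of the joint trajectories, and PCK@α is the fraction of keypoints within a normalized distance threshold. A minimal numpy sketch of these definitions, assuming already-aligned predictions and ground truth in consistent units (normalization and alignment conventions vary per benchmark):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error over (..., J, D) joint arrays."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def accel_error(pred, gt):
    """Mean joint acceleration error, using second finite differences
    along the time axis of (T, J, D) trajectories."""
    accel = lambda p: p[2:] - 2 * p[1:-1] + p[:-2]
    return np.linalg.norm(accel(pred) - accel(gt), axis=-1).mean()

def pck(pred, gt, threshold):
    """Percentage of Correct Keypoints: fraction of joints whose
    distance to ground truth is within `threshold` (which benchmarks
    usually normalize, e.g. by person bounding-box size)."""
    dist = np.linalg.norm(pred - gt, axis=-1)
    return (dist <= threshold).mean()
```

Note that a constant offset between prediction and ground truth inflates MPJPE but leaves the acceleration error at zero, which is why the two metrics are reported together: one measures positional accuracy, the other temporal smoothness (jitter).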

Related Papers

- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)