Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

HSTFormer: Hierarchical Spatial-Temporal Transformers for 3D Human Pose Estimation

Xiaoye Qian, YouBao Tang, Ning Zhang, Mei Han, Jing Xiao, Ming-Chun Huang, Ruei-Sung Lin

2023-01-18 · 3D Human Pose Estimation · Pose Estimation

Abstract

Transformer-based approaches have been successfully proposed for 3D human pose estimation (HPE) from 2D pose sequences and have achieved state-of-the-art (SOTA) performance. However, current SOTA methods have difficulty modeling the spatial-temporal correlations of joints at different levels simultaneously. This is due to the spatial-temporal complexity of poses: temporally, poses move at various speeds; spatially, they involve varied joint and body-part movements. Hence, a one-size-fits-all transformer is not adaptable and can hardly meet the "in-the-wild" requirement. To mitigate this issue, we propose Hierarchical Spatial-Temporal transFormers (HSTFormer) to capture multi-level spatial-temporal correlations of joints gradually, from local to global, for accurate 3D HPE. HSTFormer consists of four transformer encoders (TEs) and a fusion module. To the best of our knowledge, HSTFormer is the first to study hierarchical TEs with multi-level fusion. Extensive experiments on three datasets (i.e., Human3.6M, MPI-INF-3DHP, and HumanEva) demonstrate that HSTFormer achieves competitive and consistent performance on benchmarks of various scales and difficulties. Specifically, it surpasses recent SOTA methods on the challenging MPI-INF-3DHP dataset and the small-scale HumanEva dataset with a highly generalized, systematic approach. The code is available at: https://github.com/qianxiaoye825/HSTFormer.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | MPI-INF-3DHP | AUC | 78.6 | HSTFormer (T=81) |
| 3D Human Pose Estimation | MPI-INF-3DHP | MPJPE | 28.3 | HSTFormer (T=81) |
| 3D Human Pose Estimation | MPI-INF-3DHP | PCK | 98 | HSTFormer (T=81) |
| Pose Estimation | MPI-INF-3DHP | AUC | 78.6 | HSTFormer (T=81) |
| Pose Estimation | MPI-INF-3DHP | MPJPE | 28.3 | HSTFormer (T=81) |
| Pose Estimation | MPI-INF-3DHP | PCK | 98 | HSTFormer (T=81) |
| 3D | MPI-INF-3DHP | AUC | 78.6 | HSTFormer (T=81) |
| 3D | MPI-INF-3DHP | MPJPE | 28.3 | HSTFormer (T=81) |
| 3D | MPI-INF-3DHP | PCK | 98 | HSTFormer (T=81) |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | AUC | 78.6 | HSTFormer (T=81) |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | MPJPE | 28.3 | HSTFormer (T=81) |
| 1 Image, 2*2 Stitchi | MPI-INF-3DHP | PCK | 98 | HSTFormer (T=81) |
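
For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how they are commonly computed for a single pose, not the authors' evaluation code. It assumes joint coordinates in millimetres, a 150 mm PCK threshold, and an AUC taken as the mean PCK over thresholds from 0 to 150 mm in 5 mm steps; the function names and the threshold step are illustrative assumptions, and the official MPI-INF-3DHP protocol may differ in detail.

```python
import math

def joint_errors(pred, gt):
    """Per-joint Euclidean distance between predicted and ground-truth 3D joints."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]

def mpjpe(errors):
    """Mean Per-Joint Position Error: average joint distance (lower is better)."""
    return sum(errors) / len(errors)

def pck(errors, threshold=150.0):
    """Percentage of Correct Keypoints: share of joints within `threshold` (assumed 150 mm)."""
    return 100.0 * sum(e <= threshold for e in errors) / len(errors)

def auc(errors, max_threshold=150.0, step=5.0):
    """Area under the PCK curve, approximated as mean PCK over evenly spaced thresholds.

    The 0..150 mm range with a 5 mm step is an assumption for illustration.
    """
    n = int(max_threshold / step)
    thresholds = [i * step for i in range(n + 1)]
    return sum(pck(errors, t) for t in thresholds) / len(thresholds)

if __name__ == "__main__":
    # Toy three-joint pose (hypothetical values, in mm).
    gt = [(0, 0, 0), (100, 0, 0), (0, 200, 0)]
    pred = [(3, 4, 0), (100, 0, 0), (0, 200, 200)]
    errs = joint_errors(pred, gt)
    print("MPJPE:", mpjpe(errs))
    print("PCK@150mm:", pck(errs))
    print("AUC:", auc(errs))
```

In practice these quantities are averaged over all frames of the test set; MPJPE is reported in millimetres, while PCK and AUC are percentages where higher is better.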

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)