Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

Jihua Peng, Yanghong Zhou, P. Y. Mok

2024-03-31 | CVPR 2024 | 3D Human Pose Estimation, Monocular 3D Human Pose Estimation, Pose Estimation

Abstract

This paper presents a novel Kinematics and Trajectory Prior Knowledge-Enhanced Transformer (KTPFormer), which overcomes a weakness of existing transformer-based methods for 3D human pose estimation: the Q, K, V vectors in their self-attention mechanisms are all derived by simple linear mapping. We propose two prior attention modules, namely Kinematics Prior Attention (KPA) and Trajectory Prior Attention (TPA), which take advantage of the known anatomical structure of the human body and of motion trajectory information to facilitate effective learning of global dependencies and features in the multi-head self-attention. KPA models kinematic relationships in the human body by constructing a topology of kinematics, while TPA builds a trajectory topology to learn the information of joint motion trajectory across frames. By yielding Q, K, V vectors with prior knowledge, the two modules enable KTPFormer to model both spatial and temporal correlations simultaneously. Extensive experiments on three benchmarks (Human3.6M, MPI-INF-3DHP and HumanEva) show that KTPFormer achieves superior performance in comparison to state-of-the-art methods. More importantly, our KPA and TPA modules have lightweight plug-and-play designs and can be integrated into various transformer-based networks (e.g., diffusion-based ones) to improve performance with only a very small increase in computational overhead. The code is available at: https://github.com/JihuaPeng/KTPFormer.
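The idea of deriving Q, K, V from tokens that already carry a structural prior can be illustrated with a small sketch. This is not the authors' implementation: the 5-joint chain, the row-normalised adjacency used as the "topology", the single attention head, and all function names are assumptions chosen for illustration, and the paper's actual construction may differ (for example by including learnable components).

```python
import math
import random

# Hypothetical 5-joint chain, for illustration only (not the Human3.6M skeleton).
BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def kinematic_topology(num_joints, bones):
    """Assumed prior: symmetric bone adjacency with self-loops, row-normalised."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in bones:
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def prior_attention(x, topology, w_q, w_k, w_v):
    """Single-head self-attention whose Q, K, V come from prior-mixed tokens."""
    x_prior = matmul(topology, x)  # mix each token with its topological neighbours
    q, k, v = (matmul(x_prior, w) for w in (w_q, w_k, w_v))
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


random.seed(0)
DIM = 4
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_JOINTS)]
weights = [[[random.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)] for _ in range(3)]
topology = kinematic_topology(NUM_JOINTS, BONES)
out = prior_attention(tokens, topology, *weights)
print(len(out), len(out[0]))  # 5 4
```

Under the same assumptions, a temporal analogue would treat each frame of one joint as a token and pass a topology that links consecutive frames, which is roughly the role the abstract assigns to TPA.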

Results

Task | Dataset | Metric | Value | Model
3D Human Pose Estimation | MPI-INF-3DHP | AUC | 85.9 | KTPFormer
3D Human Pose Estimation | MPI-INF-3DHP | MPJPE | 16.7 | KTPFormer
3D Human Pose Estimation | MPI-INF-3DHP | PCK | 98.9 | KTPFormer
3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 33 | KTPFormer (T=243)
3D Human Pose Estimation | Human3.6M | PA-MPJPE | 26.2 | KTPFormer (T=243)
3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 40.1 | KTPFormer
3D Human Pose Estimation | Human3.6M | Frames Needed | 243 | KTPFormer
Pose Estimation | MPI-INF-3DHP | AUC | 85.9 | KTPFormer
Pose Estimation | MPI-INF-3DHP | MPJPE | 16.7 | KTPFormer
Pose Estimation | MPI-INF-3DHP | PCK | 98.9 | KTPFormer
Pose Estimation | Human3.6M | Average MPJPE (mm) | 33 | KTPFormer (T=243)
Pose Estimation | Human3.6M | PA-MPJPE | 26.2 | KTPFormer (T=243)
Pose Estimation | Human3.6M | Average MPJPE (mm) | 40.1 | KTPFormer
Pose Estimation | Human3.6M | Frames Needed | 243 | KTPFormer
3D | MPI-INF-3DHP | AUC | 85.9 | KTPFormer
3D | MPI-INF-3DHP | MPJPE | 16.7 | KTPFormer
3D | MPI-INF-3DHP | PCK | 98.9 | KTPFormer
3D | Human3.6M | Average MPJPE (mm) | 33 | KTPFormer (T=243)
3D | Human3.6M | PA-MPJPE | 26.2 | KTPFormer (T=243)
3D | Human3.6M | Average MPJPE (mm) | 40.1 | KTPFormer
3D | Human3.6M | Frames Needed | 243 | KTPFormer
1 Image, 2*2 Stitchi | MPI-INF-3DHP | AUC | 85.9 | KTPFormer
1 Image, 2*2 Stitchi | MPI-INF-3DHP | MPJPE | 16.7 | KTPFormer
1 Image, 2*2 Stitchi | MPI-INF-3DHP | PCK | 98.9 | KTPFormer
1 Image, 2*2 Stitchi | Human3.6M | Average MPJPE (mm) | 33 | KTPFormer (T=243)
1 Image, 2*2 Stitchi | Human3.6M | PA-MPJPE | 26.2 | KTPFormer (T=243)
1 Image, 2*2 Stitchi | Human3.6M | Average MPJPE (mm) | 40.1 | KTPFormer
1 Image, 2*2 Stitchi | Human3.6M | Frames Needed | 243 | KTPFormer
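
For readers unfamiliar with the metrics listed above, the sketch below shows the usual definitions of MPJPE (mean per-joint position error, lower is better) and PCK (percentage of correct keypoints, higher is better). The toy joint coordinates and the 150 mm threshold are assumptions for illustration; the exact evaluation protocol behind the reported numbers (root alignment, threshold, averaging over actions) is not stated on this page.

```python
import math


def mpjpe(pred, target):
    """Mean Euclidean distance over joints, in the unit of the inputs (e.g. mm)."""
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    return sum(dists) / len(dists)


def pck(pred, target, threshold):
    """Percentage of joints whose position error is at most `threshold`."""
    hits = sum(1 for p, t in zip(pred, target) if math.dist(p, t) <= threshold)
    return 100.0 * hits / len(pred)


# Toy 4-joint pose in millimetres (made-up values, not from any dataset).
target = [(0, 0, 0), (100, 0, 0), (100, 200, 0), (0, 0, 300)]
pred = [(0, 0, 0), (103, 4, 0), (100, 200, 200), (0, 0, 315)]

print(mpjpe(pred, target))       # 55.0  (per-joint errors: 0, 5, 200, 15)
print(pck(pred, target, 150.0))  # 75.0  (3 of 4 joints within 150 mm)
```

PA-MPJPE applies the same distance after rigidly aligning the prediction to the ground truth (Procrustes alignment), which is omitted here for brevity.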

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)