TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Jointformer: Single-Frame Lifting Transformer with Error P...

Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation

Sebastian Lutz, Richard Blythman, Koustav Ghosal, Matthew Moynihan, Ciaran Simms, Aljosa Smolic

2022-08-073D Human Pose EstimationMonocular 3D Human Pose EstimationPose EstimationMulti-Task Learning
PaperPDFCode(official)

Abstract

Monocular 3D human pose estimation technologies have the potential to greatly increase the availability of human movement data. The best-performing models for single-image 2D-3D lifting use graph convolutional networks (GCNs) that typically require some manual input to define the relationships between different body joints. We propose a novel transformer-based approach that uses the more generalised self-attention mechanism to learn these relationships within a sequence of tokens representing joints. We find that the use of intermediate supervision, as well as residual connections between the stacked encoders benefits performance. We also suggest that using error prediction as part of a multi-task learning framework improves performance by allowing the network to compensate for its confidence level. We perform extensive ablation studies to show that each of our contributions increases performance. Furthermore, we show that our approach outperforms the recent state of the art for single-frame 3D human pose estimation by a large margin. Our code and trained models are made publicly available on Github.

Results

TaskDatasetMetricValueModel
3D Human Pose EstimationH3WBMPJPE63Jointformer-flip
3D Human Pose EstimationHuman3.6MAverage MPJPE (mm)50.5Jointformer (CPN)
Pose EstimationH3WBMPJPE63Jointformer-flip
Pose EstimationHuman3.6MAverage MPJPE (mm)50.5Jointformer (CPN)
3DH3WBMPJPE63Jointformer-flip
3DHuman3.6MAverage MPJPE (mm)50.5Jointformer (CPN)
1 Image, 2*2 StitchiH3WBMPJPE63Jointformer-flip
1 Image, 2*2 StitchiHuman3.6MAverage MPJPE (mm)50.5Jointformer (CPN)

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning2025-07-17Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark2025-07-17DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model2025-07-17From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation2025-07-17AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability2025-07-17SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation2025-07-17SpatialTrackerV2: 3D Point Tracking Made Easy2025-07-16SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation2025-07-16