Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc van Gool

2021-11-24 · CVPR 2022 · Tasks: 3D Human Pose Estimation, Pose Estimation
Paper · PDF · Code (official)

Abstract

Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. In order to effectively model multi-hypothesis dependencies and build strong relationships across hypothesis features, the task is decomposed into three stages: (i) Generate multiple initial hypothesis representations; (ii) Model self-hypothesis communication, merge multiple hypotheses into a single converged representation and then partition it into several diverged hypotheses; (iii) Learn cross-hypothesis communication and aggregate the multi-hypothesis features to synthesize the final 3D pose. Through the above processes, the final representation is enhanced and the synthesized pose is much more accurate. Extensive experiments show that MHFormer achieves state-of-the-art results on two challenging datasets: Human3.6M and MPI-INF-3DHP. Without bells and whistles, its performance surpasses the previous best result by a large margin of 3% on Human3.6M. Code and models are available at \url{https://github.com/Vegetebird/MHFormer}.
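The three-stage decomposition described in the abstract can be sketched at the shape level in NumPy. This is an illustrative toy only: the dimensions, random linear maps, and mean-based merging below are stand-ins for the paper's Transformer blocks, not the MHFormer implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not taken from the paper's configuration.
J, D, H = 17, 32, 3   # joints, feature dim, number of hypotheses

def linear(x, w):
    return x @ w

# Stage (i): generate H initial hypothesis representations from 2D features.
x2d = rng.standard_normal((J, D))
gen_w = [rng.standard_normal((D, D)) for _ in range(H)]
hyps = [linear(x2d, w) for w in gen_w]            # H arrays of shape (J, D)

# Stage (ii): self-hypothesis communication -- merge the hypotheses into one
# converged representation, then partition it back into H diverged hypotheses.
merged = np.mean(hyps, axis=0)                    # (J, D)
part_w = [rng.standard_normal((D, D)) for _ in range(H)]
hyps = [linear(merged, w) for w in part_w]

# Stage (iii): cross-hypothesis communication -- aggregate the multi-hypothesis
# features and regress the final 3D pose.
agg = np.concatenate(hyps, axis=-1)               # (J, H*D)
head_w = rng.standard_normal((H * D, 3))
pose3d = linear(agg, head_w)                      # (J, 3): one 3D point per joint
print(pose3d.shape)                               # (17, 3)
```

The key structural point the sketch preserves is the merge-then-partition step: all hypotheses pass through a single shared representation before diverging again, which is what lets information flow across hypotheses.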

Results

Task                        Dataset        Metric               Value   Model
3D Human Pose Estimation    MPI-INF-3DHP   AUC                  63.3    MHFormer
3D Human Pose Estimation    MPI-INF-3DHP   MPJPE (mm)           58      MHFormer
3D Human Pose Estimation    MPI-INF-3DHP   PCK                  93.8    MHFormer
3D Human Pose Estimation    Human3.6M      Average MPJPE (mm)   43      MHFormer
Pose Estimation             MPI-INF-3DHP   AUC                  63.3    MHFormer
Pose Estimation             MPI-INF-3DHP   MPJPE (mm)           58      MHFormer
Pose Estimation             MPI-INF-3DHP   PCK                  93.8    MHFormer
Pose Estimation             Human3.6M      Average MPJPE (mm)   43      MHFormer
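The MPJPE values above are Mean Per-Joint Position Errors. As a quick reference, the metric is simply the average Euclidean distance between predicted and ground-truth joint positions, in the units of the input coordinates (millimeters for Human3.6M):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance between
    predicted and ground-truth joints, in the input units (typically mm)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: two joints, each predicted 3 mm away from the ground truth.
gt = np.zeros((2, 3))
pred = np.array([[3.0, 0.0, 0.0],
                 [0.0, 3.0, 0.0]])
print(mpjpe(pred, gt))  # 3.0
```

Protocol details (e.g. whether poses are root-aligned or Procrustes-aligned before the distance is taken) vary across evaluation setups and are not shown here.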

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)