Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers

Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, Ali Etemad

Date: 2024-04-19
Tasks: 3D Human Pose Estimation, Markerless Motion Capture, Keypoint Detection, Multi-view 3D Human Pose Estimation

Abstract

We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations. This module integrates prior knowledge about pose space and infers the full pose state at runtime. Separating the 3D keypoint detection and inverse-kinematic problems, along with the expressive representations learned by our skeletal transformer, enhances the generalization of our method to unseen noisy data. We evaluate our method on three public datasets in both in-distribution and out-of-distribution settings, and observe strong performance with respect to prior works. Moreover, ablation experiments demonstrate the impact of each of the modules of our architecture. Finally, we study the performance of our method in dealing with noise and heavy occlusions and find considerable robustness with respect to other solutions.
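
The abstract does not say how the 2D detections from several views are lifted to 3D joint positions. One standard option in a calibrated multi-view setup is linear (DLT) triangulation; the sketch below illustrates that general technique only and is not taken from the paper. The function name `triangulate_point` and the two toy cameras are hypothetical.

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint from two or more calibrated views.

    proj_mats: (V, 3, 4) camera projection matrices (assumed known).
    points_2d: (V, 2) image coordinates of the same joint in each view.
    Returns the 3D point as a length-3 array.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Toy example: two hypothetical cameras with identity intrinsics,
# the second shifted by one unit along the x axis.
P = np.stack([
    np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.0]])]),
    np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])]),
])
X_true = np.array([0.5, 0.2, 4.0])
homog = P @ np.append(X_true, 1.0)      # project into both views, shape (2, 3)
uv = homog[:, :2] / homog[:, 2:3]       # perspective divide
X_est = triangulate_point(P, uv)
```

With noisy detections the recovered point is only a least-squares estimate, which is consistent with the abstract's framing of the downstream inverse-kinematic module as operating on noisy joint positions.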

Results

Task                      Dataset  Metric    Value  Model
3D Human Pose Estimation  RICH     MPJPE     44.2   SkelFormer (HRNet - eval only)
3D Human Pose Estimation  RICH     MPVPE     39.9   SkelFormer (HRNet - eval only)
3D Human Pose Estimation  RICH     PA-MPJPE  35.6   SkelFormer (HRNet - eval only)
Pose Estimation           RICH     MPJPE     44.2   SkelFormer (HRNet - eval only)
Pose Estimation           RICH     MPVPE     39.9   SkelFormer (HRNet - eval only)
Pose Estimation           RICH     PA-MPJPE  35.6   SkelFormer (HRNet - eval only)
3D                        RICH     MPJPE     44.2   SkelFormer (HRNet - eval only)
3D                        RICH     MPVPE     39.9   SkelFormer (HRNet - eval only)
3D                        RICH     PA-MPJPE  35.6   SkelFormer (HRNet - eval only)
1 Image, 2*2 Stitchi      RICH     MPJPE     44.2   SkelFormer (HRNet - eval only)
1 Image, 2*2 Stitchi      RICH     MPVPE     39.9   SkelFormer (HRNet - eval only)
1 Image, 2*2 Stitchi      RICH     PA-MPJPE  35.6   SkelFormer (HRNet - eval only)
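
For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE and PA-MPJPE are conventionally computed. It follows the usual definitions (mean per-joint Euclidean error, and the same error after a Procrustes similarity alignment) and is not the paper's evaluation code; the function names and the toy data are hypothetical. MPVPE is conventionally the same average taken over mesh vertices instead of joints.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth points."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, s, vt = np.linalg.svd(p.T @ g)
    d = np.ones(3)
    d[-1] = np.sign(np.linalg.det(u @ vt))   # guard against reflections
    rot = u @ np.diag(d) @ vt
    scale = (s * d).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g

# Toy check: a prediction that differs from the ground truth only by a
# similarity transform has a large MPJPE but a near-zero PA-MPJPE.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ rot_z + np.array([1.0, 2.0, 3.0])
err = mpjpe(pred, gt)
pa_err = mpjpe(procrustes_align(pred, gt), gt)
```

Because PA-MPJPE removes global scale, rotation, and translation before measuring error, it is never larger than the optimally aligned MPJPE, which matches the ordering of the two values reported above.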

Related Papers

KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model (2025-07-15)
GKNet: Graph-based Keypoints Network for Monocular Pose Estimation of Non-cooperative Spacecraft (2025-07-15)
FPC-Net: Revisiting SuperPoint with Descriptor-Free Keypoint Detection via Feature Pyramids and Consistency-Based Implicit Matching (2025-07-14)
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection (2025-07-10)
Reading a Ruler in the Wild (2025-07-09)
MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning (2025-07-09)
Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images (2025-06-24)
Fast Neural Inverse Kinematics on Human Body Motions (2025-06-22)