XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

Dushyant Mehta, Oleksandr Sotnychenko, Franziska Mueller, Weipeng Xu, Mohamed Elgharib, Pascal Fua, Hans-Peter Seidel, Helge Rhodin, Gerard Pons-Moll, Christian Theobalt

2019-07-01
Tasks: 3D Human Pose Estimation, Monocular 3D Human Pose Estimation, Pose Estimation, 3D Multi-Person Human Pose Estimation, 3D Multi-Person Pose Estimation

Abstract

We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates successfully in generic scenes which may contain occlusions by objects and by other people. Our method operates in successive stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and to enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work, which does not produce joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we demonstrate on a range of challenging real-world scenes.
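The data flow between the three stages described in the abstract can be sketched as follows. This is only an illustration of the shapes passed between stages, not the authors' implementation: the joint count, the function names, the random weights, and the hidden size are all hypothetical, and exponential smoothing is swapped in for the space-time skeletal model fitting, so the sketch returns 3D joint positions rather than joint angles.

```python
import numpy as np

NUM_JOINTS = 21  # hypothetical joint count, not taken from the paper


def stage1_cnn_stub(image, num_people=2, seed=0):
    """Stand-in for the CNN: per-person 2D joints, 3D pose features, visibility."""
    rng = np.random.default_rng(seed)
    height, width = image.shape[:2]
    pose_2d = rng.uniform([0.0, 0.0], [width, height], size=(num_people, NUM_JOINTS, 2))
    feat_3d = rng.normal(size=(num_people, NUM_JOINTS, 3))
    visible = rng.random((num_people, NUM_JOINTS)) > 0.3  # some joints occluded
    return pose_2d, feat_3d, visible


def stage2_fc_stub(pose_2d, feat_3d, visible, weights):
    """Stand-in for the fully connected net: partial features in, full 3D pose out."""
    mask = visible[..., None].astype(float)
    # Occluded joints are zeroed; the mask itself is passed along as an input.
    x = np.concatenate([pose_2d * mask, feat_3d * mask, mask], axis=-1)
    x = x.reshape(x.shape[0], -1)                 # (people, joints * 6)
    hidden = np.maximum(x @ weights["w1"], 0.0)   # ReLU
    return (hidden @ weights["w2"]).reshape(-1, NUM_JOINTS, 3)


def stage3_smooth_stub(pose_3d, previous, alpha=0.7):
    """Exponential smoothing, used here in place of skeletal model fitting."""
    if previous is None:
        return pose_3d
    return alpha * pose_3d + (1.0 - alpha) * previous


rng = np.random.default_rng(1)
weights = {
    "w1": rng.normal(scale=0.01, size=(NUM_JOINTS * 6, 64)),
    "w2": rng.normal(scale=0.01, size=(64, NUM_JOINTS * 3)),
}

previous = None
for frame_index in range(3):
    frame = np.zeros((320, 512, 3), dtype=np.uint8)  # 512x320 input, as in the abstract
    pose_2d, feat_3d, visible = stage1_cnn_stub(frame, seed=frame_index)
    full_pose = stage2_fc_stub(pose_2d, feat_3d, visible, weights)
    previous = stage3_smooth_stub(full_pose, previous)

print(previous.shape)
```

The point of the sketch is the interface: stage 1 produces per-joint features that may be missing for occluded joints, stage 2 maps them to a complete pose per person, and stage 3 uses the previous frame's estimate to keep the output temporally coherent.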

Results

Task | Dataset | Metric | Value | Model
3D Human Pose Estimation | MPI-INF-3DHP | AUC | 45.3 | XNect (SelecSLS)
3D Human Pose Estimation | MPI-INF-3DHP | MPJPE | 98.4 | XNect (SelecSLS)
3D Human Pose Estimation | MPI-INF-3DHP | PCK | 82.8 | XNect (SelecSLS)
3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 63.6 | SelecSLS
3D Human Pose Estimation | Human3.6M | Frames Needed | 1 | SelecSLS
3D Human Pose Estimation | MuPoTS-3D | 3DPCK | 75.8 | SelecSLS
Pose Estimation | MPI-INF-3DHP | AUC | 45.3 | XNect (SelecSLS)
Pose Estimation | MPI-INF-3DHP | MPJPE | 98.4 | XNect (SelecSLS)
Pose Estimation | MPI-INF-3DHP | PCK | 82.8 | XNect (SelecSLS)
Pose Estimation | Human3.6M | Average MPJPE (mm) | 63.6 | SelecSLS
Pose Estimation | Human3.6M | Frames Needed | 1 | SelecSLS
Pose Estimation | MuPoTS-3D | 3DPCK | 75.8 | SelecSLS
3D | MPI-INF-3DHP | AUC | 45.3 | XNect (SelecSLS)
3D | MPI-INF-3DHP | MPJPE | 98.4 | XNect (SelecSLS)
3D | MPI-INF-3DHP | PCK | 82.8 | XNect (SelecSLS)
3D | Human3.6M | Average MPJPE (mm) | 63.6 | SelecSLS
3D | Human3.6M | Frames Needed | 1 | SelecSLS
3D | MuPoTS-3D | 3DPCK | 75.8 | SelecSLS
3D Multi-Person Pose Estimation | MuPoTS-3D | 3DPCK | 75.8 | SelecSLS
1 Image, 2*2 Stitchi | MPI-INF-3DHP | AUC | 45.3 | XNect (SelecSLS)
1 Image, 2*2 Stitchi | MPI-INF-3DHP | MPJPE | 98.4 | XNect (SelecSLS)
1 Image, 2*2 Stitchi | MPI-INF-3DHP | PCK | 82.8 | XNect (SelecSLS)
1 Image, 2*2 Stitchi | Human3.6M | Average MPJPE (mm) | 63.6 | SelecSLS
1 Image, 2*2 Stitchi | Human3.6M | Frames Needed | 1 | SelecSLS
1 Image, 2*2 Stitchi | MuPoTS-3D | 3DPCK | 75.8 | SelecSLS
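
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how MPJPE, 3DPCK, and AUC are commonly computed from predicted and ground-truth joint positions. The page does not state the exact evaluation protocol, so the 150 mm threshold, the 0 to 150 mm threshold range for AUC, and the absence of any root or scale alignment are assumptions here, and the function names are illustrative.

```python
import numpy as np


def per_joint_errors(pred, gt):
    """Euclidean distance per joint; inputs have shape (frames, joints, 3), in mm."""
    return np.linalg.norm(pred - gt, axis=-1)


def mpjpe(pred, gt):
    """Mean per-joint position error in mm (lower is better)."""
    return float(per_joint_errors(pred, gt).mean())


def pck3d(pred, gt, threshold_mm=150.0):
    """Percentage of joints within threshold_mm of the ground truth (assumed 150 mm)."""
    return float((per_joint_errors(pred, gt) <= threshold_mm).mean() * 100.0)


def auc(pred, gt, max_threshold_mm=150.0, steps=31):
    """Mean 3DPCK over evenly spaced thresholds from 0 to max_threshold_mm (assumed range)."""
    thresholds = np.linspace(0.0, max_threshold_mm, steps)
    return float(np.mean([pck3d(pred, gt, t) for t in thresholds]))


# Toy data: 2 frames, 3 joints, errors of 0, 100, 200, 50, 150, and 300 mm along x.
gt = np.zeros((2, 3, 3))
pred = np.zeros((2, 3, 3))
pred[..., 0] = [[0.0, 100.0, 200.0], [50.0, 150.0, 300.0]]

mpjpe_value = mpjpe(pred, gt)
pck_value = pck3d(pred, gt)
auc_value = auc(pred, gt)
print(mpjpe_value, pck_value, auc_value)
```

On the toy data, four of the six joints fall within 150 mm, so 3DPCK is about 66.7, and the mean error is about 133.3 mm. AUC is never larger than 3DPCK at the maximum threshold, because it averages the same measure over smaller thresholds as well.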
