Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement

Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt

Published 2023-11-28 · CVPR 2024
Tasks: Egocentric Pose Estimation, Pose Estimation, Pose Prediction, Hand Detection, Hand Pose Estimation
Links: Paper · PDF

Abstract

In this work, we explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion. This task presents significant challenges due to three factors: the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion. To address these challenges, we propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are subsequently converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction. For hand tracking, we incorporate dedicated hand detection and hand pose estimation networks for regressing 3D hand poses. Finally, we develop a diffusion-based whole-body motion prior model to refine the estimated whole-body motion while accounting for joint uncertainties. To train these networks, we collect a large synthetic dataset, EgoWholeBody, comprising 840,000 high-quality egocentric images captured across a diverse range of whole-body motion sequences. Quantitative and qualitative evaluations demonstrate the effectiveness of our method in producing high-quality whole-body motion estimates from a single egocentric camera.
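The abstract describes a three-stage pipeline: (1) FisheyeViT extracts fisheye image features that are converted to pixel-aligned 3D heatmaps for body pose, (2) dedicated networks detect hands and regress 3D hand poses, and (3) a diffusion-based whole-body motion prior refines the sequence using joint uncertainties. A minimal structural sketch of that flow, with all function names, joint counts, and shapes being illustrative assumptions rather than the authors' actual API:

```python
import numpy as np

# Hypothetical sketch of the three-stage pipeline from the abstract.
# Function names and array shapes are assumptions for illustration only.

def fisheyevit_body_pose(image: np.ndarray) -> np.ndarray:
    """Stage 1: FisheyeViT features -> pixel-aligned 3D heatmaps -> body joints.
    Returns a (15, 3) array of 3D body-joint positions (placeholder)."""
    return np.zeros((15, 3))

def hand_pose(image: np.ndarray) -> np.ndarray:
    """Stage 2: dedicated hand detection + hand pose estimation networks.
    Returns (2, 21, 3) left/right hand joints (placeholder)."""
    return np.zeros((2, 21, 3))

def diffusion_refine(motion: np.ndarray, uncertainty: np.ndarray) -> np.ndarray:
    """Stage 3: diffusion-based whole-body motion prior; identity stub here."""
    return motion

def capture_whole_body(frames) -> np.ndarray:
    """Run per-frame body + hand estimation, then refine the full sequence."""
    per_frame = []
    for img in frames:
        body = fisheyevit_body_pose(img)           # (15, 3)
        hands = hand_pose(img).reshape(-1, 3)      # (42, 3)
        per_frame.append(np.concatenate([body, hands], axis=0))
    motion = np.stack(per_frame)                   # (T, 57, 3)
    uncertainty = np.ones(motion.shape[:2])        # per-joint confidence (stub)
    return diffusion_refine(motion, uncertainty)
```

The key design point the paper emphasizes is that refinement operates on the whole-body sequence at once, so the motion prior can repair per-frame failures (e.g. self-occluded joints) using temporal context.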

Results

| Task | Dataset | Model | Average MPJPE (mm) | PA-MPJPE (mm) |
| --- | --- | --- | --- | --- |
| 3D Human Pose Estimation | GlobalEgoMocap Test Dataset | EgoWholeMocap-Temporal | 65.83 | 53.47 |
| 3D Human Pose Estimation | GlobalEgoMocap Test Dataset | EgoWholeMocap-Single Frame | 68.59 | 55.92 |
| 3D Human Pose Estimation | SceneEgo | EgoWholeMocap-Temporal | 57.59 | 46.55 |
| 3D Human Pose Estimation | SceneEgo | EgoWholeMocap-Single Frame | 64.19 | 50.06 |
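The two metrics reported above are standard in 3D pose estimation: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, while PA-MPJPE first rigidly aligns the prediction to the ground truth (scale, rotation, translation, via Procrustes analysis), so it measures pose accuracy independent of global placement. A minimal NumPy sketch of both:

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error: mean Euclidean distance over joints.
    pred, gt: (J, 3) arrays of joint positions."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Procrustes-aligned MPJPE: optimally align pred to gt with a similarity
    transform (scale, rotation, translation) before measuring error."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g                 # center both point sets
    U, s, Vt = np.linalg.svd(p.T @ g)             # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # force a proper rotation
    D = np.diag([1.0] * (len(s) - 1) + [d])       # (det = +1, no reflection)
    R = Vt.T @ D @ U.T                            # optimal rotation
    scale = (s * np.diag(D)).sum() / (p ** 2).sum()
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)
```

Because PA-MPJPE discounts global rotation and scale errors, it is consistently lower than raw MPJPE, as in the table above (e.g. 46.55 mm vs. 57.59 mm on SceneEgo).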

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)