Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition

Wiktor Mucha, Michael Wray, Martin Kampel

2024-08-19 · 3D Hand Pose Estimation · Skeleton Based Action Recognition · Pose Estimation · Depth Estimation · Action Recognition · Object Detection · Hand Pose Estimation

Paper · PDF · Code (official)

Abstract

Hand pose is key information for action recognition in the egocentric perspective, where the user is interacting with objects. We propose to improve egocentric 3D hand pose estimation based on RGB frames only by using pseudo-depth images. Incorporating state-of-the-art single-image depth estimation techniques, we generate pseudo-depth representations of the frames and use this distance knowledge to segment out irrelevant parts of the scene. The resulting depth maps are then used as segmentation masks for the RGB frames. Experimental results on the H2O dataset confirm the high accuracy of the pose estimated with our method in an action recognition task. The 3D hand pose, together with information from object detection, is processed by a transformer-based action recognition network, reaching an accuracy of 91.73% and outperforming all state-of-the-art methods. The estimated 3D hand poses are competitive with existing methods, with a mean pose error of 28.66 mm. This method opens up new possibilities for employing distance information in egocentric 3D hand pose estimation without relying on depth sensors.
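The core preprocessing step described above — thresholding a pseudo-depth map to mask out far-away parts of the RGB frame — can be sketched as follows. This is a minimal illustration only: the function name `sharp_mask`, the toy arrays, and the 2 m cutoff are assumptions for the example; the paper obtains the pseudo-depth map from a state-of-the-art monocular depth estimator, which is stubbed out here with a hand-built array.

```python
import numpy as np

def sharp_mask(rgb, pseudo_depth, max_distance):
    """Zero out pixels whose pseudo-depth exceeds max_distance.

    rgb:          (H, W, 3) image array
    pseudo_depth: (H, W) per-pixel distance estimate (e.g. metres)
    max_distance: scalar cutoff; pixels beyond it are masked to zero
    """
    keep = pseudo_depth <= max_distance          # boolean (H, W) mask
    return rgb * keep[..., None].astype(rgb.dtype)  # broadcast over channels

# Toy example standing in for a depth-estimator output:
# the left half of the frame is "near" (1 m), the right half "far" (5 m).
rgb = np.full((4, 4, 3), 255, dtype=np.uint8)
depth = np.ones((4, 4))
depth[:, 2:] = 5.0

out = sharp_mask(rgb, depth, max_distance=2.0)  # right half is zeroed
```

The masked frame can then be fed to a standard RGB hand pose estimator, so no depth sensor is needed at inference time.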

Results

Task | Dataset | Metric | Value | Model
Video | H2O (2 Hands and Objects) | Accuracy | 91.73 | SHARP
Temporal Action Localization | H2O (2 Hands and Objects) | Accuracy | 91.73 | SHARP
Zero-Shot Learning | H2O (2 Hands and Objects) | Accuracy | 91.73 | SHARP
Activity Recognition | H2O (2 Hands and Objects) | Actions Top-1 | 91.73 | SHARP
Activity Recognition | H2O (2 Hands and Objects) | Accuracy | 91.73 | SHARP
Action Localization | H2O (2 Hands and Objects) | Accuracy | 91.73 | SHARP
Action Detection | H2O (2 Hands and Objects) | Accuracy | 91.73 | SHARP
3D Action Recognition | H2O (2 Hands and Objects) | Accuracy | 91.73 | SHARP
Action Recognition | H2O (2 Hands and Objects) | Actions Top-1 | 91.73 | SHARP
Action Recognition | H2O (2 Hands and Objects) | Accuracy | 91.73 | SHARP

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)