ThermoHands
ThermoHands is the first benchmark dataset specifically designed for egocentric 3D hand pose estimation from thermal images. It addresses the challenges of hand pose estimation in low-light conditions and when hands are covered by gloves or other handwear, scenarios where traditional RGB- or NIR-based systems struggle.
The dataset contains approximately 96,000 synchronized multi-view, multi-spectral images collected from 28 participants under five varied scenarios involving changes in lighting, handwear, and background environments. Each sample includes thermal, NIR, RGB, and depth images, captured using a custom-built head-mounted sensor platform (HMSP) and a multi-view exocentric setup. Ground-truth 3D hand poses and shapes are automatically annotated using multi-view optimization based on the MANO model.
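As a rough illustration of how one synchronized multi-modal frame might be consumed, the sketch below assumes a hypothetical directory layout (`thermal/`, `nir/`, `rgb/`, `depth/`, `pose/`) and file naming; refer to the project page for the actual released format.

```python
# Minimal sketch of reading one time-synchronized multi-modal frame.
# The directory layout and file names are illustrative assumptions,
# not the dataset's actual structure.
from pathlib import Path

import cv2
import numpy as np

def load_frame(sample_dir: str, frame_id: int) -> dict:
    """Load thermal, NIR, RGB, and depth images for one synchronized frame."""
    root = Path(sample_dir)
    name = f"{frame_id:06d}"
    frame = {
        # Thermal and NIR are typically single-channel; depth is often 16-bit millimetres.
        "thermal": cv2.imread(str(root / "thermal" / f"{name}.png"), cv2.IMREAD_UNCHANGED),
        "nir":     cv2.imread(str(root / "nir" / f"{name}.png"), cv2.IMREAD_UNCHANGED),
        "rgb":     cv2.imread(str(root / "rgb" / f"{name}.png"), cv2.IMREAD_COLOR),
        "depth":   cv2.imread(str(root / "depth" / f"{name}.png"), cv2.IMREAD_UNCHANGED),
    }
    # Hypothetical annotation file: 21 joints x 3 coordinates per hand.
    pose_file = root / "pose" / f"{name}.npz"
    if pose_file.exists():
        frame["joints_3d"] = np.load(pose_file)["joints_3d"]
    return frame
```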
Key features:
Multi-spectral data: Egocentric and exocentric views of thermal, NIR, RGB, and depth modalities
Rich annotations: 3D hand pose and shape ground truth via an automated optimization pipeline (a simplified fitting objective is sketched after this list)
Realistic interactions: Both hand-object and hand-virtual interactions in diverse everyday scenarios
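The ground-truth annotation fits MANO hand parameters to multi-view observations. The snippet below is a heavily simplified sketch of such a fitting objective, minimizing 2D reprojection error across calibrated views with simple regularizers; it is not the authors' actual pipeline, and `mano_layer` stands in for any differentiable MANO implementation (e.g., the smplx package).

```python
# Simplified multi-view MANO fitting objective (illustrative only).
import torch

def project(joints_3d, K, R, t):
    """Pinhole projection of 3D joints (N, 3) into one camera view."""
    cam = joints_3d @ R.T + t          # world -> camera coordinates
    uv = cam @ K.T                     # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide -> (N, 2) pixels

def fitting_loss(pose, shape, mano_layer, views):
    """Sum of 2D reprojection errors across views plus simple regularizers."""
    joints_3d = mano_layer(pose, shape)            # (21, 3) hand joints (assumed interface)
    loss = 0.0
    for K, R, t, kp2d, conf in views:              # one tuple per calibrated camera
        uv = project(joints_3d, K, R, t)
        loss = loss + (conf * (uv - kp2d).pow(2).sum(-1)).mean()
    # Keep pose and shape parameters close to the MANO prior.
    return loss + 1e-3 * pose.pow(2).sum() + 1e-3 * shape.pow(2).sum()
```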
Motivation and Use Cases: ThermoHands enables research into robust, privacy-preserving hand pose estimation for XR (AR/VR) and human-computer interaction applications. It is particularly useful for scenarios involving poor illumination, occlusion by gloves, or the need for passive sensing (e.g., avoiding NIR interference or power constraints).
Alongside the dataset, the paper introduces TherFormer, a transformer-based baseline tailored to thermal input, and benchmarks multiple state-of-the-art methods across the spectral modalities.
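TherFormer's actual design is described in the paper; purely as a generic illustration of the idea of a transformer-based 3D hand pose regressor operating on a single-channel thermal image (all layer sizes chosen arbitrarily, not the TherFormer architecture), one could write:

```python
# Generic transformer-based pose regressor over thermal input (illustrative only).
import torch
import torch.nn as nn

class ThermalPoseTransformer(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8, joints=21):
        super().__init__()
        self.joints = joints
        n_tokens = (img_size // patch) ** 2
        # Single-channel thermal image -> patch tokens.
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, joints * 3)    # regress joints x (x, y, z)

    def forward(self, thermal):                   # thermal: (B, 1, H, W)
        tokens = self.embed(thermal).flatten(2).transpose(1, 2) + self.pos
        feats = self.encoder(tokens).mean(dim=1)  # global average over tokens
        return self.head(feats).view(-1, self.joints, 3)
```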
Paper: https://arxiv.org/pdf/2403.09871
Project page: https://thermohands.github.io/
Code and models: https://github.com/LawrenceZ22/ThermoHands