Oluwaleke Yusuf, Maki Habib, Mohamed Moustafa
Hand Gesture Recognition (HGR) enables intuitive human-computer interaction in a variety of real-world contexts. However, existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications. This study introduces a robust, skeleton-based framework for dynamic HGR that reduces the recognition of dynamic hand gestures to a static image classification task, lowering both hardware and computational demands. The framework uses a data-level fusion technique to encode 3D skeleton data from dynamic gestures into static RGB spatiotemporal images, and incorporates a specialized end-to-end Ensemble Tuner (e2eET) Multi-Stream CNN architecture that exploits the semantic connections between data representations while minimizing computational cost. Evaluated on five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR), the framework achieved performance competitive with the state of the art. Its suitability for real-time HGR was further demonstrated through deployment on standard consumer PC hardware, with low latency and minimal resource usage in real-world settings. This successful deployment underscores the framework's potential for real-time applications in fields such as virtual/augmented reality, ambient intelligence, and assistive technologies, providing a scalable and efficient solution for dynamic gesture recognition.
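The core idea of the data-level fusion step can be illustrated with a minimal sketch: rasterize a sequence of 3D hand-joint positions into a single RGB image in which pixel color encodes temporal progression. The function name `encode_gesture_image`, the 64×64 resolution, and the blue-to-red color mapping are illustrative assumptions, not the paper's exact encoding.

```python
import numpy as np

def encode_gesture_image(skeleton_seq, size=64):
    """Rasterize a dynamic gesture (frames x joints x 3) into a static
    RGB image: joints are plotted in the x-y plane, and pixel color
    encodes temporal order (blue = early, red = late). Illustrative
    sketch only, not the paper's actual encoding."""
    frames = skeleton_seq.shape[0]
    img = np.zeros((size, size, 3), dtype=np.float32)
    xy = skeleton_seq[..., :2]  # orthographic projection onto x-y plane
    # normalize coordinates to [0, 1] over the whole sequence
    mins = xy.reshape(-1, 2).min(axis=0)
    maxs = xy.reshape(-1, 2).max(axis=0)
    xy = (xy - mins) / np.maximum(maxs - mins, 1e-8)
    for t in range(frames):
        w = t / max(frames - 1, 1)           # temporal weight in [0, 1]
        color = np.array([w, 0.2, 1.0 - w])  # blue (early) -> red (late)
        for jx, jy in xy[t]:
            r = min(int(jy * (size - 1)), size - 1)
            c = min(int(jx * (size - 1)), size - 1)
            img[r, c] = color
    return img

# example: a synthetic 20-frame, 22-joint gesture
seq = np.random.rand(20, 22, 3)
print(encode_gesture_image(seq).shape)  # (64, 64, 3)
```

Images produced this way can be fed to an ordinary image-classification CNN; rendering the same sequence from multiple viewpoints would yield the multiple streams the e2eET architecture ensembles over.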
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Gesture Recognition | SHREC 2017 | 14 Gestures Accuracy | 97.86 | e2eET |
| Gesture Recognition | SHREC 2017 | 28 Gestures Accuracy | 95.36 | e2eET |
| Gesture Recognition | DHG-14 | Accuracy | 95.83 | e2eET |
| Gesture Recognition | DHG-28 | Accuracy | 92.38 | e2eET |
| Action Recognition | First-Person Hand Action Benchmark (FPHA) | 1:1 Accuracy | 91.83 | e2eET |
| Action Recognition | SBU / SBU-Refine | Accuracy | 93.96 | e2eET |