Adrian Holzbock, Alexander Tsaregorodtsev, Youssef Dawoud, Klaus Dietmayer, Vasileios Belagiannis
Gesture recognition is essential for the interaction of autonomous vehicles with humans. While current approaches focus on combining several modalities such as image features, keypoints, and bone vectors, we present a neural network architecture that delivers state-of-the-art results with body skeleton input data alone. We propose the spatio-temporal multilayer perceptron (stMLP) for gesture recognition in the context of autonomous vehicles. Given 3D body poses over time, we define temporal and spatial mixing operations to extract features in both domains. Additionally, the importance of each time step is re-weighted with Squeeze-and-Excitation layers. An extensive evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach. Furthermore, we deploy our model in our autonomous vehicle to demonstrate its real-time capability and stable execution.
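The core operations described above — temporal mixing, spatial mixing, and Squeeze-and-Excitation re-weighting of time steps — can be sketched in NumPy. This is a minimal illustration under assumed shapes (16 time steps, 17 joints × 3 coordinates) and ReLU activations; the actual stMLP layer sizes, activations, and normalization are not specified in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two-layer perceptron with ReLU; layer sizes here are illustrative."""
    return np.maximum(x @ w1, 0.0) @ w2

T, F = 16, 51                  # assumed: 16 time steps, 17 joints x 3 coords
x = rng.normal(size=(T, F))    # 3D body poses over time, flattened per step

# Temporal mixing: the MLP operates along the time axis
# (transpose, mix time steps, transpose back), with a residual connection.
wt1, wt2 = 0.1 * rng.normal(size=(T, T)), 0.1 * rng.normal(size=(T, T))
x = x + mlp(x.T, wt1, wt2).T

# Spatial mixing: the MLP operates along the joint-coordinate axis.
ws1, ws2 = 0.1 * rng.normal(size=(F, F)), 0.1 * rng.normal(size=(F, F))
x = x + mlp(x, ws1, ws2)

# Squeeze-and-Excitation over time steps: squeeze each step to a scalar,
# excite through a bottleneck (reduction 4 assumed), sigmoid-gate the steps.
s = x.mean(axis=1)                                        # squeeze: (T,)
we1, we2 = rng.normal(size=(T, T // 4)), rng.normal(size=(T // 4, T))
gate = 1.0 / (1.0 + np.exp(-(np.maximum(s @ we1, 0.0) @ we2)))
x = x * gate[:, None]                                     # re-weight time steps

print(x.shape)
```

A final classification head (not shown) would map the mixed features to gesture classes; the mixing weights would of course be learned rather than random.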
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Gesture Recognition | TCG | Accuracy | 85.99 | stMLP |
| Gesture Recognition | TCG | F1-Score | 80.05 | stMLP |
| Gesture Recognition | TCG | Jaccard Index | 67.88 | stMLP |
| Gesture Recognition | Drive&Act | Mean per-class accuracy | 34.61 | stMLP |