Julian Wiederer, Arij Bouazizi, Ulrich Kressel, Vasileios Belagiannis
A car driver knows how to react to the gestures of traffic officers. Clearly, this is not the case for an autonomous vehicle unless it has road traffic control gesture recognition functionality. In this work, we address the limitation of existing autonomous driving datasets in providing learning data for traffic control gesture recognition. We introduce a dataset that is based on 3D body skeleton input to perform traffic control gesture classification at every time step. Our dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence. To evaluate our dataset, we propose eight sequential processing models based on deep neural networks such as recurrent networks, attention mechanisms, temporal convolutional networks and graph convolutional networks. We present an extensive evaluation and analysis of all approaches on our dataset, as well as a real-world quantitative evaluation. The code and dataset are publicly available.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | TCG-dataset | Acc | 87.24 | Bidirectional LSTM |
| Video | TCG-dataset | F1-Score | 78.48 | Bidirectional LSTM |
| Video | TCG-dataset | Jaccard Index | 67 | Bidirectional LSTM |
| Temporal Action Localization | TCG-dataset | Acc | 87.24 | Bidirectional LSTM |
| Temporal Action Localization | TCG-dataset | F1-Score | 78.48 | Bidirectional LSTM |
| Temporal Action Localization | TCG-dataset | Jaccard Index | 67 | Bidirectional LSTM |
| Zero-Shot Learning | TCG-dataset | Acc | 87.24 | Bidirectional LSTM |
| Zero-Shot Learning | TCG-dataset | F1-Score | 78.48 | Bidirectional LSTM |
| Zero-Shot Learning | TCG-dataset | Jaccard Index | 67 | Bidirectional LSTM |
| Activity Recognition | TCG-dataset | Acc | 87.24 | Bidirectional LSTM |
| Activity Recognition | TCG-dataset | F1-Score | 78.48 | Bidirectional LSTM |
| Activity Recognition | TCG-dataset | Jaccard Index | 67 | Bidirectional LSTM |
| Action Localization | TCG-dataset | Acc | 87.24 | Bidirectional LSTM |
| Action Localization | TCG-dataset | F1-Score | 78.48 | Bidirectional LSTM |
| Action Localization | TCG-dataset | Jaccard Index | 67 | Bidirectional LSTM |
| Action Detection | TCG-dataset | Acc | 87.24 | Bidirectional LSTM |
| Action Detection | TCG-dataset | F1-Score | 78.48 | Bidirectional LSTM |
| Action Detection | TCG-dataset | Jaccard Index | 67 | Bidirectional LSTM |
| 3D Action Recognition | TCG-dataset | Acc | 87.24 | Bidirectional LSTM |
| 3D Action Recognition | TCG-dataset | F1-Score | 78.48 | Bidirectional LSTM |
| 3D Action Recognition | TCG-dataset | Jaccard Index | 67 | Bidirectional LSTM |
| Action Recognition | TCG-dataset | Acc | 87.24 | Bidirectional LSTM |
| Action Recognition | TCG-dataset | F1-Score | 78.48 | Bidirectional LSTM |
| Action Recognition | TCG-dataset | Jaccard Index | 67 | Bidirectional LSTM |
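
Because classification is performed at every time step, the three metrics in the table can be read as frame-wise scores. The sketch below shows one plausible way to compute them from per-frame labels. It is an illustration only: the function name `framewise_metrics`, the integer class ids, and the choice of macro averaging over classes for F1 and the Jaccard index are assumptions, not details confirmed by the abstract.

```python
from collections import Counter


def framewise_metrics(y_true, y_pred):
    """Accuracy, macro F1 and macro Jaccard index over per-frame labels.

    Hypothetical helper: macro averaging is an assumption made for this sketch.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the frame belongs to t
            fn[t] += 1  # class t was missed on this frame

    classes = sorted(set(y_true) | set(y_pred))
    f1_scores, jaccard_scores = [], []
    for c in classes:
        f1_den = 2 * tp[c] + fp[c] + fn[c]
        jac_den = tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / f1_den if f1_den else 0.0)
        jaccard_scores.append(tp[c] / jac_den if jac_den else 0.0)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "f1": sum(f1_scores) / len(classes),
        "jaccard": sum(jaccard_scores) / len(classes),
    }


# Toy example: six frames, three hypothetical gesture classes (0, 1, 2).
scores = framewise_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(scores)
```

On this toy input, four of six frames are correct, so the accuracy is about 0.667, while the macro-averaged Jaccard index is 0.5. The Jaccard index penalizes both missed and spurious frames of a class, which is consistent with it being the lowest of the three values reported in the table.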