Neel Trivedi, Anirudh Thatipelli, Ravi Kiran Sarvadevabhatla
The lack of fine-grained joints (facial joints, hand fingers) is a fundamental performance bottleneck for state-of-the-art skeleton action recognition models. Despite this bottleneck, the community's efforts seem to be invested only in devising novel architectures. To specifically address this bottleneck, we introduce two new pose-based human action datasets: NTU60-X and NTU120-X. Our datasets extend the largest existing action recognition dataset, NTU-RGBD. In addition to the 25 body joints per skeleton provided by NTU-RGBD, the NTU60-X and NTU120-X datasets include finger and facial joints, enabling a richer skeleton representation. We appropriately modify state-of-the-art approaches to enable training on the introduced datasets. Our results demonstrate the effectiveness of the NTU-X datasets in overcoming the aforementioned bottleneck and improving state-of-the-art performance, both overall and on the previously worst-performing action categories. Code and pretrained models are available at https://github.com/skelemoa/ntu-x.
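The joint configurations evaluated on NTU60-X (body only, body + fingers, body + fingers + face) amount to choosing a subset of joints from each extended skeleton. The sketch below shows one way such subsets might be sliced out of a skeleton tensor. It assumes a hypothetical layout in which the 25 NTU-RGBD body joints come first, followed by finger joints and then facial joints; `NUM_FINGER`, `NUM_FACE`, `joint_indices`, and `select_joints` are illustrative names and placeholder counts, not the NTU-X release format. The `(C, T, V, M)` shape (coordinates, frames, joints, bodies) follows a convention common in GCN-based skeleton pipelines and is likewise an assumption here.

```python
import numpy as np

# Hypothetical joint layout: 25 NTU-RGBD body joints first, then finger joints,
# then facial joints. NUM_FINGER and NUM_FACE are illustrative placeholders only;
# check the NTU-X release for the actual joint counts and ordering.
NUM_BODY = 25
NUM_FINGER = 40  # placeholder (assumption)
NUM_FACE = 50    # placeholder (assumption)


def joint_indices(config: str) -> np.ndarray:
    """Return the joint indices that make up a given joint configuration."""
    body = np.arange(NUM_BODY)
    fingers = np.arange(NUM_BODY, NUM_BODY + NUM_FINGER)
    face = np.arange(NUM_BODY + NUM_FINGER, NUM_BODY + NUM_FINGER + NUM_FACE)
    groups = {
        "body": [body],
        "body+fingers": [body, fingers],
        "body+fingers+face": [body, fingers, face],
    }
    if config not in groups:
        raise ValueError(f"unknown joint configuration: {config!r}")
    return np.concatenate(groups[config])


def select_joints(skeleton: np.ndarray, config: str) -> np.ndarray:
    """Slice a (C, T, V, M) skeleton array down to one joint configuration.

    C: coordinate channels, T: frames, V: joints, M: bodies.
    """
    return skeleton[:, :, joint_indices(config), :]


# Usage: 3 coordinates, 300 frames, all joints, up to 2 bodies.
x = np.zeros((3, 300, NUM_BODY + NUM_FINGER + NUM_FACE, 2))
print(select_joints(x, "body+fingers").shape)  # (3, 300, 65, 2)
```

Keeping the selection as an index array means the same full-skeleton tensor can feed all three configurations; a GCN model would additionally need a skeleton graph whose nodes match the selected joints.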
Accuracy (%) on NTU60-X by model and joint set:

| Model | Body joints | Body + Fingers joints | Body + Fingers + Face joints |
|---|---|---|---|
| 4s-ShiftGCN | 89.56 | 91.78 | 89.64 |
| MS-G3D | 91.26 | 91.76 | 91.12 |
| PA-ResGCN | 89.98 | 91.64 | 89.79 |

The same nine results are listed under each of the following tasks: Action Recognition, 3D Action Recognition, Activity Recognition, Action Detection, Action Localization, Temporal Action Localization, Zero-Shot Learning, and Video.
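The NTU60-X results can be compared directly by computing each joint set's change relative to the body-only baseline. The short script below does this arithmetic using only the numbers reported for NTU60-X; `RESULTS`, `gains_over_body`, and `best_config` are illustrative names, not part of the released code.

```python
# Accuracy (%) on NTU60-X, taken from the reported results.
RESULTS = {
    "4s-ShiftGCN": {"body": 89.56, "body+fingers": 91.78, "body+fingers+face": 89.64},
    "MS-G3D": {"body": 91.26, "body+fingers": 91.76, "body+fingers+face": 91.12},
    "PA-ResGCN": {"body": 89.98, "body+fingers": 91.64, "body+fingers+face": 89.79},
}


def gains_over_body(results: dict) -> dict:
    """Accuracy change (percentage points) of each joint set relative to body-only."""
    return {
        model: {cfg: round(acc - accs["body"], 2) for cfg, acc in accs.items() if cfg != "body"}
        for model, accs in results.items()
    }


def best_config(results: dict) -> dict:
    """Joint set with the highest accuracy for each model."""
    return {model: max(accs, key=accs.get) for model, accs in results.items()}


for model, deltas in gains_over_body(RESULTS).items():
    print(model, deltas)
print(best_config(RESULTS))
```

By this arithmetic, adding finger joints raises accuracy over body-only for all three models (+2.22, +0.50, and +1.66 points for 4s-ShiftGCN, MS-G3D, and PA-ResGCN), and Body + Fingers is the best-performing joint set for each of them on NTU60-X.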