Jongmin Yu, Yongsang Yoon, Moongu Jeon
In skeleton-based action recognition, graph convolutional networks (GCNs), which model human body skeletons using graphical components such as nodes and connections, have recently achieved remarkable performance. However, current state-of-the-art methods for skeleton-based action recognition usually assume that completely observed skeletons will be provided. This assumption may be problematic in real scenarios, since captured skeletons can always be incomplete or noisy. In this work, we propose a skeleton-based action recognition method that is robust to noise in the given skeleton features. The key insight of our approach is to train a model by maximizing the mutual information between normal and noisy skeletons in a predictive-coding manner. We have conducted comprehensive experiments on skeleton-based action recognition with defective skeletons using the NTU-RGB+D and Kinetics-Skeleton datasets. The experimental results demonstrate that, compared with existing state-of-the-art methods, our approach achieves outstanding performance when skeleton samples are noisy.
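The mutual-information objective described above is commonly realized as an InfoNCE-style contrastive loss between paired embeddings. The following is a minimal hypothetical sketch of such a loss for clean/noisy skeleton embeddings; the function name, shapes, and temperature are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(clean, noisy, temperature=0.1):
    """InfoNCE lower bound on the mutual information between clean and
    noisy skeleton embeddings (illustrative sketch, not the paper's code).

    clean, noisy: (batch, dim) L2-normalized embedding matrices.
    Matching rows are positive pairs; all other rows in the batch
    act as negatives."""
    # Cosine-similarity logits between every clean/noisy pair
    logits = clean @ noisy.T / temperature            # (batch, batch)
    # Log-softmax over each row, with the diagonal as the positive class
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the positive (diagonal) entries
    return -np.mean(np.diag(log_prob))

# Toy usage: identical clean/noisy embeddings should yield a small loss,
# since each row agrees most strongly with its own positive.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss = info_nce_loss(z, z)
```

Minimizing this loss pushes each noisy embedding toward its clean counterpart and away from other samples in the batch, which tightens a lower bound on the mutual information between the two views.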
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Skeleton-Based Action Recognition | Kinetics-Skeleton | Accuracy | 34.8 | PeGCN |
| Skeleton-Based Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 85.6 | PeGCN |
| Skeleton-Based Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 93.4 | PeGCN |