Mahdi Davoodikakhki, KangKang Yin
Research on human action classification has made significant progress in the past few years. Most deep learning methods focus on improving performance by adding more network components. We propose instead to make better use of auxiliary mechanisms, including hierarchical classification, network pruning, and skeleton-based preprocessing, to boost model robustness and performance. We test the effectiveness of our method on four commonly used datasets: NTU RGB+D 60, NTU RGB+D 120, Northwestern-UCLA Multiview Action 3D, and UTD Multimodal Human Action Dataset. Our experiments show that our method achieves comparable or better performance on all four datasets. In particular, our method sets a new baseline for NTU 120, the largest of the four. We also analyze our method with extensive comparisons and ablation studies.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | N-UCLA | Accuracy | 93.99 | Hierarchical Action Classification (RGB + Pose) |
| Temporal Action Localization | N-UCLA | Accuracy | 93.99 | Hierarchical Action Classification (RGB + Pose) |
| Zero-Shot Learning | N-UCLA | Accuracy | 93.99 | Hierarchical Action Classification (RGB + Pose) |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 95.66 | Hierarchical Action Classification (RGB + Pose) |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 98.79 | Hierarchical Action Classification (RGB + Pose) |
| Activity Recognition | N-UCLA | Accuracy | 93.99 | Hierarchical Action Classification (RGB + Pose) |
| Action Localization | N-UCLA | Accuracy | 93.99 | Hierarchical Action Classification (RGB + Pose) |
| Action Detection | N-UCLA | Accuracy | 93.99 | Hierarchical Action Classification (RGB + Pose) |
| 3D Action Recognition | N-UCLA | Accuracy | 93.99 | Hierarchical Action Classification (RGB + Pose) |
| Action Recognition | NTU RGB+D | Accuracy (CS) | 95.66 | Hierarchical Action Classification (RGB + Pose) |
| Action Recognition | NTU RGB+D | Accuracy (CV) | 98.79 | Hierarchical Action Classification (RGB + Pose) |
| Action Recognition | N-UCLA | Accuracy | 93.99 | Hierarchical Action Classification (RGB + Pose) |