Lukas Hedegaard, Negar Heidari, Alexandros Iosifidis
Graph-based reasoning over skeleton data has emerged as a promising approach for human action recognition. However, applying prior graph-based methods, which predominantly take whole temporal sequences as their input, to the setting of online inference entails considerable computational redundancy. In this paper, we tackle this issue by reformulating the Spatio-Temporal Graph Convolutional Neural Network as a Continual Inference Network, which can perform step-by-step predictions in time without repeated frame processing. To evaluate our method, we create a continual version of ST-GCN, CoST-GCN, alongside two derived methods with different self-attention mechanisms, CoAGCN and CoS-TR. We investigate weight transfer strategies and architectural modifications for inference acceleration, and perform experiments on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets. Retaining similar predictive accuracy, we observe up to 109x reduction in time complexity, on-hardware accelerations of 26x, and reductions in maximum allocated memory of 52% during online inference.
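To make the redundancy argument concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes scalar "frames", a stand-in per-frame computation (`frame_feature`), and a fixed temporal kernel; all names and values (`WINDOW`, `KERNEL`, `clip_inference`, `continual_inference`) are hypothetical. It contrasts re-processing the full window for every new frame with processing each frame once and caching its feature.

```python
from collections import deque

WINDOW = 4  # hypothetical temporal receptive field, in frames
KERNEL = [0.1, 0.2, 0.3, 0.4]  # hypothetical temporal weights, oldest to newest


def frame_feature(frame, counter):
    """Stand-in for the per-frame computation; counts how often it is called."""
    counter[0] += 1
    return 2.0 * frame + 1.0


def clip_inference(frames):
    """Regular inference: re-process the whole window for every new frame."""
    counter, outputs = [0], []
    for t in range(WINDOW - 1, len(frames)):
        window = frames[t - WINDOW + 1 : t + 1]
        feats = [frame_feature(f, counter) for f in window]
        outputs.append(sum(w * x for w, x in zip(KERNEL, feats)))
    return outputs, counter[0]


def continual_inference(frames):
    """Continual inference: process each frame once and cache its feature."""
    counter, outputs = [0], []
    buffer = deque(maxlen=WINDOW)  # keeps only the most recent WINDOW features
    for frame in frames:
        buffer.append(frame_feature(frame, counter))
        if len(buffer) == WINDOW:  # emit a prediction once the buffer is full
            outputs.append(sum(w * x for w, x in zip(KERNEL, buffer)))
    return outputs, counter[0]


frames = [float(i) for i in range(10)]
clip_out, clip_calls = clip_inference(frames)
co_out, co_calls = continual_inference(frames)

# Both variants produce the same predictions for every step.
assert all(abs(a - b) < 1e-9 for a, b in zip(clip_out, co_out))
print(clip_calls, co_calls)  # prints "28 10"
```

In this toy setting the per-frame work drops from `WINDOW` calls per prediction to one. The savings reported in the abstract come from applying the same idea throughout a deep network, which this sketch does not attempt to model.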
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.2 | S-TR (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | S-TR (2-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 32.4 | S-TR (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.1 | CoS-TR* (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | CoS-TR* (2-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.5 | CoST-GCN* (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | CoST-GCN* (2-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.4 | AGCN (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | AGCN (2-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 37.38 | AGCN (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.1 | ST-GCN (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 83.7 | ST-GCN (2-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 33.46 | ST-GCN (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 82 | CoAGCN* (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.4 | CoAGCN* (2-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.8 | S-TR (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.2 | S-TR (1-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 16.2 | S-TR (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.7 | CoS-TR* (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | CoS-TR* (1-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 0.15 | CoS-TR* (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 80.7 | AGCN (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | AGCN (1-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 18.69 | AGCN (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.6 | CoST-GCN* (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.4 | CoST-GCN* (1-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 0.16 | CoST-GCN* (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79 | ST-GCN (1-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 16.73 | ST-GCN (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.1 | CoAGCN* (1-stream) |
| Video | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.3 | CoAGCN* (1-stream) |
| Video | NTU RGB+D 120 | GFLOPS per prediction | 0.22 | CoAGCN* (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 36.9 | AGCN (2-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 26.91 | AGCN (2-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 35 | AGCN (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 13.45 | AGCN (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 34.7 | S-TR (2-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 23.24 | S-TR (2-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 34.4 | ST-GCN (2-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 24.09 | ST-GCN (2-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 33.4 | ST-GCN (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 12.04 | ST-GCN (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 33.1 | CoST-GCN (2-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.32 | CoST-GCN (2-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 33 | CoAGCN (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.18 | CoAGCN (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 32.7 | CoS-TR (2-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.31 | CoS-TR (2-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 32.2 | CoST-GCN* (2-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoST-GCN* (2-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 32 | S-TR (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 11.62 | S-TR (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 31.8 | CoST-GCN (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.16 | CoST-GCN (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 30.2 | CoST-GCN* (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoST-GCN* (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 29.9 | CoS-TR* (2-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoS-TR* (2-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 29.7 | CoS-TR (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 27.5 | CoAGCN* (2-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.25 | CoAGCN* (2-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 27.4 | CoS-TR* (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoS-TR* (1-stream) |
| Video | Kinetics-Skeleton dataset | Accuracy | 23.3 | CoAGCN* (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.12 | CoAGCN* (1-stream) |
| Video | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.36 | CoAGCN (2-stream) |
| Video | NTU RGB+D | Accuracy (CS) | 88.9 | CoS-TR* (2-stream) |
| Video | NTU RGB+D | Accuracy (CV) | 94.8 | CoS-TR* (2-stream) |
| Video | NTU RGB+D | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Video | NTU RGB+D | Accuracy (CS) | 88.3 | CoST-GCN* (2-stream) |
| Video | NTU RGB+D | Accuracy (CV) | 95 | CoST-GCN* (2-stream) |
| Video | NTU RGB+D | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Video | NTU RGB+D | Accuracy (CS) | 86.3 | CoST-GCN* |
| Video | NTU RGB+D | Accuracy (CV) | 93.8 | CoST-GCN* |
| Video | NTU RGB+D | GFLOPS per prediction | 0.16 | CoST-GCN* |
| Video | NTU RGB+D | Accuracy (CS) | 86.3 | CoS-TR* |
| Video | NTU RGB+D | Accuracy (CV) | 92.4 | CoS-TR* |
| Video | NTU RGB+D | GFLOPS per prediction | 0.15 | CoS-TR* |
| Video | NTU RGB+D | Accuracy (CS) | 86 | ST-GCN |
| Video | NTU RGB+D | Accuracy (CV) | 93.4 | ST-GCN |
| Video | NTU RGB+D | GFLOPS per prediction | 16.73 | ST-GCN |
| Video | NTU RGB+D | Accuracy (CS) | 86 | CoAGCN* (2-stream) |
| Video | NTU RGB+D | Accuracy (CV) | 93.1 | CoAGCN* (2-stream) |
| Video | NTU RGB+D | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Video | NTU RGB+D | Accuracy (CS) | 84.1 | CoAGCN* |
| Video | NTU RGB+D | Accuracy (CV) | 92.6 | CoAGCN* |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.2 | S-TR (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | S-TR (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 32.4 | S-TR (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.1 | CoS-TR* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | CoS-TR* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.5 | CoST-GCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | CoST-GCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.4 | AGCN (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | AGCN (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 37.38 | AGCN (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.1 | ST-GCN (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 83.7 | ST-GCN (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 33.46 | ST-GCN (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 82 | CoAGCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.4 | CoAGCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.8 | S-TR (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.2 | S-TR (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 16.2 | S-TR (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.7 | CoS-TR* (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | CoS-TR* (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.15 | CoS-TR* (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 80.7 | AGCN (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | AGCN (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 18.69 | AGCN (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.6 | CoST-GCN* (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.4 | CoST-GCN* (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.16 | CoST-GCN* (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79 | ST-GCN (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 16.73 | ST-GCN (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.1 | CoAGCN* (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.3 | CoAGCN* (1-stream) |
| Temporal Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.22 | CoAGCN* (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 36.9 | AGCN (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 26.91 | AGCN (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 35 | AGCN (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 13.45 | AGCN (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 34.7 | S-TR (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 23.24 | S-TR (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 34.4 | ST-GCN (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 24.09 | ST-GCN (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 33.4 | ST-GCN (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 12.04 | ST-GCN (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 33.1 | CoST-GCN (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.32 | CoST-GCN (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 33 | CoAGCN (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.18 | CoAGCN (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 32.7 | CoS-TR (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.31 | CoS-TR (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 32.2 | CoST-GCN* (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoST-GCN* (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 32 | S-TR (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 11.62 | S-TR (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 31.8 | CoST-GCN (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.16 | CoST-GCN (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 30.2 | CoST-GCN* (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoST-GCN* (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 29.9 | CoS-TR* (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoS-TR* (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 29.7 | CoS-TR (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 27.5 | CoAGCN* (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.25 | CoAGCN* (2-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 27.4 | CoS-TR* (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoS-TR* (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | Accuracy | 23.3 | CoAGCN* (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.12 | CoAGCN* (1-stream) |
| Temporal Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.36 | CoAGCN (2-stream) |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 88.9 | CoS-TR* (2-stream) |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 94.8 | CoS-TR* (2-stream) |
| Temporal Action Localization | NTU RGB+D | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 88.3 | CoST-GCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 95 | CoST-GCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 86.3 | CoST-GCN* |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 93.8 | CoST-GCN* |
| Temporal Action Localization | NTU RGB+D | GFLOPS per prediction | 0.16 | CoST-GCN* |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 86.3 | CoS-TR* |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 92.4 | CoS-TR* |
| Temporal Action Localization | NTU RGB+D | GFLOPS per prediction | 0.15 | CoS-TR* |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 86 | ST-GCN |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 93.4 | ST-GCN |
| Temporal Action Localization | NTU RGB+D | GFLOPS per prediction | 16.73 | ST-GCN |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 86 | CoAGCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 93.1 | CoAGCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 84.1 | CoAGCN* |
| Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 92.6 | CoAGCN* |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.2 | S-TR (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | S-TR (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 32.4 | S-TR (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.1 | CoS-TR* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | CoS-TR* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.5 | CoST-GCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | CoST-GCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.4 | AGCN (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | AGCN (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 37.38 | AGCN (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.1 | ST-GCN (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 83.7 | ST-GCN (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 33.46 | ST-GCN (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 82 | CoAGCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.4 | CoAGCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.8 | S-TR (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.2 | S-TR (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 16.2 | S-TR (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.7 | CoS-TR* (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | CoS-TR* (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 0.15 | CoS-TR* (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 80.7 | AGCN (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | AGCN (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 18.69 | AGCN (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.6 | CoST-GCN* (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.4 | CoST-GCN* (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 0.16 | CoST-GCN* (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79 | ST-GCN (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 16.73 | ST-GCN (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.1 | CoAGCN* (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.3 | CoAGCN* (1-stream) |
| Zero-Shot Learning | NTU RGB+D 120 | GFLOPS per prediction | 0.22 | CoAGCN* (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 36.9 | AGCN (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 26.91 | AGCN (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 35 | AGCN (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 13.45 | AGCN (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 34.7 | S-TR (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 23.24 | S-TR (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 34.4 | ST-GCN (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 24.09 | ST-GCN (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 33.4 | ST-GCN (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 12.04 | ST-GCN (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 33.1 | CoST-GCN (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.32 | CoST-GCN (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 33 | CoAGCN (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.18 | CoAGCN (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 32.7 | CoS-TR (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.31 | CoS-TR (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 32.2 | CoST-GCN* (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoST-GCN* (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 32 | S-TR (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 11.62 | S-TR (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 31.8 | CoST-GCN (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.16 | CoST-GCN (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 30.2 | CoST-GCN* (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoST-GCN* (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 29.9 | CoS-TR* (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoS-TR* (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 29.7 | CoS-TR (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 27.5 | CoAGCN* (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.25 | CoAGCN* (2-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 27.4 | CoS-TR* (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoS-TR* (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | Accuracy | 23.3 | CoAGCN* (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.12 | CoAGCN* (1-stream) |
| Zero-Shot Learning | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.36 | CoAGCN (2-stream) |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 88.9 | CoS-TR* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 94.8 | CoS-TR* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 88.3 | CoST-GCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 95 | CoST-GCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 86.3 | CoST-GCN* |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 93.8 | CoST-GCN* |
| Zero-Shot Learning | NTU RGB+D | GFLOPS per prediction | 0.16 | CoST-GCN* |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 86.3 | CoS-TR* |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 92.4 | CoS-TR* |
| Zero-Shot Learning | NTU RGB+D | GFLOPS per prediction | 0.15 | CoS-TR* |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 86 | ST-GCN |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 93.4 | ST-GCN |
| Zero-Shot Learning | NTU RGB+D | GFLOPS per prediction | 16.73 | ST-GCN |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 86 | CoAGCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 93.1 | CoAGCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 84.1 | CoAGCN* |
| Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 92.6 | CoAGCN* |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.2 | S-TR (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | S-TR (2-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 32.4 | S-TR (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.1 | CoS-TR* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | CoS-TR* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.5 | CoST-GCN* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | CoST-GCN* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.4 | AGCN (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | AGCN (2-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 37.38 | AGCN (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.1 | ST-GCN (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 83.7 | ST-GCN (2-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 33.46 | ST-GCN (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 82 | CoAGCN* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.4 | CoAGCN* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.8 | S-TR (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.2 | S-TR (1-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 16.2 | S-TR (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.7 | CoS-TR* (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | CoS-TR* (1-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.15 | CoS-TR* (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 80.7 | AGCN (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | AGCN (1-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 18.69 | AGCN (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.6 | CoST-GCN* (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.4 | CoST-GCN* (1-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.16 | CoST-GCN* (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79 | ST-GCN (1-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 16.73 | ST-GCN (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.1 | CoAGCN* (1-stream) |
| Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.3 | CoAGCN* (1-stream) |
| Activity Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.22 | CoAGCN* (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 36.9 | AGCN (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 26.91 | AGCN (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 35 | AGCN (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 13.45 | AGCN (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 34.7 | S-TR (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 23.24 | S-TR (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 34.4 | ST-GCN (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 24.09 | ST-GCN (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 33.4 | ST-GCN (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 12.04 | ST-GCN (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 33.1 | CoST-GCN (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.32 | CoST-GCN (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 33 | CoAGCN (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.18 | CoAGCN (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 32.7 | CoS-TR (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.31 | CoS-TR (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 32.2 | CoST-GCN* (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoST-GCN* (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 32 | S-TR (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 11.62 | S-TR (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 31.8 | CoST-GCN (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.16 | CoST-GCN (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 30.2 | CoST-GCN* (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoST-GCN* (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 29.9 | CoS-TR* (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoS-TR* (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 29.7 | CoS-TR (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 27.5 | CoAGCN* (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.25 | CoAGCN* (2-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 27.4 | CoS-TR* (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoS-TR* (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | Accuracy | 23.3 | CoAGCN* (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.12 | CoAGCN* (1-stream) |
| Activity Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.36 | CoAGCN (2-stream) |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 88.9 | CoS-TR* (2-stream) |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 94.8 | CoS-TR* (2-stream) |
| Activity Recognition | NTU RGB+D | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 88.3 | CoST-GCN* (2-stream) |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 95 | CoST-GCN* (2-stream) |
| Activity Recognition | NTU RGB+D | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 86.3 | CoST-GCN* |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 93.8 | CoST-GCN* |
| Activity Recognition | NTU RGB+D | GFLOPS per prediction | 0.16 | CoST-GCN* |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 86.3 | CoS-TR* |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 92.4 | CoS-TR* |
| Activity Recognition | NTU RGB+D | GFLOPs per pred | 0.15 | CoS-TR* |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 86 | ST-GCN |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 93.4 | ST-GCN |
| Activity Recognition | NTU RGB+D | GFLOPs per pred | 16.73 | ST-GCN |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 86 | CoAGCN* (2-stream) |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 93.1 | CoAGCN* (2-stream) |
| Activity Recognition | NTU RGB+D | GFLOPs per pred | 0.44 | CoAGCN* (2-stream) |
| Activity Recognition | NTU RGB+D | Accuracy (CS) | 84.1 | CoAGCN* |
| Activity Recognition | NTU RGB+D | Accuracy (CV) | 92.6 | CoAGCN* |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.2 | S-TR (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | S-TR (2-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 32.4 | S-TR (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.1 | CoS-TR* (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | CoS-TR* (2-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.5 | CoST-GCN* (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | CoST-GCN* (2-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.4 | AGCN (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | AGCN (2-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 37.38 | AGCN (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.1 | ST-GCN (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 83.7 | ST-GCN (2-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 33.46 | ST-GCN (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 82 | CoAGCN* (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.4 | CoAGCN* (2-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.8 | S-TR (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.2 | S-TR (1-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 16.2 | S-TR (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.7 | CoS-TR* (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | CoS-TR* (1-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.15 | CoS-TR* (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 80.7 | AGCN (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | AGCN (1-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 18.69 | AGCN (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.6 | CoST-GCN* (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.4 | CoST-GCN* (1-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.16 | CoST-GCN* (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79 | ST-GCN (1-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 16.73 | ST-GCN (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.1 | CoAGCN* (1-stream) |
| Action Localization | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.3 | CoAGCN* (1-stream) |
| Action Localization | NTU RGB+D 120 | GFLOPS per prediction | 0.22 | CoAGCN* (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 36.9 | AGCN (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 26.91 | AGCN (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 35 | AGCN (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 13.45 | AGCN (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 34.7 | S-TR (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 23.24 | S-TR (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 34.4 | ST-GCN (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 24.09 | ST-GCN (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 33.4 | ST-GCN (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 12.04 | ST-GCN (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 33.1 | CoST-GCN (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.32 | CoST-GCN (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 33 | CoAGCN (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.18 | CoAGCN (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 32.7 | CoS-TR (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.31 | CoS-TR (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 32.2 | CoST-GCN* (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoST-GCN* (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 32 | S-TR (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 11.62 | S-TR (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 31.8 | CoST-GCN (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.16 | CoST-GCN (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 30.2 | CoST-GCN* (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoST-GCN* (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 29.9 | CoS-TR* (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoS-TR* (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 29.7 | CoS-TR (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 27.5 | CoAGCN* (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.25 | CoAGCN* (2-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 27.4 | CoS-TR* (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoS-TR* (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | Accuracy | 23.3 | CoAGCN* (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.12 | CoAGCN* (1-stream) |
| Action Localization | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.36 | CoAGCN (2-stream) |
| Action Localization | NTU RGB+D | Accuracy (Cross-Subject) | 88.9 | CoS-TR* (2-stream) |
| Action Localization | NTU RGB+D | Accuracy (Cross-View) | 94.8 | CoS-TR* (2-stream) |
| Action Localization | NTU RGB+D | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Action Localization | NTU RGB+D | Accuracy (Cross-Subject) | 88.3 | CoST-GCN* (2-stream) |
| Action Localization | NTU RGB+D | Accuracy (Cross-View) | 95 | CoST-GCN* (2-stream) |
| Action Localization | NTU RGB+D | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Action Localization | NTU RGB+D | Accuracy (Cross-Subject) | 86.3 | CoST-GCN* |
| Action Localization | NTU RGB+D | Accuracy (Cross-View) | 93.8 | CoST-GCN* |
| Action Localization | NTU RGB+D | GFLOPS per prediction | 0.16 | CoST-GCN* |
| Action Localization | NTU RGB+D | Accuracy (Cross-Subject) | 86.3 | CoS-TR* |
| Action Localization | NTU RGB+D | Accuracy (Cross-View) | 92.4 | CoS-TR* |
| Action Localization | NTU RGB+D | GFLOPS per prediction | 0.15 | CoS-TR* |
| Action Localization | NTU RGB+D | Accuracy (Cross-Subject) | 86 | ST-GCN |
| Action Localization | NTU RGB+D | Accuracy (Cross-View) | 93.4 | ST-GCN |
| Action Localization | NTU RGB+D | GFLOPS per prediction | 16.73 | ST-GCN |
| Action Localization | NTU RGB+D | Accuracy (Cross-Subject) | 86 | CoAGCN* (2-stream) |
| Action Localization | NTU RGB+D | Accuracy (Cross-View) | 93.1 | CoAGCN* (2-stream) |
| Action Localization | NTU RGB+D | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Action Localization | NTU RGB+D | Accuracy (Cross-Subject) | 84.1 | CoAGCN* |
| Action Localization | NTU RGB+D | Accuracy (Cross-View) | 92.6 | CoAGCN* |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.2 | S-TR (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | S-TR (2-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 32.4 | S-TR (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.1 | CoS-TR* (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | CoS-TR* (2-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.5 | CoST-GCN* (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | CoST-GCN* (2-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.4 | AGCN (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | AGCN (2-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 37.38 | AGCN (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.1 | ST-GCN (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 83.7 | ST-GCN (2-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 33.46 | ST-GCN (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 82 | CoAGCN* (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.4 | CoAGCN* (2-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.8 | S-TR (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.2 | S-TR (1-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 16.2 | S-TR (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.7 | CoS-TR* (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | CoS-TR* (1-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 0.15 | CoS-TR* (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 80.7 | AGCN (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | AGCN (1-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 18.69 | AGCN (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.6 | CoST-GCN* (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.4 | CoST-GCN* (1-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 0.16 | CoST-GCN* (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79 | ST-GCN (1-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 16.73 | ST-GCN (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.1 | CoAGCN* (1-stream) |
| Action Detection | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.3 | CoAGCN* (1-stream) |
| Action Detection | NTU RGB+D 120 | GFLOPS per prediction | 0.22 | CoAGCN* (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 36.9 | AGCN (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 26.91 | AGCN (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 35 | AGCN (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 13.45 | AGCN (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 34.7 | S-TR (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 23.24 | S-TR (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 34.4 | ST-GCN (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 24.09 | ST-GCN (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 33.4 | ST-GCN (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 12.04 | ST-GCN (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 33.1 | CoST-GCN (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.32 | CoST-GCN (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 33 | CoAGCN (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.18 | CoAGCN (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 32.7 | CoS-TR (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.31 | CoS-TR (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 32.2 | CoST-GCN* (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoST-GCN* (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 32 | S-TR (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 11.62 | S-TR (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 31.8 | CoST-GCN (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.16 | CoST-GCN (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 30.2 | CoST-GCN* (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoST-GCN* (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 29.9 | CoS-TR* (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoS-TR* (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 29.7 | CoS-TR (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 27.5 | CoAGCN* (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.25 | CoAGCN* (2-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 27.4 | CoS-TR* (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoS-TR* (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | Accuracy | 23.3 | CoAGCN* (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.12 | CoAGCN* (1-stream) |
| Action Detection | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.36 | CoAGCN (2-stream) |
| Action Detection | NTU RGB+D | Accuracy (Cross-Subject) | 88.9 | CoS-TR* (2-stream) |
| Action Detection | NTU RGB+D | Accuracy (Cross-View) | 94.8 | CoS-TR* (2-stream) |
| Action Detection | NTU RGB+D | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Action Detection | NTU RGB+D | Accuracy (Cross-Subject) | 88.3 | CoST-GCN* (2-stream) |
| Action Detection | NTU RGB+D | Accuracy (Cross-View) | 95 | CoST-GCN* (2-stream) |
| Action Detection | NTU RGB+D | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Action Detection | NTU RGB+D | Accuracy (Cross-Subject) | 86.3 | CoST-GCN* |
| Action Detection | NTU RGB+D | Accuracy (Cross-View) | 93.8 | CoST-GCN* |
| Action Detection | NTU RGB+D | GFLOPS per prediction | 0.16 | CoST-GCN* |
| Action Detection | NTU RGB+D | Accuracy (Cross-Subject) | 86.3 | CoS-TR* |
| Action Detection | NTU RGB+D | Accuracy (Cross-View) | 92.4 | CoS-TR* |
| Action Detection | NTU RGB+D | GFLOPS per prediction | 0.15 | CoS-TR* |
| Action Detection | NTU RGB+D | Accuracy (Cross-Subject) | 86 | ST-GCN |
| Action Detection | NTU RGB+D | Accuracy (Cross-View) | 93.4 | ST-GCN |
| Action Detection | NTU RGB+D | GFLOPS per prediction | 16.73 | ST-GCN |
| Action Detection | NTU RGB+D | Accuracy (Cross-Subject) | 86 | CoAGCN* (2-stream) |
| Action Detection | NTU RGB+D | Accuracy (Cross-View) | 93.1 | CoAGCN* (2-stream) |
| Action Detection | NTU RGB+D | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Action Detection | NTU RGB+D | Accuracy (Cross-Subject) | 84.1 | CoAGCN* |
| Action Detection | NTU RGB+D | Accuracy (Cross-View) | 92.6 | CoAGCN* |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.2 | S-TR (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | S-TR (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 32.4 | S-TR (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.1 | CoS-TR* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | CoS-TR* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.5 | CoST-GCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | CoST-GCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.4 | AGCN (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | AGCN (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 37.38 | AGCN (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.1 | ST-GCN (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 83.7 | ST-GCN (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 33.46 | ST-GCN (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 82 | CoAGCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.4 | CoAGCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.8 | S-TR (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.2 | S-TR (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 16.2 | S-TR (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.7 | CoS-TR* (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | CoS-TR* (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.15 | CoS-TR* (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 80.7 | AGCN (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | AGCN (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 18.69 | AGCN (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.6 | CoST-GCN* (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.4 | CoST-GCN* (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.16 | CoST-GCN* (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79 | ST-GCN (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 16.73 | ST-GCN (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.1 | CoAGCN* (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.3 | CoAGCN* (1-stream) |
| 3D Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.22 | CoAGCN* (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 36.9 | AGCN (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 26.91 | AGCN (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 35 | AGCN (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 13.45 | AGCN (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 34.7 | S-TR (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 23.24 | S-TR (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 34.4 | ST-GCN (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 24.09 | ST-GCN (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 33.4 | ST-GCN (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 12.04 | ST-GCN (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 33.1 | CoST-GCN (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.32 | CoST-GCN (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 33 | CoAGCN (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.18 | CoAGCN (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 32.7 | CoS-TR (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.31 | CoS-TR (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 32.2 | CoST-GCN* (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoST-GCN* (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 32 | S-TR (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 11.62 | S-TR (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 31.8 | CoST-GCN (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.16 | CoST-GCN (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 30.2 | CoST-GCN* (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoST-GCN* (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 29.9 | CoS-TR* (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoS-TR* (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 29.7 | CoS-TR (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 27.5 | CoAGCN* (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.25 | CoAGCN* (2-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 27.4 | CoS-TR* (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoS-TR* (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | Accuracy | 23.3 | CoAGCN* (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.12 | CoAGCN* (1-stream) |
| 3D Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.36 | CoAGCN (2-stream) |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 88.9 | CoS-TR* (2-stream) |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 94.8 | CoS-TR* (2-stream) |
| 3D Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 88.3 | CoST-GCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 95 | CoST-GCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 86.3 | CoST-GCN* |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 93.8 | CoST-GCN* |
| 3D Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.16 | CoST-GCN* |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 86.3 | CoS-TR* |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 92.4 | CoS-TR* |
| 3D Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.15 | CoS-TR* |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 86 | ST-GCN |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 93.4 | ST-GCN |
| 3D Action Recognition | NTU RGB+D | GFLOPS per prediction | 16.73 | ST-GCN |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 86 | CoAGCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 93.1 | CoAGCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 84.1 | CoAGCN* |
| 3D Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 92.6 | CoAGCN* |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.2 | S-TR (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | S-TR (2-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 32.4 | S-TR (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.1 | CoS-TR* (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84.8 | CoS-TR* (2-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.5 | CoST-GCN* (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | CoST-GCN* (2-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.4 | AGCN (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 84 | AGCN (2-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 37.38 | AGCN (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 85.1 | ST-GCN (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 83.7 | ST-GCN (2-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 33.46 | ST-GCN (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 82 | CoAGCN* (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.4 | CoAGCN* (2-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.8 | S-TR (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 80.2 | S-TR (1-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 16.2 | S-TR (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.7 | CoS-TR* (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | CoS-TR* (1-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.15 | CoS-TR* (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 80.7 | AGCN (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.7 | AGCN (1-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 18.69 | AGCN (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 81.6 | CoST-GCN* (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79.4 | CoST-GCN* (1-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.16 | CoST-GCN* (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 79 | ST-GCN (1-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 16.73 | ST-GCN (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.1 | CoAGCN* (1-stream) |
| Action Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.3 | CoAGCN* (1-stream) |
| Action Recognition | NTU RGB+D 120 | GFLOPS per prediction | 0.22 | CoAGCN* (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 36.9 | AGCN (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 26.91 | AGCN (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 35 | AGCN (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 13.45 | AGCN (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 34.7 | S-TR (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 23.24 | S-TR (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 34.4 | ST-GCN (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 24.09 | ST-GCN (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 33.4 | ST-GCN (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 12.04 | ST-GCN (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 33.1 | CoST-GCN (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.32 | CoST-GCN (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 33 | CoAGCN (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.18 | CoAGCN (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 32.7 | CoS-TR (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.31 | CoS-TR (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 32.2 | CoST-GCN* (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoST-GCN* (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 32 | S-TR (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 11.62 | S-TR (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 31.8 | CoST-GCN (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.16 | CoST-GCN (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 30.2 | CoST-GCN* (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoST-GCN* (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 29.9 | CoS-TR* (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.22 | CoS-TR* (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 29.7 | CoS-TR (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 27.5 | CoAGCN* (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.25 | CoAGCN* (2-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 27.4 | CoS-TR* (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.11 | CoS-TR* (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | Accuracy | 23.3 | CoAGCN* (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.12 | CoAGCN* (1-stream) |
| Action Recognition | Kinetics-Skeleton dataset | GFLOPS per prediction | 0.36 | CoAGCN (2-stream) |
| Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 88.9 | CoS-TR* (2-stream) |
| Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 94.8 | CoS-TR* (2-stream) |
| Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.3 | CoS-TR* (2-stream) |
| Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 88.3 | CoST-GCN* (2-stream) |
| Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 95 | CoST-GCN* (2-stream) |
| Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.32 | CoST-GCN* (2-stream) |
| Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 86.3 | CoST-GCN* |
| Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 93.8 | CoST-GCN* |
| Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.16 | CoST-GCN* |
| Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 86.3 | CoS-TR* |
| Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 92.4 | CoS-TR* |
| Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.15 | CoS-TR* |
| Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 86 | ST-GCN |
| Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 93.4 | ST-GCN |
| Action Recognition | NTU RGB+D | GFLOPS per prediction | 16.73 | ST-GCN |
| Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 86 | CoAGCN* (2-stream) |
| Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 93.1 | CoAGCN* (2-stream) |
| Action Recognition | NTU RGB+D | GFLOPS per prediction | 0.44 | CoAGCN* (2-stream) |
| Action Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 84.1 | CoAGCN* |
| Action Recognition | NTU RGB+D | Accuracy (Cross-View) | 92.6 | CoAGCN* |