Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Interpretable 3D Human Action Analysis with Temporal Convolutional Networks

Tae Soo Kim, Austin Reiter

2017-04-14 · 3D Action Recognition · Skeleton Based Action Recognition · Multimodal Activity Recognition · Action Recognition · Temporal Action Localization · Activity Recognition

Paper · PDF · Code (official)

Abstract

The discriminative power of modern deep learning models for 3D human action recognition is growing ever more potent. In conjunction with the recent resurgence of 3D human action representation with 3D skeletons, the quality and the pace of recent progress have been significant. However, the inner workings of state-of-the-art learning-based methods in 3D human action recognition still remain mostly black-box. In this work, we propose to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition. Compared to popular LSTM-based Recurrent Neural Network models, given interpretable input such as 3D skeletons, a TCN provides us with a way to explicitly learn readily interpretable spatio-temporal representations for 3D human action recognition. We present our strategy for re-designing the TCN with interpretability in mind and show how these characteristics of the model are leveraged to construct a powerful 3D activity recognition method. Through this work, we wish to take a step towards a spatio-temporal model that is easier to understand, explain, and interpret. The resulting model, Res-TCN, achieves state-of-the-art results on the largest 3D human action recognition dataset, NTU RGB+D.
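The core building block the abstract describes is a residual temporal convolution unit: a 1D convolution over the time axis of a skeleton feature sequence, with an identity skip connection. The following is a minimal NumPy sketch of that idea, not the authors' Res-TCN implementation; the layer shapes, function names, and the single-block structure are illustrative assumptions.

```python
import numpy as np

def temporal_conv(x, w):
    """1D convolution along time with 'same' padding.

    x: (T, C_in)  sequence of per-frame skeleton features
    w: (k, C_in, C_out)  temporal kernel of width k
    returns: (T, C_out)
    """
    k, c_in, c_out = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    T = x.shape[0]
    out = np.zeros((T, c_out))
    for t in range(T):
        window = xp[t:t + k]                      # (k, C_in) temporal window
        out[t] = np.einsum('ki,kio->o', window, w)  # sum over taps and channels
    return out

def res_tcn_block(x, w):
    """Residual temporal unit: ReLU(conv(x)) + x (assumes C_in == C_out)."""
    return np.maximum(temporal_conv(x, w), 0.0) + x

# Example: 10 frames, 4-dimensional skeleton features per frame
x = np.random.randn(10, 4)
w = np.random.randn(3, 4, 4) * 0.1
y = res_tcn_block(x, w)   # same shape as x: (10, 4)
```

Because the skip connection is the identity, the block's output decomposes into the input plus a rectified convolutional residual, which is what makes the learned filters in such a model comparatively easy to inspect.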

Results

Task                         | Dataset   | Metric        | Value | Model
Video                        | NTU RGB+D | Accuracy (CS) | 74.3  | TCN
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 74.3  | TCN
Zero-Shot Learning           | NTU RGB+D | Accuracy (CS) | 74.3  | TCN
Activity Recognition         | NTU RGB+D | Accuracy (CS) | 74.3  | TCN
Activity Recognition         | EV-Action | Accuracy      | 80.1  | TCN (Skeleton Kinect)
Activity Recognition         | EV-Action | Accuracy      | 64.1  | TCN (Skeleton Vicon)
Action Localization          | NTU RGB+D | Accuracy (CS) | 74.3  | TCN
Action Detection             | NTU RGB+D | Accuracy (CS) | 74.3  | TCN
3D Action Recognition        | NTU RGB+D | Accuracy (CS) | 74.3  | TCN
Action Recognition           | NTU RGB+D | Accuracy (CS) | 74.3  | TCN

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network (2025-06-25)