
STAR-Net: Action Recognition using Spatio-Temporal Activation Reprojection

William McNally, Alexander Wong, John McPhee

Published: 2019-02-26
Tasks: Skeleton Based Action Recognition, Pose Estimation, Multimodal Activity Recognition, Action Recognition, Temporal Action Localization

Abstract

While depth cameras and inertial sensors have been frequently leveraged for human action recognition, these sensing modalities are impractical in many scenarios where cost or environmental constraints prohibit their use. As such, there has been recent interest in human action recognition using low-cost, readily available RGB cameras via deep convolutional neural networks. However, many of the deep convolutional neural networks proposed for action recognition thus far have relied heavily on learning global appearance cues directly from imaging data, resulting in highly complex network architectures that are computationally expensive and difficult to train. Motivated to reduce network complexity and achieve higher performance, we introduce the concept of spatio-temporal activation reprojection (STAR). More specifically, we reproject the spatio-temporal activations generated by human pose estimation layers in space and time using a stack of 3D convolutions. Experimental results on UTD-MHAD and J-HMDB demonstrate that an end-to-end architecture based on the proposed STAR framework (which we nickname STAR-Net) is proficient in single-environment and small-scale applications. On UTD-MHAD, STAR-Net outperforms several methods using richer data modalities such as depth and inertial sensors.
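The core idea in the abstract is to treat the heatmap activations of a 2D pose-estimation backbone as the input representation, stack them over time into a spatio-temporal volume, and process that volume with a stack of 3D convolutions. Below is a minimal PyTorch sketch of that pipeline; the layer widths, joint count, and class count (21, matching J-HMDB) are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class STARNetSketch(nn.Module):
    """Illustrative sketch of the STAR idea: per-frame pose heatmaps
    stacked over time, then processed by a stack of 3D convolutions.
    All layer sizes are assumptions, not the published architecture."""

    def __init__(self, num_joints=15, num_classes=21):
        super().__init__()
        # Hypothetical 3D-convolution stack over the spatio-temporal
        # activation volume of shape (joints, T, H, W).
        self.reproject = nn.Sequential(
            nn.Conv3d(num_joints, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)      # global space-time pool
        self.classify = nn.Linear(128, num_classes)

    def forward(self, heatmaps):
        # heatmaps: (batch, joints, T, H, W) -- per-frame activations
        # produced by a 2D pose-estimation backbone.
        x = self.reproject(heatmaps)
        x = self.pool(x).flatten(1)
        return self.classify(x)

# Example: a batch of 2 clips, 8 frames of 64x64 heatmaps for 15 joints.
model = STARNetSketch()
logits = model(torch.randn(2, 15, 8, 64, 64))
print(logits.shape)  # torch.Size([2, 21])
```

Given per-frame heatmaps of shape (batch, joints, frames, height, width), the 3D convolutions mix information across both space and time before a global pool and linear classifier produce the action logits.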

Results

Task | Dataset | Metric | Value (%) | Model
Video | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Temporal Action Localization | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Zero-Shot Learning | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Activity Recognition | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Activity Recognition | UTD-MHAD | Accuracy (cross-subject) | 90 | STAR-Net
Action Localization | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Action Detection | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
3D Action Recognition | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net
Action Recognition | J-HMDB | Accuracy (RGB+pose) | 64.3 | STAR-Net

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)