Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Action Machine: Rethinking Action Recognition in Trimmed Videos

Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du

2018-12-14 · Skeleton Based Action Recognition · Pose Estimation · Multimodal Activity Recognition · Action Recognition · Temporal Action Localization

Abstract

Existing methods in video action recognition mostly do not distinguish the human body from the environment and easily overfit to scenes and objects. In this work, we present a conceptually simple, general and high-performance framework for action recognition in trimmed videos, aiming at person-centric modeling. The method, called Action Machine, takes as input videos cropped by person bounding boxes. It extends the Inflated 3D ConvNet (I3D) by adding a branch for human pose estimation and a 2D CNN for pose-based action recognition, and is fast to train and test. Action Machine benefits from the multi-task training of action recognition and pose estimation, and from the fusion of predictions from RGB images and poses. On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject respectively. Action Machine also achieves competitive performance on three other, smaller action recognition datasets: Northwestern UCLA Multiview Action3D, MSR Daily Activity3D and UTD-MHAD. Code will be made available.
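The abstract describes late fusion of per-class predictions from the RGB stream and the pose stream. As a minimal sketch of that idea (the weighting scheme and function names here are illustrative assumptions, not the paper's actual implementation), the two streams' logits can be converted to probabilities and combined with a convex weight:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def fuse_predictions(rgb_logits: np.ndarray,
                     pose_logits: np.ndarray,
                     rgb_weight: float = 0.5) -> np.ndarray:
    """Late-fuse class scores from the RGB and pose branches.

    rgb_weight=0.5 (equal weighting) is an assumption for illustration;
    the paper does not specify fusion weights in the abstract.
    """
    rgb_scores = softmax(rgb_logits)
    pose_scores = softmax(pose_logits)
    return rgb_weight * rgb_scores + (1.0 - rgb_weight) * pose_scores

# Toy example: 3-class logits from each stream.
rgb = np.array([2.0, 0.5, 0.1])
pose = np.array([0.2, 1.8, 0.3])
fused = fuse_predictions(rgb, pose)
predicted_class = int(np.argmax(fused))
```

The fused vector remains a valid probability distribution (non-negative, sums to 1), so the argmax can be taken directly for the final action label.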

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Activity Recognition | NTU RGB+D | Accuracy (CS) | 94.3 | Action Machine (RGB only)
Activity Recognition | NTU RGB+D | Accuracy (CV) | 97.2 | Action Machine (RGB only)
Activity Recognition | UTD-MHAD | Accuracy | 92.5 | Action Machine (RGB only)
Activity Recognition | UTD-MHAD | Accuracy (CS) | 92.5 | Action Machine
Activity Recognition | MSR Daily Activity3D | Accuracy | 93 | Action Machine (RGB only)
Action Recognition | NTU RGB+D | Accuracy (CS) | 94.3 | Action Machine (RGB only)
Action Recognition | NTU RGB+D | Accuracy (CV) | 97.2 | Action Machine (RGB only)
Action Recognition | UTD-MHAD | Accuracy | 92.5 | Action Machine (RGB only)

Related Papers

- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
- From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
- SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)