TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/FASTER Recurrent Networks for Efficient Video Classification

FASTER Recurrent Networks for Efficient Video Classification

Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang

2019-06-10Action ClassificationVideo ClassificationGeneral ClassificationAction RecognitionClassification
PaperPDF

Abstract

Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence, and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for Spatio-TEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10x? while maintaining the state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.

Results

TaskDatasetMetricValueModel
VideoKinetics-400Acc@175.1FASTER32
VideoKinetics-400Acc@171.7FASTER16 w/o sp
Activity RecognitionHMDB-51Average accuracy of 3 splits75.7FASTER32 (Kinetics pretrain)
Activity RecognitionUCF1013-fold Accuracy96.9FASTER32
Action RecognitionHMDB-51Average accuracy of 3 splits75.7FASTER32 (Kinetics pretrain)
Action RecognitionUCF1013-fold Accuracy96.9FASTER32

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains2025-07-17Adversarial attacks to image classification systems using evolutionary algorithms2025-07-17Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation2025-07-16Safeguarding Federated Learning-based Road Condition Classification2025-07-16AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs)2025-07-13Fuzzy Classification Aggregation for a Continuum of Agents2025-07-06Hybrid-View Attention for csPCa Classification in TRUS2025-07-04Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment2025-07-01