FASTER Recurrent Networks for Efficient Video Classification

Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang

2019-06-10
Action Classification, Video Classification, General Classification, Action Recognition, Classification

Abstract

Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence, and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for Spatio-TEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10× while maintaining the state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.
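
To make the cost argument in the abstract concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation and does not model FAST-GRU. The function `average_clip_predictions` illustrates the typical baseline the abstract describes, where clip-level predictions are simply averaged. The function `schedule_cost` compares the total inference cost of running one model on every clip against a mixed schedule of expensive and cheap models. The function names, the "every `period`-th clip is expensive" schedule, and the cost values are all hypothetical and chosen only for illustration; they are not figures from the paper.

```python
from typing import List, Sequence


def average_clip_predictions(clip_probs: Sequence[Sequence[float]]) -> List[float]:
    """Baseline aggregation: average the per-clip class probabilities.

    This is the simple clip-then-average scheme, not the learned
    recurrent aggregation that FASTER proposes.
    """
    if not clip_probs:
        raise ValueError("need at least one clip prediction")
    num_classes = len(clip_probs[0])
    totals = [0.0] * num_classes
    for probs in clip_probs:
        if len(probs) != num_classes:
            raise ValueError("all clips must have the same number of classes")
        for i, p in enumerate(probs):
            totals[i] += p
    return [t / len(clip_probs) for t in totals]


def schedule_cost(num_clips: int, expensive_cost: float,
                  cheap_cost: float, period: int) -> float:
    """Total cost when every `period`-th clip (starting with the first)
    uses the expensive model and all other clips use the cheap model.

    The schedule itself is an assumption made for this sketch.
    Costs are in arbitrary units.
    """
    if num_clips < 0 or period < 1:
        raise ValueError("num_clips must be >= 0 and period must be >= 1")
    num_expensive = (num_clips + period - 1) // period  # ceiling division
    num_cheap = num_clips - num_expensive
    return num_expensive * expensive_cost + num_cheap * cheap_cost


if __name__ == "__main__":
    # Hypothetical two-class predictions for two clips.
    print(average_clip_predictions([[0.2, 0.8], [0.6, 0.4]]))

    # Hypothetical costs: expensive model = 10 units, cheap model = 1 unit.
    all_expensive = schedule_cost(8, 10.0, 1.0, period=1)
    mixed = schedule_cost(8, 10.0, 1.0, period=4)
    print(all_expensive, mixed)
```

With `period=1` every clip goes through the expensive model, which corresponds to the independent per-clip baseline. Larger values of `period` replace more clips with the cheap model, so the saving depends on how much cheaper that model is and on how well the aggregator recovers accuracy from the mixed representations.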

Results

Task	Dataset	Metric	Value	Model
Video	Kinetics-400	Acc@1	75.1	FASTER32
Video	Kinetics-400	Acc@1	71.7	FASTER16 w/o sp
Activity Recognition	HMDB-51	Average accuracy of 3 splits	75.7	FASTER32 (Kinetics pretrain)
Activity Recognition	UCF101	3-fold Accuracy	96.9	FASTER32
Action Recognition	HMDB-51	Average accuracy of 3 splits	75.7	FASTER32 (Kinetics pretrain)
Action Recognition	UCF101	3-fold Accuracy	96.9	FASTER32
