Long-term Temporal Convolutions for Action Recognition

Gül Varol, Ivan Laptev, Cordelia Schmid

2016-04-15Optical Flow Estimation Action Recognition Temporal Action Localization

Abstract

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, however, are typically learned at the level of a few video frames failing to model actions at their full temporal extent. In this work we learn video representations using neural networks with long-term temporal convolutions (LTC). We demonstrate that LTC-CNN models with increased temporal extents improve the accuracy of action recognition. We also study the impact of different low-level representations, such as raw values of video pixels and optical flow vector fields and demonstrate the importance of high-quality optical flow estimation for learning accurate action models. We report state-of-the-art results on two challenging benchmarks for human action recognition UCF101 (92.7%) and HMDB51 (67.2%).

Results

Task	Dataset	Metric	Value	Model
Activity Recognition	HMDB-51	Average accuracy of 3 splits	64.8	LTC
Activity Recognition	UCF101	3-fold Accuracy	91.7	LTC
Action Recognition	HMDB-51	Average accuracy of 3 splits	64.8	LTC
Action Recognition	UCF101	3-fold Accuracy	91.7	LTC

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation2025-07-17 A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains2025-07-17 DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition2025-07-16 An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan2025-07-11 Learning to Track Any Points from Human Motion2025-07-08 TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation2025-07-07 Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment2025-07-01 MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation2025-06-29