Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Long-term Temporal Convolutions for Action Recognition

Gül Varol, Ivan Laptev, Cordelia Schmid

2016-04-15 | Tasks: Optical Flow Estimation, Action Recognition, Temporal Action Localization

Abstract

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, however, are typically learned at the level of a few video frames, failing to model actions at their full temporal extent. In this work we learn video representations using neural networks with long-term temporal convolutions (LTC). We demonstrate that LTC-CNN models with increased temporal extents improve the accuracy of action recognition. We also study the impact of different low-level representations, such as raw video pixel values and optical flow vector fields, and demonstrate the importance of high-quality optical flow estimation for learning accurate action models. We report state-of-the-art results on two challenging benchmarks for human action recognition: UCF101 (92.7%) and HMDB51 (67.2%).
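The core idea of convolving over a clip's time axis as well as its spatial axes can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `conv3d_valid` is a hypothetical single-channel, stride-1, unpadded 3D convolution, and `temporal_receptive_field` is a hypothetical helper that counts how many input frames a single output unit depends on. The layer stack in the usage lines (five 3-frame convolutions, each followed by 2-frame pooling with stride 2) is an illustrative assumption, not the LTC architecture from the paper.

```python
from typing import List, Sequence, Tuple

# A single-channel clip indexed as clip[t][y][x] (time, height, width).
Clip = List[List[List[float]]]


def conv3d_valid(clip: Clip, kernel: Clip) -> Clip:
    """Hypothetical helper: single-channel 3D cross-correlation (what CNN
    libraries call convolution), stride 1, no padding. Each output value
    mixes kt consecutive frames, so time is treated like a spatial axis."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Clip = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kt x kh x kw window starting at (t, y, x).
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def temporal_receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Hypothetical helper: number of input frames seen by one output unit
    after a stack of (temporal_kernel, temporal_stride) layers."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the window by (k - 1) steps
        jump *= s             # strides make later steps cover more frames
    return rf


if __name__ == "__main__":
    # Toy clip: 4 frames of 3x3 pixels, where every pixel of frame t equals t.
    clip = [[[float(t)] * 3 for _ in range(3)] for t in range(4)]
    # Temporal-difference kernel (kt=2, kh=kw=1): frame[t+1] - frame[t].
    kernel = [[[-1.0]], [[1.0]]]
    out = conv3d_valid(clip, kernel)
    print(len(out), out[0][0][0])  # 3 frames remain, each pixel equals 1.0

    # Assumed stack: five (3-frame conv, 2-frame pool with stride 2) blocks.
    print(temporal_receptive_field([(3, 1), (2, 2)] * 5))  # 94
```

Under that assumed stack, a top-layer unit could depend on up to 94 input frames, but it can only exploit that span if the input clip is long enough. This is one way to see why increasing the temporal extent of the network input, as the abstract describes, can matter.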

Results

Task                 | Dataset | Metric                       | Value (%) | Model
Activity Recognition | HMDB-51 | Average accuracy of 3 splits | 64.8      | LTC
Activity Recognition | UCF101  | 3-fold Accuracy              | 91.7      | LTC
Action Recognition   | HMDB-51 | Average accuracy of 3 splits | 64.8      | LTC
Action Recognition   | UCF101  | 3-fold Accuracy              | 91.7      | LTC
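Both metric labels appear to describe the standard evaluation protocol for these datasets: UCF101 and HMDB-51 each come with three official train/test splits, and the reported figure is conventionally the mean of the per-split accuracies. Written out, with $C_s$ and $N_s$ as illustrative symbols (not from the source) for the number of correctly classified test videos and the total number of test videos in split $s$:

```latex
\mathrm{Acc} = \frac{1}{3} \sum_{s=1}^{3} \mathrm{Acc}_s,
\qquad
\mathrm{Acc}_s = 100 \cdot \frac{C_s}{N_s}
```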

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan (2025-07-11)
Learning to Track Any Points from Human Motion (2025-07-08)
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (2025-07-07)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation (2025-06-29)