TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Beyond Short Snippets: Deep Networks for Video Classificat...

Beyond Short Snippets: Deep Networks for Video Classification

Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici

2015-03-31CVPR 2015 6Optical Flow EstimationVideo ClassificationGeneral ClassificationAction RecognitionRetrievalClassification
PaperPDFCode

Abstract

Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep neural network architectures to combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handling full length videos. The first method explores various convolutional temporal feature pooling architectures, examining the various design choices which need to be made when adapting a CNN for this task. The second proposed method explicitly models the video as an ordered sequence of frames. For this purpose we employ a recurrent neural network that uses Long Short-Term Memory (LSTM) cells which are connected to the output of the underlying CNN. Our best networks exhibit significant performance improvements over previously published results on the Sports 1 million dataset (73.1% vs. 60.9%) and the UCF-101 datasets with (88.6% vs. 88.0%) and without additional optical flow information (82.6% vs. 72.8%).

Results

TaskDatasetMetricValueModel
Activity RecognitionSports-1MVideo hit@1 71.7Conv pooling
Activity RecognitionSports-1MVideo hit@590.4Conv pooling
Activity RecognitionUCF1013-fold Accuracy88.6Two-stream+LSTM
Action RecognitionSports-1MVideo hit@1 71.7Conv pooling
Action RecognitionSports-1MVideo hit@590.4Conv pooling
Action RecognitionUCF1013-fold Accuracy88.6Two-stream+LSTM

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation2025-07-17A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains2025-07-17From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17A Survey of Context Engineering for Large Language Models2025-07-17MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval2025-07-17Adversarial attacks to image classification systems using evolutionary algorithms2025-07-17Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker2025-07-16