TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Learning Spatiotemporal Features with 3D Convolutional Net...

Learning Spatiotemporal Features with 3D Convolutional Networks

Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

2014-12-02ICCV 2015 12Action RecognitionAction Recognition In VideosDynamic Facial Expression Recognition
PaperPDFCodeCode(official)CodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCodeCode

Abstract

We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets; 2) A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets; and 3) Our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks. In addition, the features are compact: achieving 52.8% accuracy on UCF101 dataset with only 10 dimensions and also very efficient to compute due to the fast inference of ConvNets. Finally, they are conceptually very simple and easy to train and use.

Results

TaskDatasetMetricValueModel
Activity RecognitionSports-1MClip Hit@146.1C3D
Activity RecognitionSports-1MVideo hit@1 61.1C3D
Activity RecognitionSports-1MVideo hit@585.5C3D
Activity RecognitionHMDB-51Average accuracy of 3 splits51.6C3D
Activity RecognitionUCF1013-fold Accuracy82.3C3D
Action RecognitionSports-1MClip Hit@146.1C3D
Action RecognitionSports-1MVideo hit@1 61.1C3D
Action RecognitionSports-1MVideo hit@585.5C3D
Action RecognitionHMDB-51Average accuracy of 3 splits51.6C3D
Action RecognitionUCF1013-fold Accuracy82.3C3D

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains2025-07-17Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment2025-07-01EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception2025-06-26Feature Hallucination for Self-supervised Action Recognition2025-06-25CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition2025-06-25Enhancing Ambiguous Dynamic Facial Expression Recognition with Soft Label-based Data Augmentation2025-06-25Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition2025-06-23Adapting Vision-Language Models for Evaluating World Models2025-06-22