Representation Flow for Action Recognition

AJ Piergiovanni, Michael S. Ryoo

2018-10-02 · CVPR 2019
Tasks: Activity Recognition In Videos, Action Classification, Optical Flow Estimation, Video Classification, Video Understanding, Action Recognition, Action Recognition In Videos, Temporal Action Localization, Activity Recognition

Abstract

In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn motion representations. Our representation flow layer is a fully-differentiable layer designed to capture the "flow" of any representation channel within a convolutional neural network for action recognition. Its parameters for iterative flow optimization are learned in an end-to-end fashion together with the other CNN model parameters, maximizing the action recognition performance. Furthermore, we newly introduce the concept of learning "flow of flow" representations by stacking multiple representation flow layers. We conducted extensive experimental evaluations, confirming its advantages over previous recognition models using traditional optical flows in both computational speed and performance. Code/models available here: https://piergiaj.github.io/rep-flow-site/
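
To make the idea of iterative flow estimation on a feature channel concrete, here is a minimal sketch in plain Python. It is not the authors' layer: it swaps in classic Horn-Schunck-style updates, keeps the smoothness weight fixed rather than learned, and works on one small 2D channel. The names `estimate_flow`, `alpha`, and `n_iters` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: Horn-Schunck-style iterative flow on a single
# feature channel. This is NOT the paper's representation flow layer; the
# function names and the defaults (alpha, n_iters) are assumptions.

def _at(grid, y, x):
    """Read grid[y][x], replicating edge values outside the borders."""
    h, w = len(grid), len(grid[0])
    return grid[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def _neighbour_mean(grid, y, x):
    """Average of the four direct neighbours (the smoothness term)."""
    return (_at(grid, y - 1, x) + _at(grid, y + 1, x)
            + _at(grid, y, x - 1) + _at(grid, y, x + 1)) / 4.0


def estimate_flow(f1, f2, alpha=1.0, n_iters=20):
    """Return (u, v): horizontal and vertical flow from channel f1 to f2.

    alpha and n_iters are fixed here; in a learned layer such values
    would be trainable parameters.
    """
    h, w = len(f1), len(f1[0])
    # Spatial gradients (central differences averaged over both frames)
    # and the temporal difference.
    ix = [[(_at(f1, y, x + 1) - _at(f1, y, x - 1)
            + _at(f2, y, x + 1) - _at(f2, y, x - 1)) / 4.0
           for x in range(w)] for y in range(h)]
    iy = [[(_at(f1, y + 1, x) - _at(f1, y - 1, x)
            + _at(f2, y + 1, x) - _at(f2, y - 1, x)) / 4.0
           for x in range(w)] for y in range(h)]
    it = [[f2[y][x] - f1[y][x] for x in range(w)] for y in range(h)]

    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    for _ in range(n_iters):
        new_u = [[0.0] * w for _ in range(h)]
        new_v = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                ub = _neighbour_mean(u, y, x)
                vb = _neighbour_mean(v, y, x)
                # Residual of the linearised constancy constraint.
                r = ((ix[y][x] * ub + iy[y][x] * vb + it[y][x])
                     / (alpha ** 2 + ix[y][x] ** 2 + iy[y][x] ** 2))
                new_u[y][x] = ub - ix[y][x] * r
                new_v[y][x] = vb - iy[y][x] * r
        u, v = new_u, new_v
    return u, v


# Three frames of a horizontal ramp that shifts right by one cell per frame.
frames = [[[float(x - t) for x in range(5)] for _ in range(5)]
          for t in range(3)]
flow01 = estimate_flow(frames[0], frames[1])
flow12 = estimate_flow(frames[1], frames[2])
# "Flow of flow": apply the same estimator to two consecutive flow fields.
flow_of_flow = estimate_flow(flow01[0], flow12[0])
```

In this toy input the pattern moves right at constant speed, so the first-level flow has a positive horizontal component and the second-level flow is zero, since the motion itself does not change between frame pairs. How the actual layer parameterises and stacks these steps is described in the paper and the linked code, not here.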

Results

Task | Dataset | Metric | Value | Model
Video | Kinetics-400 | Acc@1 | 77.9 | RepFlow-50 ([2+1]D CNN, FcF, Non-local block)
Activity Recognition | HMDB-51 | Average accuracy of 3 splits | 81.1 | RepFlow-50 ([2+1]D CNN, FcF, Non-local block)
Action Recognition | HMDB-51 | Average accuracy of 3 splits | 81.1 | RepFlow-50 ([2+1]D CNN, FcF, Non-local block)

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks (2025-07-15)
ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments (2025-07-14)
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI (2025-07-14)