Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learn to cycle: Time-consistent feature discovery for action recognition

Alexandros Stergiou, Ronald Poppe

2020-06-15

Tasks: Action Classification, Video Classification, Action Recognition, Action Recognition In Videos

Paper · PDF · Code (official)

Abstract

Generalizing over temporal variations is a prerequisite for effective action recognition in videos. Despite significant advances in deep neural networks, it remains a challenge to focus on short-term discriminative motions in relation to the overall performance of an action. We address this challenge by allowing some flexibility in discovering relevant spatio-temporal features. We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs with similar activations with potential temporal variations. We implement this idea with a novel CNN block that uses an LSTM to encapsulate feature dynamics, in conjunction with a temporal gate that is responsible for evaluating the consistency of the discovered dynamics and the modeled features. We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs. On Kinetics-700, we perform on par with current state-of-the-art models, and outperform these on HACS, Moments in Time, UCF-101 and HMDB-51.
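The gating mechanism described above can be sketched in a much-simplified form. The following is an illustrative Python/NumPy sketch, not the authors' implementation: per-frame global average pooling is assumed as the "squeeze", a simple exponential moving average stands in for the paper's LSTM, and the cosine-similarity threshold is a made-up parameter.

```python
import numpy as np

def squeeze_and_recursion_gate(features, threshold=0.5):
    """Toy sketch of the SRTG idea (not the authors' code).

    features: array of shape (T, C) holding per-frame, globally pooled
    activations (the "squeeze" step). An exponential moving average
    stands in for the paper's LSTM ("recursion"); the temporal gate
    compares the modeled dynamics with the raw features via cosine
    similarity and only applies the recalibration when the two are
    consistent.
    """
    # Recursion: smooth the pooled features over time (LSTM stand-in).
    modeled = np.zeros_like(features, dtype=float)
    state = features[0].astype(float)
    for t, frame in enumerate(features):
        state = 0.5 * state + 0.5 * frame
        modeled[t] = state

    # Temporal gate: mean per-frame cosine similarity between the
    # modeled dynamics and the original features.
    num = np.sum(modeled * features, axis=1)
    den = (np.linalg.norm(modeled, axis=1)
           * np.linalg.norm(features, axis=1) + 1e-8)
    consistent = float(np.mean(num / den)) > threshold

    # Gate open: pass the recalibrated features; gate closed: identity.
    return modeled if consistent else features
```

In the real SRTG block this decision gates a residual recalibration inside a 3D CNN, so a "closed" gate simply leaves the backbone features untouched, which is why the block adds only a minimal GFLOP overhead.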

Results

Task | Dataset | Metric | Value | Model
Video | Kinetics-700 | Top-1 Accuracy | 56.46 | SRTG r3d-101
Video | Kinetics-700 | Top-5 Accuracy | 76.82 | SRTG r3d-101
Video | Kinetics-700 | Top-1 Accuracy | 54.17 | SRTG r(2+1)d-50
Video | Kinetics-700 | Top-5 Accuracy | 74.62 | SRTG r(2+1)d-50
Video | Kinetics-700 | Top-1 Accuracy | 53.52 | SRTG r3d-50
Video | Kinetics-700 | Top-5 Accuracy | 74.17 | SRTG r3d-50
Video | Kinetics-700 | Top-1 Accuracy | 49.43 | SRTG r(2+1)d-34
Video | Kinetics-700 | Top-5 Accuracy | 73.23 | SRTG r(2+1)d-34
Video | Kinetics-700 | Top-1 Accuracy | 49.15 | SRTG r3d-34
Video | Kinetics-700 | Top-5 Accuracy | 72.68 | SRTG r3d-34
Video | MiT | Top-1 Accuracy | 33.56 | SRTG r3d-101
Video | MiT | Top-5 Accuracy | 58.49 | SRTG r3d-101
Video | MiT | Top-1 Accuracy | 31.6 | SRTG r(2+1)d-50
Video | MiT | Top-5 Accuracy | 56.8 | SRTG r(2+1)d-50
Video | MiT | Top-1 Accuracy | 30.72 | SRTG r3d-50
Video | MiT | Top-5 Accuracy | 55.65 | SRTG r3d-50
Video | MiT | Top-1 Accuracy | 28.97 | SRTG r(2+1)d-34
Video | MiT | Top-5 Accuracy | 54.18 | SRTG r(2+1)d-34
Video | MiT | Top-1 Accuracy | 28.55 | SRTG r3d-34
Video | MiT | Top-5 Accuracy | 52.35 | SRTG r3d-34
Activity Recognition | HACS | Top-1 Accuracy | 84.33 | SRTG r(2+1)d-101
Activity Recognition | HACS | Top-5 Accuracy | 96.85 | SRTG r(2+1)d-101
Activity Recognition | HACS | Top-1 Accuracy | 83.77 | SRTG r(2+1)d-50
Activity Recognition | HACS | Top-5 Accuracy | 96.56 | SRTG r(2+1)d-50
Activity Recognition | HACS | Top-1 Accuracy | 81.66 | SRTG r3d-101
Activity Recognition | HACS | Top-5 Accuracy | 96.33 | SRTG r3d-101
Activity Recognition | HACS | Top-1 Accuracy | 80.39 | SRTG r(2+1)d-34
Activity Recognition | HACS | Top-5 Accuracy | 94.27 | SRTG r(2+1)d-34
Activity Recognition | HACS | Top-1 Accuracy | 80.36 | SRTG r3d-50
Activity Recognition | HACS | Top-5 Accuracy | 95.55 | SRTG r3d-50
Activity Recognition | HACS | Top-1 Accuracy | 78.6 | SRTG r3d-34
Activity Recognition | HACS | Top-5 Accuracy | 93.57 | SRTG r3d-34
Action Recognition | HACS | Top-1 Accuracy | 84.33 | SRTG r(2+1)d-101
Action Recognition | HACS | Top-5 Accuracy | 96.85 | SRTG r(2+1)d-101
Action Recognition | HACS | Top-1 Accuracy | 83.77 | SRTG r(2+1)d-50
Action Recognition | HACS | Top-5 Accuracy | 96.56 | SRTG r(2+1)d-50
Action Recognition | HACS | Top-1 Accuracy | 81.66 | SRTG r3d-101
Action Recognition | HACS | Top-5 Accuracy | 96.33 | SRTG r3d-101
Action Recognition | HACS | Top-1 Accuracy | 80.39 | SRTG r(2+1)d-34
Action Recognition | HACS | Top-5 Accuracy | 94.27 | SRTG r(2+1)d-34
Action Recognition | HACS | Top-1 Accuracy | 80.36 | SRTG r3d-50
Action Recognition | HACS | Top-5 Accuracy | 95.55 | SRTG r3d-50
Action Recognition | HACS | Top-1 Accuracy | 78.6 | SRTG r3d-34
Action Recognition | HACS | Top-5 Accuracy | 93.57 | SRTG r3d-34

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment (2025-06-28)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)