Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Temporal Relational Reasoning in Videos

Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

2017-11-22 · ECCV 2018

Tasks: Action Classification · Common Sense Reasoning · Human-Object Interaction Detection · Relational Reasoning · Action Recognition · Action Recognition In Videos · Activity Recognition

Links: Paper · PDF · Code

Abstract

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets - Something-Something, Jester, and Charades - which fundamentally depend on temporal relational reasoning. Our results demonstrate that the proposed TRN gives convolutional neural networks a remarkable capacity to discover temporal relations in videos. Through only sparsely sampled video frames, TRN-equipped networks can accurately predict human-object interactions in the Something-Something dataset and identify various human gestures on the Jester dataset with very competitive performance. TRN-equipped networks also outperform two-stream networks and 3D convolution networks in recognizing daily activities in the Charades dataset. Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos.
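The core idea in the abstract — pooling pairwise and higher-order relations between sparsely sampled frame features at multiple time scales — can be sketched in a few lines. This is a minimal NumPy illustration of the relation-pooling structure only, not the paper's implementation: the frame features, dimensions, and randomly initialized two-layer perceptrons (standing in for the learned functions g and h) are all assumptions for the sake of the example.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    # Two-layer perceptron with ReLU, standing in for the learned g / h functions.
    return np.maximum(x @ w1, 0.0) @ w2

def trn_scale(frames, k, d_hidden, d_out, rng):
    """k-frame relation: sum g over time-ordered k-tuples of frames, then apply h."""
    d = frames.shape[1]
    g_w1 = rng.standard_normal((k * d, d_hidden)) * 0.1
    g_w2 = rng.standard_normal((d_hidden, d_hidden)) * 0.1
    h_w1 = rng.standard_normal((d_hidden, d_hidden)) * 0.1
    h_w2 = rng.standard_normal((d_hidden, d_out)) * 0.1
    # combinations() preserves temporal order, so each tuple is an ordered frame subset.
    pooled = sum(mlp(np.concatenate([frames[i] for i in idx]), g_w1, g_w2)
                 for idx in combinations(range(len(frames)), k))
    return mlp(pooled, h_w1, h_w2)

# 8 sparsely sampled per-frame features (e.g. CNN outputs), 64-dim each (assumed sizes).
frames = rng.standard_normal((8, 64))
# Multi-scale TRN: sum the class scores from 2-frame up to 4-frame relations.
scores = sum(trn_scale(frames, k, d_hidden=32, d_out=10, rng=rng) for k in (2, 3, 4))
print(scores.shape)  # (10,)
```

In the paper the relation functions are trained jointly with the frame CNN; here the random weights only show how the tuple sampling and multi-scale summation fit together.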

Results

| Task                         | Dataset                      | Metric         | Value | Model          |
|------------------------------|------------------------------|----------------|-------|----------------|
| Video                        | Charades                     | MAP            | 25.2  | MultiScale TRN |
| Activity Recognition         | Something-Something V1       | Top 1 Accuracy | 42.01 | 2-Stream TRN   |
| Activity Recognition         | Something-Something V1       | Top 1 Accuracy | 34.4  | M-TRN          |
| Activity Recognition         | Jester (Gesture Recognition) | Val            | 95.31 | MultiScale TRN |
| Activity Recognition         | Something-Something V2       | Top-1 Accuracy | 55.52 | 2-Stream TRN   |
| Activity Recognition         | Something-Something V2       | Top-5 Accuracy | 83.06 | 2-Stream TRN   |
| Hand                         | Jester test                  | Top 1 Accuracy | 94.78 | Multiscale TRN |
| Gesture Recognition          | Jester test                  | Top 1 Accuracy | 94.78 | Multiscale TRN |
| Action Recognition           | Something-Something V1       | Top 1 Accuracy | 42.01 | 2-Stream TRN   |
| Action Recognition           | Something-Something V1       | Top 1 Accuracy | 34.4  | M-TRN          |
| Action Recognition           | Jester (Gesture Recognition) | Val            | 95.31 | MultiScale TRN |
| Action Recognition           | Something-Something V2       | Top-1 Accuracy | 55.52 | 2-Stream TRN   |
| Action Recognition           | Something-Something V2       | Top-5 Accuracy | 83.06 | 2-Stream TRN   |
| Action Recognition In Videos | Jester (Gesture Recognition) | Val            | 95.31 | MultiScale TRN |
| Action Recognition In Videos | Something-Something V1       | Top 1 Accuracy | 42.01 | 2-Stream TRN   |
| Action Recognition In Videos | Something-Something V2       | Top-1 Accuracy | 55.52 | 2-Stream TRN   |
| Action Recognition In Videos | Something-Something V2       | Top-5 Accuracy | 83.06 | 2-Stream TRN   |

Related Papers

- Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
- RoHOI: Robustness Benchmark for Human-Object Interaction Detection (2025-07-12)
- Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection (2025-07-09)
- LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025-07-06)
- Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
- VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions (2025-06-29)