Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Temporal Reasoning Graph for Activity Recognition

Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

2019-08-27 · Relation Extraction · Action Recognition · Temporal Relation Extraction · Activity Recognition

Paper · PDF

Abstract

Despite the great success achieved in activity analysis, many challenges remain. Most existing work on activity recognition focuses on designing efficient architectures or video sampling strategies. However, because activities comprise fine-grained actions with long-term structure in video, activity recognition requires reasoning about the temporal relations between video sequences. In this paper, we propose an efficient temporal reasoning graph (TRG) that simultaneously captures appearance features and temporal relations between video sequences at multiple time scales. Specifically, we construct learnable temporal relation graphs to explore temporal relations over multi-scale ranges. Additionally, to facilitate multi-scale temporal relation extraction, we design a multi-head temporal adjacency matrix to represent multiple kinds of temporal relations. Finally, a multi-head temporal relation aggregator extracts the semantic meaning of the features convolved through the graphs. Extensive experiments on widely used large-scale datasets such as Something-Something and Charades show that our model achieves state-of-the-art performance. Further analysis shows that temporal relation reasoning with our TRG extracts discriminative features for activity recognition.
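The core mechanism the abstract describes — per-head learnable temporal adjacency matrices whose graphs propagate frame features — can be sketched as follows. This is a minimal illustration under assumed shapes and names, not the authors' implementation; in TRG the adjacency weights would be learned, whereas here they are random stand-ins.

```python
import numpy as np

def multi_head_temporal_aggregate(features, adjacency):
    """Aggregate per-frame features through per-head temporal graphs.

    features:  (T, D) array, one D-dim appearance feature per sampled frame.
    adjacency: (H, T, T) array, one temporal adjacency matrix per head
               (a stand-in for the learned multi-head adjacency in TRG).
    Returns an (H, T, D) array of relation-aware features per head.
    """
    # Row-normalize each head's adjacency so every frame's update is a
    # weighted average over its temporal neighbours.
    norm = adjacency / adjacency.sum(axis=-1, keepdims=True)
    # Graph "convolution": propagate features along each head's edges.
    return np.einsum('htu,ud->htd', norm, features)

T, D, H = 8, 16, 4                      # frames, feature dim, heads
rng = np.random.default_rng(0)
feats = rng.standard_normal((T, D))     # per-frame appearance features
adj = rng.random((H, T, T))             # stand-in for learned adjacency
out = multi_head_temporal_aggregate(feats, adj)
print(out.shape)                        # (4, 8, 16)
```

A multi-head aggregator in the paper's sense would then fuse the H relation-aware feature sets (e.g. by concatenation or pooling) before classification; multi-scale reasoning would repeat this with graphs built over frames sampled at different temporal strides.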

Results

| Task                 | Dataset                | Metric         | Value | Model               |
|----------------------|------------------------|----------------|-------|---------------------|
| Activity Recognition | Something-Something V1 | Top-1 Accuracy | 49.7  | TRG (Inception-V3)  |
| Activity Recognition | Something-Something V1 | Top-1 Accuracy | 49.5  | TRG (ResNet-50)     |
| Activity Recognition | Something-Something V1 | Top-5 Accuracy | 86.1  | TRG (ResNet-50)     |
| Activity Recognition | Something-Something V2 | Top-1 Accuracy | 62.2  | TRG (ResNet-50)     |
| Activity Recognition | Something-Something V2 | Top-5 Accuracy | 90.3  | TRG (ResNet-50)     |
| Activity Recognition | Something-Something V2 | Top-1 Accuracy | 61.3  | TRG (Inception-V3)  |
| Activity Recognition | Something-Something V2 | Top-5 Accuracy | 91.4  | TRG (Inception-V3)  |
| Action Recognition   | Something-Something V1 | Top-1 Accuracy | 49.7  | TRG (Inception-V3)  |
| Action Recognition   | Something-Something V1 | Top-1 Accuracy | 49.5  | TRG (ResNet-50)     |
| Action Recognition   | Something-Something V1 | Top-5 Accuracy | 86.1  | TRG (ResNet-50)     |
| Action Recognition   | Something-Something V2 | Top-1 Accuracy | 62.2  | TRG (ResNet-50)     |
| Action Recognition   | Something-Something V2 | Top-5 Accuracy | 90.3  | TRG (ResNet-50)     |
| Action Recognition   | Something-Something V2 | Top-1 Accuracy | 61.3  | TRG (Inception-V3)  |
| Action Recognition   | Something-Something V2 | Top-5 Accuracy | 91.4  | TRG (Inception-V3)  |

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations (2025-07-08)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers (2025-06-25)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)