Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Graph Convolutional Module for Temporal Action Localization in Videos

Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

Published: 2021-12-01
Tasks: Action Localization, Action Recognition, Temporal Action Localization

Abstract

Temporal action localization has long been researched in computer vision. Existing state-of-the-art action localization methods divide each video into multiple action units (i.e., proposals in two-stage methods and segments in one-stage methods) and then perform action recognition/regression on each of them individually, without explicitly exploiting their relations during learning. In this paper, we claim that the relations between action units play an important role in action localization, and a more powerful action detector should not only capture the local content of each action unit but also allow a wider field of view on the context related to it. To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including both two-stage and one-stage paradigms. To be specific, we first construct a graph, where each action unit is represented as a node and the relation between two action units as an edge. Here, we use two types of relations: one captures the temporal connections between different action units, and the other characterizes their semantic relationship. Particularly for the temporal connections in two-stage methods, we further explore two different kinds of edges, one connecting overlapping action units and the other connecting surrounding but disjoint units. On the constructed graph, we then apply graph convolutional networks (GCNs) to model the relations among different action units, which learns more informative representations to enhance action localization. Experimental results show that our GCM consistently improves the performance of existing action localization methods, including two-stage methods (e.g., CBR and R-C3D) and one-stage methods (e.g., D-SSAD), verifying the generality and effectiveness of our GCM.
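The graph construction described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes proposals are given as (start, end) intervals with one feature vector each, uses temporal IoU for the "overlapping" edges and cosine feature similarity as a stand-in for the semantic edges, and applies one row-normalized graph-convolution layer. The thresholds and the helper names (`tiou`, `build_graph`, `gcn_layer`) are illustrative choices, not from the paper.

```python
import numpy as np

def tiou(a, b):
    """Temporal IoU between two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def build_graph(proposals, feats, tiou_thresh=0.1, sim_thresh=0.8):
    """Adjacency matrix combining temporal-overlap edges and semantic
    edges (cosine similarity of node features). Thresholds are
    illustrative assumptions, not values from the paper."""
    n = len(proposals)
    A = np.eye(n)  # self-loops so each node keeps its own content
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarity
    for i in range(n):
        for j in range(i + 1, n):
            if tiou(proposals[i], proposals[j]) > tiou_thresh or sim[i, j] > sim_thresh:
                A[i, j] = A[j, i] = 1.0
    return A

def gcn_layer(A, X, W):
    """One graph-convolution layer: ReLU(D^-1 A X W), where D^-1 A
    is the row-normalized adjacency (simple mean aggregation)."""
    A_hat = A / A.sum(axis=1, keepdims=True)
    return np.maximum(A_hat @ X @ W, 0.0)

# Tiny usage example: the first two proposals overlap in time, the
# third is disjoint and semantically unrelated, so it stays isolated.
proposals = [(0.0, 2.0), (1.0, 3.0), (10.0, 12.0)]
feats = np.eye(3)  # toy features: mutually orthogonal
A = build_graph(proposals, feats)
H = gcn_layer(A, feats, np.eye(3))
```

After the layer, each node's representation is the mean of its own features and those of its graph neighbors, which is the "wider field of view on the context" the abstract refers to; the paper's module stacks such layers inside existing detectors.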

Results

The same results are reported under four task tags (Video, Temporal Action Localization, Zero-Shot Learning, Action Localization):

Dataset         | Metric       | Value | Model
ActivityNet-1.3 | mAP          | 34.24 | GCM
ActivityNet-1.3 | mAP IOU@0.5  | 51.03 | GCM
ActivityNet-1.3 | mAP IOU@0.75 | 35.17 | GCM
ActivityNet-1.3 | mAP IOU@0.95 | 7.44  | GCM
THUMOS’14       | mAP IOU@0.1  | 72.5  | GCM
THUMOS’14       | mAP IOU@0.2  | 70.9  | GCM
THUMOS’14       | mAP IOU@0.3  | 66.5  | GCM
THUMOS’14       | mAP IOU@0.4  | 60.8  | GCM
THUMOS’14       | mAP IOU@0.5  | 51.9  | GCM

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)