Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Graph Convolutional Networks for Temporal Action Localization

Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

Published: 2019-09-07 · ICCV 2019
Tasks: Action Classification · Action Localization · Temporal Action Localization
Links: Paper · PDF · Code (official)

Abstract

Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and the relation between two proposals as an edge. Here, we use two types of relations, one for capturing the context information for each proposal and the other one for characterizing the correlations between distinct actions. Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships. Code is available at https://github.com/Alvin-Zeng/PGCN.
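The core idea in the abstract — proposals as graph nodes, relations as edges, GCN layers to propagate features between related proposals — can be illustrated with a minimal numpy sketch. This is not the paper's P-GCN implementation (which uses two relation types and learned per-type aggregation); it is a simplified version that builds only the overlap-based contextual edges, with `iou_thresh` and the function names chosen here for illustration:

```python
import numpy as np

def t_iou(a, b):
    """Temporal IoU between two proposals given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def build_proposal_graph(proposals, iou_thresh=0.1):
    """Adjacency matrix: connect proposals whose segments overlap
    (the 'contextual' relation); self-loops keep each node's own feature."""
    n = len(proposals)
    A = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            if t_iou(proposals[i], proposals[j]) > iou_thresh:
                A[i, j] = A[j, i] = 1.0
    return A

def gcn_layer(X, A, W):
    """One graph convolution: row-normalized neighbor averaging,
    a linear map, then ReLU."""
    D_inv = np.diag(1.0 / A.sum(axis=1))
    return np.maximum(D_inv @ A @ X @ W, 0.0)

# Toy usage: two overlapping proposals and one distant proposal.
proposals = [(0.0, 2.0), (1.0, 3.0), (10.0, 12.0)]
A = build_proposal_graph(proposals)
X = np.ones((3, 4))            # per-proposal features (e.g. pooled I3D)
W = np.ones((4, 2))            # layer weights
H = gcn_layer(X, A, W)         # relation-aware proposal features
```

After one such layer, each overlapping proposal's representation mixes in its neighbors' features, which is the mechanism the paper exploits for classification and boundary refinement.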

Results

Task | Dataset | Metric | Value | Model
Temporal Action Localization | ActivityNet-1.3 | mAP | 31.11 | P-GCN
Temporal Action Localization | ActivityNet-1.3 | mAP IoU@0.5 | 48.26 | P-GCN
Temporal Action Localization | ActivityNet-1.3 | mAP IoU@0.75 | 33.16 | P-GCN
Temporal Action Localization | ActivityNet-1.3 | mAP IoU@0.95 | 3.27 | P-GCN
Temporal Action Localization | THUMOS'14 | mAP IoU@0.1 | 69.5 | P-GCN
Temporal Action Localization | THUMOS'14 | mAP IoU@0.2 | 67.8 | P-GCN
Temporal Action Localization | THUMOS'14 | mAP IoU@0.3 | 63.6 | P-GCN
Temporal Action Localization | THUMOS'14 | mAP IoU@0.4 | 57.8 | P-GCN
Temporal Action Localization | THUMOS'14 | mAP IoU@0.5 | 49.1 | P-GCN

The same P-GCN results are also listed under the Video, Zero-Shot Learning, and Action Localization benchmark entries.
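The mAP IoU@k metrics above score a detection as correct when its temporal IoU with an unmatched ground-truth segment reaches the threshold k, then average precision over the ranked predictions. A minimal single-class sketch of that computation (not the official ActivityNet/THUMOS evaluation code; `average_precision` and its greedy matching are simplified here for illustration):

```python
import numpy as np

def t_iou(p, g):
    """Temporal IoU between a prediction and a ground-truth segment."""
    inter = max(0.0, min(p[1], g[1]) - max(p[0], g[0]))
    union = (p[1] - p[0]) + (g[1] - g[0]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thresh):
    """preds: list of (start, end, score); gts: list of (start, end).
    Greedily match each prediction, in score order, to the best unused
    ground truth at tIoU >= iou_thresh, then integrate precision over recall."""
    preds = sorted(preds, key=lambda p: -p[2])
    used = [False] * len(gts)
    tp = np.zeros(len(preds))
    fp = np.zeros(len(preds))
    for i, p in enumerate(preds):
        best, best_iou = -1, iou_thresh
        for j, g in enumerate(gts):
            iou = t_iou(p, g)
            if not used[j] and iou >= best_iou:
                best, best_iou = j, iou
        if best >= 0:
            used[best] = True
            tp[i] = 1
        else:
            fp[i] = 1
    tp_c, fp_c = np.cumsum(tp), np.cumsum(fp)
    recall = tp_c / max(len(gts), 1)
    precision = tp_c / np.maximum(tp_c + fp_c, 1e-9)
    # Area under the precision-recall curve (step integration).
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

The benchmark mAP is this AP averaged over action classes; ActivityNet additionally averages over ten tIoU thresholds from 0.5 to 0.95 to produce the single "mAP" number.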

Related Papers

DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis (2025-06-09)
From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos (2025-06-05)
Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition (2025-05-29)