Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Proposal Relation Network for Temporal Action Detection

Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang

2021-06-20 · Action Detection · Action Classification · Temporal Action Localization

Abstract

This technical report presents our solution for the temporal action detection task in the ActivityNet Challenge 2021. The goal of this task is to locate and identify actions of interest in long untrimmed videos. The crucial challenge is that the temporal duration of actions varies dramatically, and the target actions are typically embedded in a background of irrelevant activities. Our solution builds on BMN and consists of three steps: 1) action classification and feature encoding with SlowFast, CSN and ViViT; 2) proposal generation, where we improve BMN by embedding the proposed Proposal Relation Network (PRN) to generate high-quality proposals; 3) action detection, where we obtain the detection results by assigning each proposal the corresponding classification results. Finally, we ensemble the results under different settings and achieve 44.7% average mAP on the test set, which improves on the champion result of ActivityNet 2020 by 1.9%.
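Step 3 of the abstract (assigning classification results to proposals) can be illustrated with a minimal sketch. The abstract does not give the fusion rule, so this assumes a common convention in proposal-based pipelines: each proposal is paired with the top-k video-level classes, and the detection score is the product of the proposal confidence and the class score. The names `Proposal`, `Detection`, `assign_labels`, and `top_k` are hypothetical and do not come from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    score: float  # proposal confidence in [0, 1]


@dataclass(frozen=True)
class Detection:
    start: float
    end: float
    label: str
    score: float


def assign_labels(proposals, class_scores, top_k=2):
    """Pair every proposal with the top_k video-level classes.

    Assumption (not stated in the report): the detection score is
    proposal confidence multiplied by the video-level class score.
    """
    # Keep only the top_k highest-scoring classes for this video.
    top = sorted(class_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    detections = [
        Detection(p.start, p.end, label, p.score * cls_score)
        for p in proposals
        for label, cls_score in top
    ]
    # Rank detections from most to least confident.
    return sorted(detections, key=lambda d: d.score, reverse=True)


if __name__ == "__main__":
    proposals = [Proposal(0.0, 10.0, 0.9), Proposal(5.0, 20.0, 0.5)]
    class_scores = {"long jump": 0.8, "high jump": 0.1, "javelin": 0.05}
    for det in assign_labels(proposals, class_scores):
        print(det)
```

Multiplying the two scores is only one possible choice; the actual system may weight or combine them differently.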

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Video | ActivityNet-1.3 | mAP | 42 | PRN+BMN (ensemble)
Video | ActivityNet-1.3 | mAP IOU@0.5 | 59.7 | PRN+BMN (ensemble)
Video | ActivityNet-1.3 | mAP | 39.4 | PRN (CSN)
Video | ActivityNet-1.3 | mAP IOU@0.5 | 57.9 | PRN (CSN)
Video | ActivityNet-1.3 | mAP | 37.5 | PRN (ViViT)
Video | ActivityNet-1.3 | mAP IOU@0.5 | 55.5 | PRN (ViViT)
Temporal Action Localization | ActivityNet-1.3 | mAP | 42 | PRN+BMN (ensemble)
Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 59.7 | PRN+BMN (ensemble)
Temporal Action Localization | ActivityNet-1.3 | mAP | 39.4 | PRN (CSN)
Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 57.9 | PRN (CSN)
Temporal Action Localization | ActivityNet-1.3 | mAP | 37.5 | PRN (ViViT)
Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 55.5 | PRN (ViViT)
Zero-Shot Learning | ActivityNet-1.3 | mAP | 42 | PRN+BMN (ensemble)
Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.5 | 59.7 | PRN+BMN (ensemble)
Zero-Shot Learning | ActivityNet-1.3 | mAP | 39.4 | PRN (CSN)
Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.5 | 57.9 | PRN (CSN)
Zero-Shot Learning | ActivityNet-1.3 | mAP | 37.5 | PRN (ViViT)
Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.5 | 55.5 | PRN (ViViT)
Action Localization | ActivityNet-1.3 | mAP | 42 | PRN+BMN (ensemble)
Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 59.7 | PRN+BMN (ensemble)
Action Localization | ActivityNet-1.3 | mAP | 39.4 | PRN (CSN)
Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 57.9 | PRN (CSN)
Action Localization | ActivityNet-1.3 | mAP | 37.5 | PRN (ViViT)
Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 55.5 | PRN (ViViT)
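
The "mAP IOU@0.5" rows count a predicted segment as correct when its temporal intersection-over-union (tIoU) with a ground-truth segment of the same class reaches 0.5. The following is a minimal sketch of that matching criterion only, not of the full mAP evaluation. The function names `temporal_iou` and `is_true_positive` are hypothetical, and segments are assumed to be `(start, end)` pairs in seconds.

```python
def temporal_iou(seg_a, seg_b):
    """Intersection-over-union of two 1-D segments given as (start, end)."""
    (a0, a1), (b0, b1) = seg_a, seg_b
    # Overlap length is zero when the segments are disjoint.
    inter = max(0.0, min(a1, b1) - max(a0, b0))
    union = (a1 - a0) + (b1 - b0) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth segment if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))      # overlap 5, union 15
    print(is_true_positive((0.0, 10.0), (2.0, 10.0)))  # overlap 8, union 10
```

A full evaluation would additionally rank predictions by score, allow each ground-truth segment to be matched at most once, and average precision over classes.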
