
A Pursuit of Temporal Accuracy in General Activity Detection

Yuanjun Xiong, Yue Zhao, Li-Min Wang, Dahua Lin, Xiaoou Tang

2017-03-08
Tasks: Action Detection, Activity Detection, General Classification, Temporal Action Localization

Abstract

Detecting activities in untrimmed videos is an important but challenging task. The performance of existing methods remains unsatisfactory, e.g., they often meet difficulties in locating the beginning and end of a long complex action. In this paper, we propose a generic framework that can accurately detect a wide variety of activities from untrimmed videos. Our first contribution is a novel proposal scheme that can efficiently generate candidates with accurate temporal boundaries. The other contribution is a cascaded classification pipeline that explicitly distinguishes between relevance and completeness of a candidate instance. On two challenging temporal activity detection datasets, THUMOS14 and ActivityNet, the proposed framework significantly outperforms the existing state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling activities with various temporal structures.

Results

Task                         | Dataset         | Metric      | Value | Model
Video                        | ActivityNet-1.3 | mAP         | 32.26 | SSN
Video                        | ActivityNet-1.3 | mAP IOU@0.5 | 39.12 | SSN
Temporal Action Localization | ActivityNet-1.3 | mAP         | 32.26 | SSN
Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 39.12 | SSN
Zero-Shot Learning           | ActivityNet-1.3 | mAP         | 32.26 | SSN
Zero-Shot Learning           | ActivityNet-1.3 | mAP IOU@0.5 | 39.12 | SSN
Action Localization          | ActivityNet-1.3 | mAP         | 32.26 | SSN
Action Localization          | ActivityNet-1.3 | mAP IOU@0.5 | 39.12 | SSN
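
The "mAP IOU@0.5" metric depends on the temporal intersection-over-union (IoU) between a predicted segment and a ground-truth segment. In temporal localization benchmarks, a detection is commonly counted as correct when this overlap reaches the threshold (0.5 here). The following is a minimal sketch of that overlap test only, not the benchmark's official evaluation code. The function names and the (start, end) tuple representation in seconds are assumptions made for illustration.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    p_start, p_end = pred
    g_start, g_end = gt
    # Overlap length is zero when the segments are disjoint.
    inter = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred, gt, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # Hypothetical segments, purely for illustration.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    print(matches_at_threshold((0.0, 10.0), (2.0, 10.0)))
```

A full mAP computation would additionally rank detections by confidence, match each ground-truth segment at most once, and average the per-class precision; those steps are omitted here.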

Related Papers

DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Distributed Activity Detection for Cell-Free Hybrid Near-Far Field Communications (2025-06-17)
Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm (2025-06-03)
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion (2025-06-02)