Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization

Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao

2019-08-22 · ICCV 2019
Tasks: Weakly Supervised Action Localization, Action Classification, Action Localization, Weakly-supervised Temporal Action Localization, Temporal Action Localization
Paper · PDF · Code (official)

Abstract

Temporal action localization is a challenging computer vision problem with numerous real-world applications. Most existing methods require laborious frame-level supervision to train action localization models. In this work, we propose a framework, called 3C-Net, which only requires video-level supervision (weak supervision) in the form of action category labels and the corresponding count. We introduce a novel formulation to learn discriminative action features with enhanced localization capabilities. Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the action feature discriminability and a counting loss term to delineate adjacent action sequences, leading to improved localization. Comprehensive experiments are performed on two challenging benchmarks: THUMOS14 and ActivityNet 1.2. Our approach sets a new state-of-the-art for weakly-supervised temporal action localization on both datasets. On the THUMOS14 dataset, the proposed method achieves an absolute gain of 4.6% in terms of mean average precision (mAP), compared to the state-of-the-art. Source code is available at https://github.com/naraysa/3c-net.
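The abstract describes a joint objective with three terms: a classification loss, a multi-label adaptation of center loss, and a counting loss. A minimal NumPy sketch of how such a combined objective could be assembled is below; the function name, tensor shapes, and loss weights are illustrative assumptions, not the paper's actual values — the real implementation is in the official repository at https://github.com/naraysa/3c-net.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def three_c_loss(logits, feats, centers, pred_counts, labels, gt_counts,
                 w_center=0.5, w_count=0.1):
    """Illustrative sketch of a 3C-Net-style joint objective.

    logits, labels, pred_counts, gt_counts: (B, C) per-video, per-class.
    feats: (B, D) video-level features; centers: (C, D) class centers.
    w_center / w_count are hypothetical weights, not the paper's.
    """
    eps = 1e-8
    # Classification term: multi-label binary cross-entropy, which
    # encourages separability of the learned action features.
    p = sigmoid(logits)
    cls = -np.mean(labels * np.log(p + eps)
                   + (1 - labels) * np.log(1 - p + eps))

    # Multi-label center loss: pull each video feature toward the
    # center of every action class present in that video.
    sq = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (B, C)
    cen = (sq * labels).sum() / max(labels.sum(), 1.0)

    # Counting term: regress the per-class instance count, which helps
    # delineate adjacent action sequences of the same class.
    cnt = np.mean(((pred_counts - gt_counts) * labels) ** 2)

    return cls + w_center * cen + w_count * cnt
```

The three terms are simply summed with scalar weights, so each can be ablated independently, mirroring the three-term structure the abstract describes.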

Results

Task                                 | Dataset         | Metric   | Value | Model
Video                                | THUMOS'14       | mAP@0.5  | 26.6  | 3C-Net
Video                                | ActivityNet-1.2 | Mean mAP | 21.7  | 3C-Net
Video                                | ActivityNet-1.2 | mAP@0.5  | 37.2  | 3C-Net
Video                                | THUMOS'14       | mAP      | 86.9  | 3C-Net
Video                                | ActivityNet-1.2 | mAP      | 92.4  | 3C-Net
Temporal Action Localization         | THUMOS'14       | mAP@0.5  | 26.6  | 3C-Net
Temporal Action Localization         | ActivityNet-1.2 | Mean mAP | 21.7  | 3C-Net
Temporal Action Localization         | ActivityNet-1.2 | mAP@0.5  | 37.2  | 3C-Net
Zero-Shot Learning                   | THUMOS'14       | mAP@0.5  | 26.6  | 3C-Net
Zero-Shot Learning                   | ActivityNet-1.2 | Mean mAP | 21.7  | 3C-Net
Zero-Shot Learning                   | ActivityNet-1.2 | mAP@0.5  | 37.2  | 3C-Net
Action Localization                  | THUMOS'14       | mAP@0.5  | 26.6  | 3C-Net
Action Localization                  | ActivityNet-1.2 | Mean mAP | 21.7  | 3C-Net
Action Localization                  | ActivityNet-1.2 | mAP@0.5  | 37.2  | 3C-Net
Weakly Supervised Action Localization| THUMOS'14       | mAP@0.5  | 26.6  | 3C-Net
Weakly Supervised Action Localization| ActivityNet-1.2 | Mean mAP | 21.7  | 3C-Net
Weakly Supervised Action Localization| ActivityNet-1.2 | mAP@0.5  | 37.2  | 3C-Net
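In the table above, mAP@0.5 is mean average precision where a predicted segment counts as a true positive only if its temporal IoU (tIoU) with an unmatched ground-truth instance is at least 0.5. A minimal sketch of the per-class computation under that rule (function names and data layout are illustrative, not the official evaluation code):

```python
def tiou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, thresh=0.5):
    """AP for one action class.

    preds: list of (confidence, (start, end)) predicted segments.
    gts:   list of (start, end) ground-truth segments.
    A prediction is a true positive if its best tIoU with a
    still-unmatched ground truth is >= thresh; each ground truth
    can be matched at most once.
    """
    preds = sorted(preds, key=lambda p: -p[0])  # highest confidence first
    matched = [False] * len(gts)
    tp, fp, precisions = 0, 0, []
    for _score, seg in preds:
        best, best_i = 0.0, -1
        for i, g in enumerate(gts):
            if not matched[i]:
                o = tiou(seg, g)
                if o > best:
                    best, best_i = o, i
        if best >= thresh and best_i >= 0:
            matched[best_i] = True
            tp += 1
            precisions.append(tp / (tp + fp))  # precision at this recall point
        else:
            fp += 1
    # Average precision over all ground-truth instances.
    return sum(precisions) / len(gts) if gts else 0.0
```

mAP@0.5 is then the mean of this per-class AP over all action classes; "Mean mAP" on ActivityNet-1.2 additionally averages over a range of tIoU thresholds.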

Related Papers

DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis (2025-06-09)
From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos (2025-06-05)
Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition (2025-05-29)