
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning

Can Zhang, Meng Cao, Dongming Yang, Jie Chen, Yuexian Zou

Date: 2021-03-30
Venue: CVPR 2021
Tasks: Weakly Supervised Action Localization, Action Localization, Weakly-supervised Temporal Action Localization, Contrastive Learning, Temporal Action Localization

Abstract

Weakly-supervised temporal action localization (WS-TAL) aims to localize actions in untrimmed videos with only video-level labels. Most existing models follow the "localization by classification" procedure: locate temporal regions contributing most to the video-level classification. Generally, they process each snippet (or frame) individually and thus overlook the fruitful temporal context relation. Here arises the single snippet cheating issue: "hard" snippets are too vague to be classified. In this paper, we argue that learning by comparing helps identify these hard snippets and we propose to utilize snippet Contrastive learning to Localize Actions, CoLA for short. Specifically, we propose a Snippet Contrast (SniCo) Loss to refine the hard snippet representation in feature space, which guides the network to perceive precise temporal boundaries and avoid the temporal interval interruption. Besides, since it is infeasible to access frame-level annotations, we introduce a Hard Snippet Mining algorithm to locate the potential hard snippets. Substantial analyses verify that this mining strategy efficaciously captures the hard snippets and SniCo Loss leads to more informative feature representation. Extensive experiments show that CoLA achieves state-of-the-art results on THUMOS'14 and ActivityNet v1.2 datasets. CoLA code is publicly available at https://github.com/zhang-can/CoLA.
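
The abstract does not spell out the form of the SniCo Loss, so the following is only a rough illustration of what "learning by comparing" snippet features could look like. It is a generic InfoNCE-style contrastive loss, not the paper's exact formulation. The function name `info_nce_loss`, the temperature value, and the pairing of a hard snippet with confidently classified snippets as the positive and negatives are all assumptions made for this sketch.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for a single query feature (hypothetical helper).

    query:     feature of a hard snippet (list of floats)
    positive:  feature it should be pulled toward (assumed pairing)
    negatives: list of features it should be pushed away from
    """
    q = _normalize(query)
    pos_logit = _dot(q, _normalize(positive)) / temperature
    neg_logits = [_dot(q, _normalize(n)) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Numerically stable log-sum-exp; the loss is -log softmax(positive).
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos_logit


# Toy 2-D features: the loss is small when the query matches the positive
# and large when it matches a negative instead.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)
```

Minimizing a loss of this shape pulls the query feature toward its positive and away from its negatives in feature space, which is the general effect the abstract attributes to refining hard snippet representations.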

Results

The nine results below are listed identically under each of five task categories: Video, Temporal Action Localization, Zero-Shot Learning, Action Localization, and Weakly Supervised Action Localization.

Dataset           Metric               Value   Model
THUMOS 2014       mAP@0.1:0.5          50.3    CoLA
THUMOS 2014       mAP@0.1:0.7          40.9    CoLA
THUMOS 2014       mAP@0.5              32.2    CoLA
THUMOS14          avg-mAP (0.1-0.5)    50.3    CoLA
THUMOS14          avg-mAP (0.1:0.7)    40.9    CoLA
THUMOS14          avg-mAP (0.3-0.7)    32.1    CoLA
THUMOS’14         mAP@0.5              32.2    CoLA
ActivityNet-1.2   Mean mAP             26.1    CoLA
ActivityNet-1.2   mAP@0.5              42.7    CoLA
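
The thresholds in metric names such as mAP@0.5 are, by the usual convention in temporal action localization, temporal IoU (tIoU) thresholds: a predicted segment counts as a correct detection only if its overlap with a ground-truth segment reaches the threshold. The sketch below shows that overlap computation. The function name `temporal_iou` and the (start, end) tuple representation in seconds are assumptions for illustration, not taken from the CoLA code base.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments (hypothetical helper)."""
    (ps, pe), (gs, ge) = pred, gt
    # Length of the overlap, clamped at zero for disjoint segments.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


# A prediction covering 0-10 s against a ground-truth action at 5-15 s
# overlaps for 5 s out of a 15 s union, so it would not count at tIoU 0.5.
score = temporal_iou((0.0, 10.0), (5.0, 15.0))
print(score >= 0.5)
```

Averaged metrics such as avg-mAP (0.1:0.7) report the mean of the mAP values obtained at each tIoU threshold in the stated range.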

Related Papers

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation (2025-07-15)
Latent Space Consistency for Sparse-View CT Reconstruction (2025-07-15)