Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach

Qinying Liu, Zilei Wang, Shenghai Rong, Junjie Li, Yixin Zhang

2023-12-21 | ICCV 2023
Tasks: Weakly Supervised Action Localization, Action Localization, Weakly-supervised Temporal Action Localization, Clustering, Video Classification, Classification, Temporal Action Localization

Abstract

Weakly-supervised temporal action localization aims to localize action instances in videos with only video-level action labels. Existing methods mainly embrace a localization-by-classification pipeline that optimizes the snippet-level prediction with a video classification loss. However, this formulation suffers from the discrepancy between classification and detection, resulting in inaccurate separation of foreground and background (F&B) snippets. To alleviate this problem, we propose to explore the underlying structure among the snippets by resorting to unsupervised snippet clustering, rather than heavily relying on the video classification loss. Specifically, we propose a novel clustering-based F&B separation algorithm. It comprises two core components: a snippet clustering component that groups the snippets into multiple latent clusters and a cluster classification component that further classifies each cluster as foreground or background. As there are no ground-truth labels to train these two components, we introduce a unified self-labeling mechanism based on optimal transport to produce high-quality pseudo-labels that match several plausible prior distributions. This ensures that the cluster assignments of the snippets can be accurately associated with their F&B labels, thereby boosting the F&B separation. We evaluate our method on three benchmarks: THUMOS14, ActivityNet v1.2 and v1.3. Our method achieves promising performance on all three benchmarks while being significantly more lightweight than previous methods. Code is available at https://github.com/Qinying-Liu/CASE
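
The self-labeling step described in the abstract can be illustrated with a small sketch. The abstract does not say which optimal-transport solver is used, so the code below assumes entropy-regularized transport solved with Sinkhorn-Knopp iterations, a common choice for this kind of pseudo-labeling. The function name, the `epsilon` and `n_iters` parameters, and the toy scores are all hypothetical and are not taken from the CASE repository.

```python
import math

def sinkhorn_pseudo_labels(scores, cluster_prior, epsilon=0.05, n_iters=50):
    """Turn snippet-to-cluster scores into soft pseudo-labels.

    scores: N rows of K similarity scores (hypothetical input).
    cluster_prior: K positive numbers summing to 1 (assumed cluster sizes).
    Returns N rows of K probabilities. Each row sums to 1, and the
    column averages approach cluster_prior as the iterations converge.
    """
    n, k = len(scores), len(cluster_prior)
    # Subtract each row's max before exponentiating for numerical stability;
    # a per-row shift is absorbed by the row scaling and leaves the result unchanged.
    q = [[math.exp((s - max(row)) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: scale each cluster column to its prior mass.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * cluster_prior[j] / col_sums[j] for j in range(k)]
             for i in range(n)]
        # Row step: every snippet carries equal mass 1/N.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[i] * n) for v in row] for i, row in enumerate(q)]
    # Rescale so that each snippet's pseudo-label is a distribution over clusters.
    return [[v * n for v in row] for row in q]

# Toy example: 4 snippets, 2 clusters, balanced prior (all values made up).
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
labels = sinkhorn_pseudo_labels(toy_scores, [0.5, 0.5], epsilon=0.5, n_iters=200)
```

A plain argmax over the toy scores would send three of the four snippets to the first cluster. The transport constraint instead spreads the soft assignments so that each cluster receives half of the total mass, which is the sense in which pseudo-labels "match a prior distribution".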

Results

Dataset          Metric        Value  Model
THUMOS 2014      mAP@0.1:0.7   49.2   CASE + Zhou et al.
THUMOS 2014      mAP@0.1:0.5   57.1   CASE
THUMOS 2014      mAP@0.1:0.7   46.2   CASE
ActivityNet-1.3  mAP@0.5       43.2   CASE
ActivityNet-1.3  mAP@0.5:0.95  26.8   CASE
ActivityNet-1.2  Mean mAP      27.9   CASE
ActivityNet-1.2  mAP@0.5       43.8   CASE

The same seven results are listed under each of these tasks: Video, Temporal Action Localization, Zero-Shot Learning, Action Localization, Weakly Supervised Action Localization.
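
For readers unfamiliar with the metric names in the results, a range such as mAP@0.1:0.7 conventionally denotes mAP averaged over a grid of temporal IoU (tIoU) thresholds. The sketch below shows how tIoU between a predicted and a ground-truth segment is computed and how such a threshold grid can be generated. The step sizes (0.1 for THUMOS14, 0.05 for ActivityNet) follow the usual convention and are an assumption here, since this page does not state them; the function names are illustrative only.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def iou_thresholds(low, high, step):
    """Evenly spaced tIoU thresholds from low to high, inclusive."""
    count = int(round((high - low) / step)) + 1
    return [round(low + i * step, 2) for i in range(count)]

# A prediction of 2s-6s against a ground truth of 4s-8s overlaps for 2s
# out of a 6s union, so its tIoU is 1/3.
overlap = temporal_iou((2.0, 6.0), (4.0, 8.0))

# Assumed grids: 7 thresholds for "0.1:0.7", 10 thresholds for "0.5:0.95".
thumos_grid = iou_thresholds(0.1, 0.7, 0.1)
anet_grid = iou_thresholds(0.5, 0.95, 0.05)
```

Under this reading, a detection counts as correct at a given threshold only if its tIoU with a matching ground-truth instance reaches that threshold, and the reported number is the mean of the per-threshold mAP values.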

Related Papers

Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Ranking Vectors Clustering: Theory and Applications (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework (2025-07-11)