
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Weakly Supervised Temporal Action Localization Using Deep Metric Learning

Ashraful Islam, Richard J. Radke

Date: 2020-01-21
Tasks: Action Localization, Metric Learning, Weakly-supervised Temporal Action Localization, Temporal Localization, Video Understanding, Temporal Action Localization

Abstract

Temporal action localization is an important step towards video understanding. Most current action localization methods depend on untrimmed videos with full temporal annotations of action instances. However, it is expensive and time-consuming to annotate both action labels and temporal boundaries of videos. To this end, we propose a weakly supervised temporal action localization method that only requires video-level action instances as supervision during training. We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances. We jointly optimize a balanced binary cross-entropy loss and a metric loss using a standard backpropagation algorithm. Extensive experiments demonstrate the effectiveness of both of these components in temporal localization. We evaluate our algorithm on two challenging untrimmed video datasets: THUMOS14 and ActivityNet1.2. Our approach improves the current state-of-the-art result for THUMOS14 by 6.5% mAP at IoU threshold 0.5, and achieves competitive performance for ActivityNet1.2.
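The joint objective mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: "balanced" is read as averaging the log-loss separately over positive and negative classes, the metric loss is stood in for by a generic triplet margin loss on cosine distance, and the names `balanced_bce`, `triplet_loss`, `joint_loss`, `margin`, and `weight` are hypothetical.

```python
import math


def balanced_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy averaged separately over positive and negative
    classes, so that the few positive labels are not swamped by negatives.
    (Assumed reading of "balanced"; the abstract does not give the formula.)"""
    pos = [-math.log(max(p, eps)) for p, y in zip(probs, labels) if y == 1]
    neg = [-math.log(max(1.0 - p, eps)) for p, y in zip(probs, labels) if y == 0]
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet margin loss (a stand-in for the paper's metric loss):
    pull same-action features together, push different-action features apart."""
    return max(
        0.0,
        cosine_distance(anchor, positive)
        - cosine_distance(anchor, negative)
        + margin,
    )


def joint_loss(probs, labels, anchor, positive, negative, weight=1.0):
    """Weighted sum of the classification loss and the metric loss."""
    return balanced_bce(probs, labels) + weight * triplet_loss(
        anchor, positive, negative
    )
```

In practice both terms would be computed on network outputs inside an autodiff framework so that backpropagation can optimize them together; the pure-Python version only shows how the two terms combine.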

Results

The same results are listed under four task categories: Video, Temporal Action Localization, Zero-Shot Learning, and Action Localization.

Dataset          Metric        Value  Model
THUMOS’14        mAP IoU@0.1   62.3   DeepMetricLearner
THUMOS’14        mAP IoU@0.3   46.8   DeepMetricLearner
THUMOS’14        mAP IoU@0.5   29.6   DeepMetricLearner
THUMOS’14        mAP IoU@0.7    9.7   DeepMetricLearner
ActivityNet-1.2  mAP IoU@0.1   60.5   DeepMetricLearner
ActivityNet-1.2  mAP IoU@0.3   48.4   DeepMetricLearner
ActivityNet-1.2  mAP IoU@0.5   35.2   DeepMetricLearner
ActivityNet-1.2  mAP IoU@0.7   16.3   DeepMetricLearner
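
The IoU thresholds in the metric names refer to temporal overlap between a predicted segment and a ground-truth segment. As a rough illustration, assuming segments are given as (start, end) pairs in seconds, the overlap test could be sketched as follows. The function names are hypothetical, and a full mAP computation additionally ranks predictions by confidence, matches each ground-truth segment at most once, and averages over classes, none of which is shown here.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two 1-D intervals (start, end)."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold


# A prediction of 2s-8s against a ground truth of 4s-10s overlaps for 4s
# out of a combined 8s span, so it passes at threshold 0.5 but not at 0.7.
example_pred, example_gt = (2.0, 8.0), (4.0, 10.0)
passes_at_05 = is_true_positive(example_pred, example_gt, 0.5)
passes_at_07 = is_true_positive(example_pred, example_gt, 0.7)
```

This is why the reported mAP falls as the threshold rises: stricter thresholds demand tighter agreement on the action boundaries.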
