
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations

Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

2020-12-11 · ICCV 2021
Tasks: Denoising, Weakly Supervised Action Localization, Action Localization, Weakly-supervised Temporal Action Localization, Temporal Action Localization
Paper · PDF · Code (official)

Abstract

This work proposes a weakly-supervised temporal action localization framework, called D2-Net, which strives to temporally localize actions using video-level supervision. Our main contribution is the introduction of a novel loss formulation, which jointly enhances the discriminability of latent embeddings and robustness of the output temporal class activations with respect to foreground-background noise caused by weak supervision. The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization. The discriminative term incorporates a classification loss and utilizes a top-down attention mechanism to enhance the separability of latent foreground-background embeddings. The denoising loss term explicitly addresses the foreground-background noise in class activations by simultaneously maximizing intra-video and inter-video mutual information using a bottom-up attention mechanism. As a result, activations in the foreground regions are emphasized whereas those in the background regions are suppressed, thereby leading to more robust predictions. Comprehensive experiments are performed on multiple benchmarks, including THUMOS14 and ActivityNet1.2. Our D2-Net performs favorably in comparison to the existing methods on all datasets, achieving gains as high as 2.3% in terms of mAP at IoU=0.5 on THUMOS14. Source code is available at https://github.com/naraysa/D2-Net.
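
As one concrete reading of the abstract, the following is a minimal PyTorch sketch of a two-term objective: a discriminative video-level classification loss over attention-pooled class activations, plus a denoising term that separates foreground from background embeddings within each video and aligns foreground embeddings across videos sharing a class. The module names, the attention design, and the cosine-similarity surrogate for mutual information are illustrative assumptions, not the authors' formulation; the official implementation is in the repository linked above.

```python
# Minimal sketch (PyTorch) of a discriminative + denoising objective in the
# spirit of the abstract. NOT the authors' implementation -- see
# https://github.com/naraysa/D2-Net for the official code. The attention
# modules and the similarity-based surrogate for mutual information are
# assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class D2NetStyleLoss(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)  # snippet-level class activations
        self.attention = nn.Linear(feat_dim, 1)             # class-agnostic foreground attention

    def forward(self, feats: torch.Tensor, video_labels: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, D) snippet embeddings; video_labels: (B, C) multi-hot labels
        cas = self.classifier(feats)                 # (B, T, C) temporal class activations
        attn = torch.sigmoid(self.attention(feats))  # (B, T, 1) foreground attention weights

        # Discriminative term: classify the video from attention-weighted pooling
        # of the class activations (video-level supervision only).
        fg_logits = (cas * attn).sum(1) / attn.sum(1).clamp(min=1e-6)  # (B, C)
        cls_loss = F.binary_cross_entropy_with_logits(fg_logits, video_labels)

        # Denoising term: build foreground/background prototypes per video, then
        # (intra-video) push foreground away from background, and (inter-video)
        # pull foreground prototypes of videos sharing a class together while
        # pushing disjoint-class videos apart.
        fg = F.normalize((feats * attn).sum(1), dim=-1)        # (B, D)
        bg = F.normalize((feats * (1 - attn)).sum(1), dim=-1)  # (B, D)
        intra = (fg * bg).sum(-1).mean()                       # fg/bg similarity (minimized)

        sim = fg @ fg.t()                                      # (B, B) cosine similarities
        share = (video_labels @ video_labels.t()) > 0          # videos sharing any class
        eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        pos = sim[share & ~eye].mean() if (share & ~eye).any() else sim.new_zeros(())
        neg = sim[~share].mean() if (~share).any() else sim.new_zeros(())
        denoise_loss = intra + neg - pos

        return cls_loss + denoise_loss

# Example: 4 videos, 100 snippets each, 2048-d features, 20 classes
loss_fn = D2NetStyleLoss(feat_dim=2048, num_classes=20)
labels = torch.zeros(4, 20)
labels[torch.arange(4), torch.tensor([3, 3, 7, 11])] = 1.0
loss = loss_fn(torch.randn(4, 100, 2048), labels)
```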

Results

The following results are reported for D2-Net. The site cross-lists the same numbers under several task pages (Video, Temporal Action Localization, Zero-Shot Learning, Action Localization, Weakly Supervised Action Localization), so each row is shown once.

Dataset | Metric | Value | Model
THUMOS 2014 | mAP@0.1:0.5 | 51.4 | D2-Net
THUMOS 2014 | mAP@0.5 | 35.9 | D2-Net
FineAction | mAP | 3.35 | D2-Net
FineAction | mAP IoU@0.5 | 6.75 | D2-Net
FineAction | mAP IoU@0.75 | 3.02 | D2-Net
FineAction | mAP IoU@0.95 | 0.82 | D2-Net
THUMOS'14 | mAP@0.5 | 35.9 | D2-Net
ActivityNet-1.2 | Mean mAP | 26.0 | D2-Net
ActivityNet-1.2 | mAP@0.5 | 42.3 | D2-Net
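
For reference, mAP@t in the table is average precision computed at a temporal IoU (tIoU) threshold t, and mAP@0.1:0.5 is mAP averaged over the thresholds {0.1, 0.2, 0.3, 0.4, 0.5}. Below is a minimal sketch of the tIoU computation that underlies these metrics; it is not the official THUMOS/ActivityNet evaluation code.

```python
# Hedged sketch of the temporal IoU (tIoU) behind the mAP@t metrics above;
# not the official benchmark evaluation code.
def temporal_iou(pred: tuple, gt: tuple) -> float:
    """tIoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction typically counts as a true positive at threshold t if its tIoU
# with a not-yet-matched ground-truth segment of the same class is >= t.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 4 / 8 = 0.5
```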
