Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Marginalized Average Attentional Network for Weakly-Supervised Learning

Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-yan Yeung

2019-05-21 · ICLR 2019

Tasks: Weakly Supervised Action Localization · Action Localization · Weakly-supervised Temporal Action Localization · Temporal Action Localization

Paper · PDF

Abstract

In weakly-supervised temporal action localization, previous works have failed to locate dense and integral regions for each entire action due to the overestimation of the most salient regions. To alleviate this issue, we propose a marginalized average attentional network (MAAN) to suppress the dominant response of the most salient regions in a principled manner. The MAAN employs a novel marginalized average aggregation (MAA) module and learns a set of latent discriminative probabilities in an end-to-end fashion. MAA samples multiple subsets from the video snippet features according to a set of latent discriminative probabilities and takes the expectation over all the averaged subset features. Theoretically, we prove that the MAA module with learned latent discriminative probabilities successfully reduces the difference in responses between the most salient regions and the others. Therefore, MAAN is able to generate better class activation sequences and identify dense and integral action regions in the videos. Moreover, we propose a fast algorithm to reduce the complexity of constructing MAA from O(2^T) to O(T^2). Extensive experiments on two large-scale video datasets show that our MAAN achieves superior performance on weakly-supervised temporal action localization.
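The MAA operation described above (averaging randomly sampled feature subsets and taking the expectation) can be illustrated with a naive Monte Carlo sketch. This is only an approximation for intuition, not the paper's exact O(T^2) algorithm; the function name, NumPy implementation, and sampling scheme here are assumptions, not the authors' code.

```python
import numpy as np

def maa_monte_carlo(features, probs, n_samples=1000, rng=None):
    """Monte Carlo sketch of marginalized average aggregation (MAA).

    features: (T, D) array of video snippet features.
    probs:    (T,) latent discriminative probabilities; snippet t is
              included in a subset independently with probability probs[t].
    Returns an estimate of the expectation, over sampled subsets, of the
    average feature of each subset (empty subsets are skipped).
    """
    rng = np.random.default_rng(rng)
    T, D = features.shape
    acc = np.zeros(D)
    count = 0
    for _ in range(n_samples):
        mask = rng.random(T) < probs          # sample a snippet subset
        if mask.any():                        # skip the empty subset
            acc += features[mask].mean(axis=0)
            count += 1
    return acc / count
```

Because every subset's average is bounded by the per-snippet features, this aggregation dampens the contribution of any single highly salient snippet relative to plain attention-weighted pooling, which is the effect the paper proves formally.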

Results

Task                                          | Dataset         | Metric      | Value | Model
----------------------------------------------|-----------------|-------------|-------|------
Video                                         | THUMOS 2014     | mAP@0.1:0.7 | 31.6  | MAAN
Video                                         | THUMOS 2014     | mAP@0.5     | 20.3  | MAAN
Video                                         | ActivityNet-1.3 | mAP@0.5     | 33.7  | MAAN
Temporal Action Localization                  | THUMOS 2014     | mAP@0.1:0.7 | 31.6  | MAAN
Temporal Action Localization                  | THUMOS 2014     | mAP@0.5     | 20.3  | MAAN
Temporal Action Localization                  | ActivityNet-1.3 | mAP@0.5     | 33.7  | MAAN
Zero-Shot Learning                            | THUMOS 2014     | mAP@0.1:0.7 | 31.6  | MAAN
Zero-Shot Learning                            | THUMOS 2014     | mAP@0.5     | 20.3  | MAAN
Zero-Shot Learning                            | ActivityNet-1.3 | mAP@0.5     | 33.7  | MAAN
Action Localization                           | THUMOS 2014     | mAP@0.1:0.7 | 31.6  | MAAN
Action Localization                           | THUMOS 2014     | mAP@0.5     | 20.3  | MAAN
Action Localization                           | ActivityNet-1.3 | mAP@0.5     | 33.7  | MAAN
Weakly Supervised Action Localization         | THUMOS 2014     | mAP@0.1:0.7 | 31.6  | MAAN
Weakly Supervised Action Localization         | THUMOS 2014     | mAP@0.5     | 20.3  | MAAN
Weakly Supervised Action Localization         | ActivityNet-1.3 | mAP@0.5     | 33.7  | MAAN
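For context on the metrics above: a predicted action segment counts as correct when its temporal intersection-over-union (IoU) with a ground-truth segment exceeds a threshold; mAP@0.5 uses threshold 0.5, and mAP@0.1:0.7 averages mAP over thresholds from 0.1 to 0.7. A minimal sketch of temporal IoU (the function name is illustrative, not from any specific evaluation toolkit):

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, a prediction spanning 0–10 s against a ground truth spanning 5–15 s overlaps by 5 s out of a 15 s union, giving an IoU of 1/3, which would count as a hit at threshold 0.1 but a miss at threshold 0.5.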

Related Papers

- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
- Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
- A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
- LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
- CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization (2025-05-29)
- DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition (2025-05-27)
- ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization (2025-05-23)