
Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

Chen Ju, Peisen Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

2020-12-15 · Weakly Supervised Action Localization · Action Localization · Prediction · Temporal Action Localization
Paper · PDF

Abstract

Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation per action instance. Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels. However, such a framework inevitably suffers from a large solution space. This paper explores the proposal-based prediction paradigm for point-level annotations, which has the advantages of a more constrained solution space and consistent predictions among neighboring frames. The point-level annotations are first used as keypoint supervision to train a keypoint detector. At the location prediction stage, a simple but effective mapper module, which enables back-propagation of training errors, is then introduced to bridge the fully-supervised framework with weak supervision. To the best of our knowledge, this is the first work to leverage the fully-supervised paradigm for the point-level setting. Experiments on THUMOS14, BEOID, and GTEA verify the effectiveness of the proposed method both quantitatively and qualitatively, and demonstrate that it outperforms state-of-the-art methods.
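
The mapper module is the piece that lets a fully-supervised, proposal-style head be trained from single-timestamp labels. As a rough illustration only (the hinge-plus-length formulation and all names below are assumptions made for this sketch, not the authors' implementation), a differentiable loss connecting predicted proposal boundaries to annotated points could look like this in PyTorch:

```python
import torch
import torch.nn.functional as F

def point_coverage_loss(start, end, point, len_weight=0.1):
    """Hypothetical differentiable loss tying each predicted proposal
    [start, end] to its single annotated timestamp: the hinge term is
    zero iff start <= point <= end, and the length term keeps proposals
    from trivially covering the whole video.
    All arguments are 1-D tensors, one entry per proposal/point pair."""
    cover = F.relu(start - point) + F.relu(point - end)  # coverage hinge
    length = F.relu(end - start)                         # proposal duration
    return (cover + len_weight * length).mean()

# Toy usage: three predicted proposals (in seconds) and their point labels.
start = torch.tensor([1.0, 4.0, 9.0], requires_grad=True)
end = torch.tensor([2.0, 7.0, 9.5], requires_grad=True)
point = torch.tensor([2.5, 5.0, 9.2])  # one annotated frame per instance

loss = point_coverage_loss(start, end, point)
loss.backward()  # errors back-propagate to the boundary regressions
print(loss.item(), start.grad, end.grad)
```

Because every term is differentiable in `start` and `end`, the point label can supervise the boundary regressions directly, which is the back-propagation property the abstract highlights.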

Results

Results are for temporal action localization with point-level (weakly supervised) annotation; the same entries appear under the paper's Temporal Action Localization, Action Localization, and Weakly Supervised Action Localization task tags.

| Dataset  | Metric            | Value | Model     |
|----------|-------------------|-------|-----------|
| GTEA     | avg-mAP (0.1:0.7) | 33.7  | Ju et al. |
| GTEA     | mAP@0.5           | 21.9  | Ju et al. |
| BEOID    | avg-mAP (0.1:0.7) | 34.9  | Ju et al. |
| BEOID    | mAP@0.5           | 20.9  | Ju et al. |
| THUMOS14 | avg-mAP (0.1:0.5) | 55.6  | Ju et al. |
| THUMOS14 | avg-mAP (0.1:0.7) | 44.8  | Ju et al. |
| THUMOS14 | avg-mAP (0.3:0.7) | 35.4  | Ju et al. |
| THUMOS14 | mAP@0.5           | 35.9  | Ju et al. |
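
For reading the table: mAP@t is mean average precision at a single temporal-IoU (tIoU) threshold t, and avg-mAP (a:b) averages mAP over thresholds from a to b in steps of 0.1, so avg-mAP (0.1:0.7) averages seven values. A minimal single-class sketch of the computation, using the common greedy-matching convention (assumed here, not taken from this paper):

```python
import numpy as np

def tiou(a, b):
    """Temporal IoU between two segments a = (start, end), b = (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def ap_at_tiou(preds, gts, thr):
    """AP for one class. preds: [(start, end, score)]; gts: [(start, end)].
    Score-ranked predictions are greedily matched one-to-one to ground truth."""
    preds = sorted(preds, key=lambda p: -p[2])
    matched = [False] * len(gts)
    tp, fp = np.zeros(len(preds)), np.zeros(len(preds))
    for i, (s, e, _) in enumerate(preds):
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if not matched[j]:
                v = tiou((s, e), g)
                if v >= best_iou:
                    best_j, best_iou = j, v
        if best_j >= 0:
            matched[best_j] = True
            tp[i] = 1.0
        else:
            fp[i] = 1.0
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = np.concatenate([[0.0], tp / max(len(gts), 1)])
    precision = np.concatenate([[1.0], tp / np.maximum(tp + fp, 1e-9)])
    # Non-interpolated AP: area under the precision-recall curve.
    return float(np.sum(np.diff(recall) * precision[1:]))

def avg_map(preds, gts, lo, hi):
    """avg-mAP (lo:hi): mean AP over tIoU thresholds lo, lo+0.1, ..., hi.
    Full mAP would additionally average this over all action classes."""
    thrs = np.round(np.arange(lo, hi + 0.05, 0.1), 1)
    return float(np.mean([ap_at_tiou(preds, gts, t) for t in thrs]))

# Toy check: two ground-truth actions, three scored predictions (seconds).
gts = [(2.0, 5.0), (8.0, 10.0)]
preds = [(2.2, 4.8, 0.9), (7.9, 10.1, 0.8), (0.0, 1.0, 0.7)]
print(avg_map(preds, gts, 0.1, 0.7))
```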

Related Papers

- Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
- Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins (2025-07-11)
- Foundation models for time series forecasting: Application in conformal prediction (2025-07-09)
- Predicting Graph Structure via Adapted Flux Balance Analysis (2025-07-08)
- Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
- A Wireless Foundation Model for Multi-Task Prediction (2025-07-08)