Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Self-Supervised Learning for Semi-Supervised Temporal Action Proposal

Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Changxin Gao, Nong Sang

2021-04-07 · CVPR 2021
Tasks: Self-Supervised Learning · Semi-Supervised Action Detection · Temporal Action Localization
Links: Paper · PDF · Code (official)

Abstract

Self-supervised learning has shown remarkable performance in utilizing unlabeled data for various video tasks. In this paper, we focus on applying the power of self-supervised methods to improve semi-supervised action proposal generation. In particular, we design an effective Self-supervised Semi-supervised Temporal Action Proposal (SSTAP) framework. SSTAP contains two crucial branches, i.e., a temporal-aware semi-supervised branch and a relation-aware self-supervised branch. The semi-supervised branch improves the proposal model by introducing two temporal perturbations, i.e., temporal feature shift and temporal feature flip, in the mean teacher framework. The self-supervised branch defines two pretext tasks, namely masked feature reconstruction and clip-order prediction, to learn the relation of temporal clues. In this way, SSTAP can better explore unlabeled videos and improve the discriminative ability of the learned action features. We extensively evaluate the proposed SSTAP on the THUMOS14 and ActivityNet v1.3 datasets. The experimental results demonstrate that SSTAP significantly outperforms state-of-the-art semi-supervised methods and even matches fully-supervised methods. Code is available at https://github.com/wangxiang1230/SSTAP.
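The two temporal perturbations named in the abstract operate on a clip-level feature sequence. A minimal sketch, assuming features of shape (T, C) and a simple roll-based shift; the function names and shift amount are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def temporal_feature_shift(feats: np.ndarray, shift: int = 1) -> np.ndarray:
    """Roll the feature sequence along the temporal axis by `shift` steps.

    Hypothetical sketch of the 'temporal feature shift' perturbation.
    """
    return np.roll(feats, shift, axis=0)

def temporal_feature_flip(feats: np.ndarray) -> np.ndarray:
    """Reverse the temporal order of the feature sequence.

    Hypothetical sketch of the 'temporal feature flip' perturbation.
    """
    return feats[::-1].copy()

# Toy feature sequence: T = 8 temporal steps, C = 4 channels.
T, C = 8, 4
feats = np.arange(T * C, dtype=np.float32).reshape(T, C)

shifted = temporal_feature_shift(feats, shift=2)
flipped = temporal_feature_flip(feats)

# Both perturbations preserve the (T, C) shape, so the student and
# teacher networks in a mean-teacher setup can consume them unchanged.
assert shifted.shape == feats.shape and flipped.shape == feats.shape
```

In a mean-teacher setup, such perturbed views would be fed to the student while the teacher sees the clean sequence, with a consistency loss between the two; that wiring is omitted here.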

Results

Task: Temporal Action Localization
Dataset: ActivityNet-1.3
Model: SSTAP@100%

Metric          Value
mAP             34.48
mAP IOU@0.5     50.72
mAP IOU@0.75    35.28
mAP IOU@0.95    7.87
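The mAP IOU@τ metrics above count a predicted segment as correct when its temporal IoU with a ground-truth segment is at least τ. A minimal sketch of temporal IoU between two (start, end) segments, for illustration only:

```python
def temporal_iou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Temporal IoU of two segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

# Segments overlapping for 5 s out of a 15 s union.
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))  # 5 / 15
```

On ActivityNet-1.3 the headline mAP is conventionally averaged over multiple IoU thresholds, which is why the single-threshold columns (e.g. IOU@0.5 at 50.72) differ from the 34.48 average.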

Related Papers

A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model (2025-07-01)
ShapeEmbed: a self-supervised learning framework for 2D contour quantification (2025-07-01)
RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models (2025-06-27)
Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features (2025-06-26)