Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A Perceptual Prediction Framework for Self Supervised Event Segmentation

Sathyanarayanan N. Aakur, Sudeep Sarkar

Date: 2018-11-12
Venue: CVPR 2019 6
Tasks: Event Segmentation, Representation Learning, Action Localization, Unsupervised Action Segmentation, Prediction, Action Recognition

Abstract

Temporal segmentation of long videos is an important problem that has largely been tackled through supervised learning, often requiring large amounts of annotated training data. In this paper, we tackle the problem of self-supervised temporal segmentation of long videos, which alleviates the need for any supervision. We introduce a self-supervised, predictive learning framework that draws inspiration from cognitive psychology to segment long, visually complex videos into individual, stable segments that share the same semantics. We also introduce a new adaptive learning paradigm that helps reduce the effect of catastrophic forgetting in recurrent neural networks. Extensive experiments on three publicly available datasets (Breakfast Actions, 50 Salads, and INRIA Instructional Videos) show the efficacy of the proposed approach. We show that the proposed approach outperforms weakly-supervised and other unsupervised learning approaches by up to 24% and achieves competitive performance compared to fully supervised approaches. We also show that the proposed approach learns highly discriminative features that help improve action recognition when used in a representation learning paradigm.
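
The abstract does not spell out the mechanics, so the following is only a minimal sketch of the general idea behind prediction-based segmentation: predict the next frame's features, measure the prediction error, and mark a boundary where the error spikes. Everything here is an assumption made for illustration. The function names `prediction_errors` and `detect_boundaries`, the `ratio` threshold, and the naive "previous frame" predictor are hypothetical and stand in for the recurrent network and adaptive learning scheme the paper describes.

```python
from typing import List, Sequence


def prediction_errors(features: Sequence[Sequence[float]]) -> List[float]:
    """Squared error between each frame and a naive prediction of it.

    ASSUMPTION: the prediction for frame t is simply frame t-1 (a
    persistence model). This is a stand-in for a learned recurrent
    predictor, not the model from the paper.
    """
    errors = [0.0]  # no prediction is available for the first frame
    for prev, cur in zip(features, features[1:]):
        errors.append(sum((c - p) ** 2 for p, c in zip(prev, cur)))
    return errors


def detect_boundaries(errors: Sequence[float], ratio: float = 3.0) -> List[int]:
    """Return frame indices whose error exceeds `ratio` times the mean error.

    ASSUMPTION: a fixed multiple of the mean error is used as the
    threshold; `ratio` is a hypothetical parameter chosen for the demo.
    """
    mean_error = sum(errors) / len(errors)
    if mean_error == 0.0:
        return []  # perfectly predictable sequence: no boundaries
    return [t for t, e in enumerate(errors) if e > ratio * mean_error]


# Toy "video": 5 frames with features (0, 0), then 5 frames with (1, 1).
frames = [[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5
boundaries = detect_boundaries(prediction_errors(frames))
print(boundaries)  # the single spike is at frame index 5
```

In this toy input the features change only once, so the error is zero everywhere except at the transition frame, which is the only index reported as a boundary.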

Results

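| Task                | Dataset                     | Metric | Value | Model   |
|---------------------|-----------------------------|--------|-------|---------|
| Action Localization | 50 Salads                   | Acc    | 60.6  | LSTM+AL |
| Action Localization | Youtube INRIA Instructional | F1     | 39.7  | LSTM+AL |
| Action Localization | Breakfast                   | Acc    | 42.9  | LSTM+AL |
| Action Localization | Breakfast                   | mIoU   | 46.9  | LSTM+AL |
| Action Segmentation | 50 Salads                   | Acc    | 60.6  | LSTM+AL |
| Action Segmentation | Youtube INRIA Instructional | F1     | 39.7  | LSTM+AL |
| Action Segmentation | Breakfast                   | Acc    | 42.9  | LSTM+AL |
| Action Segmentation | Breakfast                   | mIoU   | 46.9  | LSTM+AL |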
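
For readers unfamiliar with the Acc and mIoU columns, here is a minimal sketch of how frame-wise accuracy and mean intersection-over-union are commonly computed for per-frame action labels. It assumes predicted labels have already been mapped onto the ground-truth label set. The exact evaluation protocol behind the numbers above (for example, how unsupervised segments are matched to labels) is not stated on this page, and the helper names `framewise_accuracy` and `mean_iou` are illustrative only.

```python
from typing import Hashable, Sequence


def framewise_accuracy(pred: Sequence[Hashable], gt: Sequence[Hashable]) -> float:
    """Fraction of frames whose predicted label equals the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def mean_iou(pred: Sequence[Hashable], gt: Sequence[Hashable]) -> float:
    """Average per-class IoU over the classes present in the ground truth.

    ASSUMPTION: IoU is computed per class over frames and averaged
    uniformly across ground-truth classes; other conventions exist.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    ious = []
    for cls in sorted(set(gt), key=repr):
        inter = sum(p == cls and g == cls for p, g in zip(pred, gt))
        union = sum(p == cls or g == cls for p, g in zip(pred, gt))
        ious.append(inter / union)  # union >= 1 because cls occurs in gt
    return sum(ious) / len(ious)


# Toy example: six frames, two actions, one frame mislabeled.
gt_labels = ["a", "a", "a", "b", "b", "b"]
pred_labels = ["a", "a", "b", "b", "b", "b"]
acc = framewise_accuracy(pred_labels, gt_labels)  # 5 of 6 frames correct
miou = mean_iou(pred_labels, gt_labels)           # mean of 2/3 and 3/4
```

The two metrics can diverge because accuracy weights every frame equally while mean IoU weights every class equally, which is one reason a results table may report both for the same dataset.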
