TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Pretext-Contrastive Learning: Toward Good Practices in Sel...

Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning

Li Tao, Xueting Wang, Toshihiko Yamasaki

2020-10-29Video RetrievalSelf-Supervised LearningData AugmentationContrastive LearningSelf-supervised Video RetrievalSelf-Supervised Action Recognition
PaperPDFCode(official)

Abstract

Recently, pretext-task based methods are proposed one after another in self-supervised video feature learning. Meanwhile, contrastive learning methods also yield good performance. Usually, new methods can beat previous ones as claimed that they could capture "better" temporal information. However, there exist setting differences among them and it is hard to conclude which is better. It would be much more convincing in comparison if these methods have reached as closer to their performance limits as possible. In this paper, we start from one pretext-task baseline, exploring how far it can go by combining it with contrastive learning, data pre-processing, and data augmentation. A proper setting has been found from extensive experiments, with which huge improvements over the baselines can be achieved, indicating a joint optimization framework can boost both pretext task and contrastive learning. We denote the joint optimization framework as Pretext-Contrastive Learning (PCL). The other two pretext task baselines are used to validate the effectiveness of PCL. And we can easily outperform current state-of-the-art methods in the same training manner, showing the effectiveness and the generality of our proposal. It is convenient to treat PCL as a standard training strategy and apply it to many other works in self-supervised video feature learning.

Results

TaskDatasetMetricValueModel
Activity RecognitionUCF1013-fold Accuracy82.3PCL (ResNet-18)
Activity RecognitionHMDB51Top-1 Accuracy43.2PCL (ResNet-18)
Action RecognitionUCF1013-fold Accuracy82.3PCL (ResNet-18)
Action RecognitionHMDB51Top-1 Accuracy43.2PCL (ResNet-18)

Related Papers

A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys2025-07-17Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images2025-07-17SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts2025-07-17HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation2025-07-17Similarity-Guided Diffusion for Contrastive Sequential Recommendation2025-07-16Data Augmentation in Time Series Forecasting through Inverted Framework2025-07-15