Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Self-supervised Co-training for Video Representation Learning

Tengda Han, Weidi Xie, Andrew Zisserman

2020-10-19 · NeurIPS 2020

Tasks: Self-Supervised Action Recognition (Linear), Video Retrieval, Representation Learning, Optical Flow Estimation, Contrastive Learning, Action Recognition, Retrieval, Self-Supervised Action Recognition

Paper · PDF · Code (official)

Abstract

The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation (InfoNCE) training, showing that this form of supervised contrastive learning leads to a clear improvement in performance; (ii) we propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss, exploiting the complementary information from different views (RGB streams and optical flow) of the same data source, by using one view to obtain positive class samples for the other; (iii) we thoroughly evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval. In both cases, the proposed approach demonstrates state-of-the-art or comparable performance with other self-supervised approaches, whilst being significantly more efficient to train, i.e. requiring far less training data to achieve similar performance.
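The core idea in contributions (i) and (ii) — replacing the single instance-level positive in InfoNCE with a *set* of positives, e.g. mined via the complementary view during co-training — can be sketched as a generalised contrastive loss. This is a minimal NumPy sketch of that idea, not the paper's implementation; the function and variable names are illustrative.

```python
import numpy as np

def multi_positive_info_nce(query, keys, positive_mask, temperature=0.07):
    """InfoNCE generalised to multiple positives (illustrative sketch).

    query:         (d,)  L2-normalised embedding of the anchor clip.
    keys:          (n, d) L2-normalised embeddings of candidate clips.
    positive_mask: (n,)  boolean; True where a key counts as a positive
                   (in co-training, positives for the RGB view could be
                   mined by nearest-neighbour search in the flow view).
    """
    logits = keys @ query / temperature      # (n,) cosine similarities
    logits -= logits.max()                   # shift for numerical stability
    exp = np.exp(logits)
    # Negative log of the probability mass assigned to the positive set;
    # with exactly one positive this reduces to standard InfoNCE.
    return -np.log(exp[positive_mask].sum() / exp.sum())
```

Enlarging the positive set with same-class (or cross-view-mined) samples pulls semantically related clips together rather than only augmentations of the same instance, which is the mechanism the abstract credits for the improvement.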

Results

Task                  Dataset              Metric           Value  Model
Activity Recognition  UCF101 (finetuned)   3-fold Accuracy  87.9   CoCLR
Activity Recognition  UCF101               3-fold Accuracy  74.5   CoCLR
Activity Recognition  HMDB51               Top-1 Accuracy   46.1   CoCLR
Activity Recognition  HMDB51 (finetuned)   Top-1 Accuracy   54.6   CoCLR
Action Recognition    UCF101 (finetuned)   3-fold Accuracy  87.9   CoCLR
Action Recognition    UCF101               3-fold Accuracy  74.5   CoCLR
Action Recognition    HMDB51               Top-1 Accuracy   46.1   CoCLR
Action Recognition    HMDB51 (finetuned)   Top-1 Accuracy   54.6   CoCLR

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)