Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Self-Supervised Video Representation Learning with Meta-Contrastive Network

Yuanze Lin, Xun Guo, Yan Lu

2021-08-19 | ICCV 2021 10
Tasks: Meta-Learning, Video Retrieval, Representation Learning, Self-Supervised Learning, Contrastive Learning, Action Recognition, Retrieval, Temporal Action Localization, Self-Supervised Action Recognition

Abstract

Self-supervised learning has been successfully applied to pre-train video representations, with the aim of efficient adaptation from the pre-training domain to downstream tasks. Existing approaches merely leverage contrastive loss to learn instance-level discrimination. However, the lack of category information leads to a hard-positive problem that constrains the generalization ability of such methods. We find that the multi-task process of meta learning can provide a solution to this problem. In this paper, we propose a Meta-Contrastive Network (MCN), which combines contrastive learning and meta learning to enhance the learning ability of existing self-supervised approaches. Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch. Extensive evaluations demonstrate the effectiveness of our method. For two downstream tasks, i.e., video action recognition and video retrieval, MCN outperforms state-of-the-art approaches on the UCF101 and HMDB51 datasets. Specifically, with an R(2+1)D backbone, MCN achieves Top-1 accuracies of 84.8% and 54.5% for video action recognition, as well as 52.5% and 23.7% for video retrieval.
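
The instance-level contrastive loss mentioned in the abstract can be illustrated with a generic InfoNCE-style objective. The sketch below is not the authors' implementation and does not include the meta branch or the MAML-based training stages; the function names, the temperature value, and the use of in-batch negatives are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, not from the paper).

    positives[i] is treated as the positive for anchors[i]; every other
    entry of `positives` acts as a negative for that anchor.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching index as the target class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy embeddings (made up): matched pairs should score a much lower loss
# than deliberately mismatched pairs.
clips_a = [[1.0, 0.0], [0.0, 1.0]]
clips_b = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(clips_a, clips_b)
shuffled = info_nce_loss(clips_a, list(reversed(clips_b)))
```

Because every other sample in the batch is pushed away as a negative, two clips of the same action class are treated as unrelated instances. This is one way to read the hard-positive problem the abstract attributes to the lack of category information.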

Results

Task                  Dataset  Metric           Value  Model
Activity Recognition  UCF101   3-fold Accuracy  85.4   MCN (R3D-18; RGB)
Activity Recognition  UCF101   3-fold Accuracy  84.8   MCN (R2+1D; RGB)
Activity Recognition  HMDB51   Top-1 Accuracy   54.8   MCN (R3D-18; RGB)
Activity Recognition  HMDB51   Top-1 Accuracy   54.5   MCN (R2+1D; RGB)
Action Recognition    UCF101   3-fold Accuracy  85.4   MCN (R3D-18; RGB)
Action Recognition    UCF101   3-fold Accuracy  84.8   MCN (R2+1D; RGB)
Action Recognition    HMDB51   Top-1 Accuracy   54.8   MCN (R3D-18; RGB)
Action Recognition    HMDB51   Top-1 Accuracy   54.5   MCN (R2+1D; RGB)
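
For readers unfamiliar with the metrics, the following is a minimal sketch of how a Top-1 accuracy and a multi-split average are typically computed. It assumes that "3-fold Accuracy" means the mean over the dataset's three standard train/test splits; the function names and all values in the example are hypothetical and are not the paper's numbers.

```python
def top1_accuracy(predicted, actual):
    """Percentage of samples whose highest-scoring predicted label matches the ground truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


def mean_over_splits(per_split_accuracy):
    """Average the accuracy across evaluation splits (assumed meaning of '3-fold')."""
    return sum(per_split_accuracy) / len(per_split_accuracy)


# Made-up labels for one split: 3 of 4 predictions are correct.
split_accuracy = top1_accuracy(["run", "jump", "swim", "dive"],
                               ["run", "jump", "walk", "dive"])

# Made-up per-split accuracies, averaged into a single reported figure.
reported = mean_over_splits([80.0, 82.0, 84.0])
```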

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)