Self-Supervised Learning of Pretext-Invariant Representations

Ishan Misra, Laurens van der Maaten

2019-12-04CVPR 2020 6Self-Supervised Image Classification Representation Learning Self-Supervised Learning Contrastive Learning object-detection Object Detection Semi-Supervised Image Classification

Paper PDF Code Code Code Code Code Code Code

Abstract

The goal of self-supervised learning from images is to construct image representations that are semantically meaningful via pretext tasks that do not require semantic annotations for a large training set of images. Many pretext tasks lead to representations that are covariant with image transformations. We argue that, instead, semantic representations ought to be invariant under such transformations. Specifically, we develop Pretext-Invariant Representation Learning (PIRL, pronounced as "pearl") that learns invariant representations based on pretext tasks. We use PIRL with a commonly used pretext task that involves solving jigsaw puzzles. We find that PIRL substantially improves the semantic quality of the learned image representations. Our approach sets a new state-of-the-art in self-supervised learning from images on several popular benchmarks for self-supervised learning. Despite being unsupervised, PIRL outperforms supervised pre-training in learning image representations for object detection. Altogether, our results demonstrate the potential of self-supervised learning of image representations with good invariance properties.

Results

Task	Dataset	Metric	Value	Model
Contrastive Learning	imagenet-1k	ImageNet Top-1 Accuracy	63.6	ResNet50

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper2025-07-20 Spectral Bellman Method: Unifying Representation and Exploration in RL2025-07-17 Boosting Team Modeling through Tempo-Relational Representation Learning2025-07-17 A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys2025-07-17 SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts2025-07-17 HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17 Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17 SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation2025-07-17