VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Adrien Bardes, Jean Ponce, Yann LeCun

2021-05-11
NeurIPS 2021 12
Tasks: Self-Supervised Image Classification, Representation Learning, Self-Supervised Learning, Semi-Supervised Image Classification


Abstract

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, which often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize training and leads to performance improvements.
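
The three terms named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give formulas, so everything concrete here is an assumption made for illustration: the hinge on the per-dimension standard deviation with target `gamma`, the stabilizer `eps`, the squared off-diagonal covariance penalty, and the weights `inv_w`, `var_w`, `cov_w`. The function names are hypothetical and are not taken from the official code.

```python
import numpy as np


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge penalty that is active when a dimension's std falls below gamma.

    z has shape (batch, dim). gamma and eps are assumed defaults.
    """
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return np.mean(np.maximum(0.0, gamma - std))


def covariance_term(z):
    """Sum of squared off-diagonal covariance entries, scaled by 1/dim."""
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return np.sum(off_diag ** 2) / d


def invariance_term(z_a, z_b):
    """Mean squared Euclidean distance between paired embeddings."""
    return np.mean(np.sum((z_a - z_b) ** 2, axis=1))


def vicreg_loss(z_a, z_b, inv_w=25.0, var_w=25.0, cov_w=1.0):
    """Weighted sum of the three terms; the weights are assumptions."""
    inv = invariance_term(z_a, z_b)
    var = variance_term(z_a) + variance_term(z_b)
    cov = covariance_term(z_a) + covariance_term(z_b)
    return inv_w * inv + var_w * var + cov_w * cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_a = rng.normal(size=(256, 32))              # embeddings of view A
    z_b = z_a + 0.1 * rng.normal(size=(256, 32))  # embeddings of view B
    collapsed = np.ones((256, 32))                # constant encoder output

    print("healthy  :", vicreg_loss(z_a, z_b))
    print("collapsed:", vicreg_loss(collapsed, collapsed))
```

In this sketch a collapsed encoder (constant output) makes the invariance and covariance terms zero, so only the variance term penalizes it. That mirrors the abstract's point that the variance regularizer is what explicitly rules out the trivial solution.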

Results

Task	Dataset	Metric	Value	Model
Image Classification	ImageNet	Top 1 Accuracy	73.2	VICReg (ResNet50)
Image Classification	ImageNet	Top 5 Accuracy	91.1	VICReg (ResNet50)
