Contrastive Representation Distillation

Yonglong Tian, Dilip Krishnan, Phillip Isola

2019-10-23 | ICLR 2020 | Model Compression, Transfer Learning, Contrastive Learning, Knowledge Distillation

Abstract

Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a collection of models into a single estimator. Knowledge distillation, the standard approach to these problems, minimizes the KL divergence between the probabilistic outputs of a teacher and student network. We demonstrate that this objective ignores important structural knowledge of the teacher network. This motivates an alternative objective by which we train a student to capture significantly more information in the teacher's representation of the data. We formulate this objective as contrastive learning. Experiments demonstrate that our resulting new objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single model compression, ensemble distillation, and cross-modal transfer. Our method sets a new state-of-the-art in many transfer tasks, and sometimes even outperforms the teacher network when combined with knowledge distillation. Code: http://github.com/HobbitLong/RepDistiller.
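The two kinds of objective contrasted in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kd_kl_loss`, `info_nce_loss`), the temperature values, and the use of a generic InfoNCE-style loss as a stand-in for the paper's contrastive objective are all assumptions made here for illustration. The first function computes the KL divergence between softened teacher and student class probabilities; the second rewards a student whose feature for sample i is closer to the teacher's feature for the same sample than to the teacher's features for other samples.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]

def kd_kl_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) over softened class probabilities.

    The temperature default is an illustrative assumption, and the T^2
    rescaling often used in practice is omitted for brevity.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _normalize(v):
    # Assumes v is not the zero vector.
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]

def info_nce_loss(student_feats, teacher_feats, tau=0.1):
    """Generic InfoNCE-style contrastive loss (a stand-in, not the paper's exact form).

    Row i of student_feats and row i of teacher_feats come from the same
    input (positive pair); all other teacher rows act as negatives.
    """
    s = [_normalize(v) for v in student_feats]
    t = [_normalize(v) for v in teacher_feats]
    total = 0.0
    for i, s_i in enumerate(s):
        logits = [_dot(s_i, t_j) / tau for t_j in t]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # -log softmax at the positive
    return total / len(s)

if __name__ == "__main__":
    # Identical logits give zero KL; mismatched logits give a positive value.
    print(kd_kl_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(kd_kl_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
    # Aligned student/teacher features score lower than swapped ones.
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(feats, feats))
    print(info_nce_loss(feats, feats[::-1]))
```

The KL term only compares per-sample class probabilities, whereas the contrastive term depends on how each sample's representation relates to the representations of other samples, which is the kind of structural information the abstract says the KL objective ignores.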

Results

Task	Dataset	Metric	Value	Model
Knowledge Distillation	CIFAR-100	Top-1 Accuracy (%)	75.51	resnet8x4 (T: resnet32x4, S: resnet8x4)
Knowledge Distillation	CIFAR-100	Top-1 Accuracy (%)	74.29	vgg8 (T: vgg13, S: vgg8)
Knowledge Distillation	CIFAR-100	Top-1 Accuracy (%)	71.56	resnet110 (T: resnet110, S: resnet20)
Knowledge Distillation	ImageNet	Top-1 Accuracy (%)	71.38	CRD (T: ResNet-34, S: ResNet-18)
