Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Contrastive Representation Distillation

Yonglong Tian, Dilip Krishnan, Phillip Isola

2019-10-23 | ICLR 2020 | Model Compression, Transfer Learning, Contrastive Learning, Knowledge Distillation

Abstract

Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a collection of models into a single estimator. Knowledge distillation, the standard approach to these problems, minimizes the KL divergence between the probabilistic outputs of a teacher and student network. We demonstrate that this objective ignores important structural knowledge of the teacher network. This motivates an alternative objective by which we train a student to capture significantly more information in the teacher's representation of the data. We formulate this objective as contrastive learning. Experiments demonstrate that our resulting new objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single model compression, ensemble distillation, and cross-modal transfer. Our method sets a new state-of-the-art in many transfer tasks, and sometimes even outperforms the teacher network when combined with knowledge distillation. Code: http://github.com/HobbitLong/RepDistiller.
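
To make the two objectives in the abstract concrete, here is a minimal sketch in plain Python. It contrasts a KL-divergence distillation loss on output probabilities with a simple contrastive loss on feature vectors. Everything here is an illustrative assumption rather than the authors' implementation: the function names, the temperature values, the T^2 scaling of the KL term, and the use of an InfoNCE-style loss with in-batch negatives. The paper's actual objective and negative sampling may differ; the official code linked above is the reference.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs.

    The temperature and the T^2 scaling are common conventions,
    assumed here for illustration.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(student_feats, teacher_feats, tau=0.1):
    """InfoNCE-style contrastive loss (hypothetical stand-in).

    For input i, the teacher feature of the same input is the positive;
    teacher features of the other inputs in the batch are negatives.
    """
    s = [l2_normalize(v) for v in student_feats]
    t = [l2_normalize(v) for v in teacher_feats]
    losses = []
    for i, s_i in enumerate(s):
        logits = [dot(s_i, t_j) / tau for t_j in t]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denom - logits[i])  # -log softmax at the positive
    return sum(losses) / len(losses)

# Toy usage: two inputs, two-dimensional features.
teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned_student = [[2.0, 0.0], [0.0, 3.0]]
swapped_student = [[0.0, 2.0], [3.0, 0.0]]
print(kd_kl_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
print(info_nce_loss(aligned_student, teacher))
print(info_nce_loss(swapped_student, teacher))
```

The KL term only compares per-input class probabilities, whereas the contrastive term depends on how each student feature relates to the teacher features of every input in the batch. That is one way to read the abstract's claim that the contrastive objective captures more of the structure in the teacher's representation.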

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Knowledge Distillation | CIFAR-100 | Top-1 Accuracy (%) | 75.51 | resnet8x4 (T: resnet32x4, S: resnet8x4) |
| Knowledge Distillation | CIFAR-100 | Top-1 Accuracy (%) | 74.29 | vgg8 (T: vgg13, S: vgg8) |
| Knowledge Distillation | CIFAR-100 | Top-1 Accuracy (%) | 71.56 | resnet110 (T: resnet110, S: resnet20) |
| Knowledge Distillation | ImageNet | Top-1 Accuracy (%) | 71.38 | CRD (T: ResNet-34, S: ResNet-18) |

Related Papers

LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression (2025-07-21)
Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)