Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Information Theoretic Representation Distillation

Roy Miles, Adrian Lopez Rodriguez, Krystian Mikolajczyk

2021-12-01 · Question Answering · Classification with Binary Weight Network · Knowledge Distillation
Paper · PDF · Code (official)

Abstract

Despite the empirical success of knowledge distillation, current state-of-the-art methods are computationally expensive to train, which makes them difficult to adopt in practice. To address this problem, we introduce two distinct complementary losses inspired by a cheap entropy-like estimator. These losses aim to maximise the correlation and mutual information between the student and teacher representations. Our method incurs significantly less training overhead than other approaches and achieves performance competitive with the state-of-the-art on knowledge distillation and cross-model transfer tasks. We further demonstrate the effectiveness of our method on a binary distillation task, where it sets a new state-of-the-art for binary quantisation and approaches the performance of a full-precision model. Code: www.github.com/roymiles/ITRD
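
The two losses are only described at a high level above, so the sketch below shows what a correlation-maximising term and a Gram-matrix, entropy-based mutual-information term could look like in PyTorch. This is an illustrative reconstruction, not the authors' implementation: the function names, the per-sample correlation form, the eigenvalue-based entropy estimate, the alpha exponent, and the assumption that student and teacher features are already projected to a common dimension are all choices made here. The actual losses are in the official repository at www.github.com/roymiles/ITRD.

```python
import torch
import torch.nn.functional as F


def correlation_loss(f_s: torch.Tensor, f_t: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    """Correlation-maximising term (illustrative; alpha is a free choice here).

    f_s, f_t: (batch, dim) student / teacher features, assumed to have been
    projected to the same dimensionality beforehand.
    """
    # Standardise each feature dimension over the batch.
    f_s = (f_s - f_s.mean(dim=0)) / (f_s.std(dim=0) + 1e-6)
    f_t = (f_t - f_t.mean(dim=0)) / (f_t.std(dim=0) + 1e-6)

    # Per-sample correlation between student and teacher representations.
    corr = (f_s * f_t).mean(dim=1)

    # Penalise weakly correlated samples; the exponent emphasises the worst ones.
    return (1.0 - corr).abs().pow(alpha).mean()


def _entropy_from_gram(gram: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Entropy-like quantity from a batch Gram matrix: normalise so the
    # eigenvalues form a distribution, then take a Shannon-style sum.
    gram = gram / gram.trace()
    eig = torch.linalg.eigvalsh(gram).clamp(min=eps)
    return -(eig * eig.log2()).sum()


def mutual_information_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """Mutual-information-style term: I(S; T) ~ H(S) + H(T) - H(S, T), with the
    joint term taken from the element-wise product of the two Gram matrices."""
    f_s = F.normalize(f_s, dim=1)
    f_t = F.normalize(f_t, dim=1)
    g_s, g_t = f_s @ f_s.T, f_t @ f_t.T

    mi = _entropy_from_gram(g_s) + _entropy_from_gram(g_t) - _entropy_from_gram(g_s * g_t)
    return -mi  # negate so that minimising the loss maximises the MI estimate


if __name__ == "__main__":
    # Toy usage: pooled features of matching dimensionality.
    f_student, f_teacher = torch.randn(64, 128), torch.randn(64, 128)
    loss = correlation_loss(f_student, f_teacher) + mutual_information_loss(f_student, f_teacher)
    print(loss.item())
```

Under these assumptions both terms operate on pooled (batch, dim) features, so they add only a few batch-sized matrix products per step, which is consistent with the low training overhead claimed in the abstract.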

Results

Task | Dataset | Metric | Value | Model
Question Answering | SQuAD1.1 | EM | 81.5 | BERT - 6 Layers
Question Answering | SQuAD1.1 | F1 | 88.5 | BERT - 6 Layers
Question Answering | SQuAD1.1 | EM | 77.7 | BERT - 3 Layers
Question Answering | SQuAD1.1 | F1 | 85.8 | BERT - 3 Layers
Knowledge Distillation | CIFAR-100 | Top-1 Accuracy (%) | 76.68 | resnet8x4 (T: resnet32x4, S: resnet8x4)
Knowledge Distillation | CIFAR-100 | Top-1 Accuracy (%) | 74.93 | vgg8 (T: vgg13, S: vgg8)
Knowledge Distillation | CIFAR-100 | Top-1 Accuracy (%) | 71.99 | resnet20 (T: resnet110, S: resnet20)
Knowledge Distillation | ImageNet | Top-1 Accuracy (%) | 71.68 | ITRD (T: ResNet-34, S: ResNet-18)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)