
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis

Huiyuan Tian, Bonan Xu, Shijian Li, Gang Pan

2024-12-26 · Transfer Learning · Knowledge Distillation
Paper · PDF · Code (official)

Abstract

Knowledge Distillation (KD) has achieved widespread success in compressing large Vision Transformers (ViTs), but a unified theoretical framework for both ViTs and KD is still lacking. In this paper, we propose SpectralKD, a novel unified analytical framework that offers deeper insights into ViTs and optimizes KD via spectral analysis. Our model-wise analysis reveals that CaiT concentrates information in its first and last few layers, informing optimal layer selection for KD. Surprisingly, our layer-wise analysis discovers that Swin Transformer and CaiT exhibit similar spectral encoding patterns despite their architectural differences, leading to a feature-map alignment guideline. Building on these insights, we propose a simple yet effective spectral alignment method for KD. Benefiting from the deeper understanding afforded by these analyses, even this simple strategy achieves state-of-the-art performance on ImageNet-1K without introducing any trainable parameters, improving DeiT-Tiny by $+5.2\%$ and Swin-Tiny by $+1.4\%$ in top-1 accuracy. Furthermore, our post-training analysis reveals that distilled students can reproduce spectral patterns similar to those of their teachers, opening a new area we term "distillation dynamics". Code and experimental logs are available at https://github.com/thy960112/SpectralKD.
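The abstract describes two ingredients: a per-layer spectral (Fourier) analysis of ViT feature maps, used to decide which layers to distill, and a distillation loss that aligns student and teacher feature maps in the frequency domain. The sketch below illustrates that idea in PyTorch; the function names, the token-to-grid reshape, and the plain MSE objective are illustrative assumptions, not the authors' implementation (see the linked repository for the official code).

    import torch
    import torch.nn.functional as F

    def spectral_magnitude(feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, N, D) transformer tokens; any class token is assumed
        # to have been dropped, so the N tokens form a square spatial grid.
        B, N, D = feat.shape
        side = int(N ** 0.5)
        grid = feat.transpose(1, 2).reshape(B, D, side, side)
        # 2-D FFT over the spatial axes; magnitude averaged over channels,
        # which keeps the loss free of trainable projection parameters.
        return torch.fft.fft2(grid, norm="ortho").abs().mean(dim=1)  # (B, side, side)

    def spectral_alignment_loss(student_feats, teacher_feats):
        # Align spectra on a chosen subset of layers, e.g. the first and
        # last few, following the paper's layer-selection observation.
        return sum(
            F.mse_loss(spectral_magnitude(s), spectral_magnitude(t))
            for s, t in zip(student_feats, teacher_feats)
        )

Averaging the magnitude spectrum over channels also sidesteps any mismatch between student and teacher channel widths, consistent with the abstract's claim that no trainable parameters are introduced; only the spatial resolutions of the compared feature maps need to agree in this sketch.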

Results

Task                    Dataset   Metric              Value  Model
Knowledge Distillation  ImageNet  Top-1 accuracy (%)  82.7   SpectralKD (T: Swin-S, S: Swin-T)
Knowledge Distillation  ImageNet  Top-1 accuracy (%)  82.2   SpectralKD (T: CaiT-S24, S: DeiT-S)
Knowledge Distillation  ImageNet  Top-1 accuracy (%)  77.4   SpectralKD (T: CaiT-S24, S: DeiT-T)

(T = teacher, S = student.)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Robust-Multi-Task Gradient Boosting (2025-07-15)
HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training (2025-07-15)