Knapsack Pruning with Inner Distillation

Yonathan Aflalo, Asaf Noy, Ming Lin, Itamar Friedman, Lihi Zelnik

2020-02-19 | Neural Architecture Search | Network Pruning | Knowledge Distillation

Abstract

Neural network pruning reduces the computational cost of an over-parameterized network to improve its efficiency. Popular methods range from $\ell_1$-norm sparsification to Neural Architecture Search (NAS). In this work, we propose a novel pruning method that optimizes the final accuracy of the pruned network and distills knowledge from the inner layers of the over-parameterized parent network. To enable this approach, we formulate network pruning as a knapsack problem that optimizes the trade-off between the importance of neurons and their associated computational cost. We then prune the network channels while maintaining the high-level structure of the network. The pruned network is fine-tuned under the supervision of the parent network using its inner network knowledge, a technique we refer to as Inner Knowledge Distillation. Our method leads to state-of-the-art pruning results on ImageNet, CIFAR-10 and CIFAR-100 using ResNet backbones. To prune complex network structures such as convolutions with skip-links and depth-wise convolutions, we propose a block grouping approach. With it, we produce compact architectures with the same FLOPs as EfficientNet-B0 and MobileNetV3 but with higher accuracy, by $1\%$ and $0.3\%$ respectively on ImageNet, and faster runtime on GPU.
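The knapsack formulation mentioned above can be illustrated with a minimal sketch. The abstract does not say how importance is scored or which solver is used, so everything below is an assumption for illustration: the function name `knapsack_select`, the per-channel `importance` scores, the integer `cost` units (standing in for discretized FLOPs), and the exact dynamic-programming solver are all hypothetical and are not the authors' implementation.

```python
from typing import List, Tuple


def knapsack_select(
    importance: List[float], cost: List[int], budget: int
) -> Tuple[List[int], float]:
    """Pick the subset of channels with maximal total importance whose
    total cost does not exceed `budget` (classic 0/1 knapsack, exact DP).

    Assumption: costs are non-negative integers, e.g. FLOPs rounded to
    some fixed unit. Returns (sorted kept indices, total importance).
    """
    n = len(importance)
    # best[c] = highest total importance reachable with total cost <= c
    best = [0.0] * (budget + 1)
    # took[i][c] records whether channel i was taken when the table entry
    # for capacity c was last improved during iteration i
    took = [[False] * (budget + 1) for _ in range(n)]

    for i in range(n):
        w, v = cost[i], importance[i]
        # Iterate capacities downwards so each channel is used at most once
        for c in range(budget, w - 1, -1):
            if best[c - w] + v > best[c]:
                best[c] = best[c - w] + v
                took[i][c] = True

    # Backtrack from the full budget to recover which channels were kept
    kept = []
    c = budget
    for i in range(n - 1, -1, -1):
        if took[i][c]:
            kept.append(i)
            c -= cost[i]
    kept.reverse()
    return kept, best[budget]


# Hypothetical toy layer: four channels, a budget of five cost units
toy_importance = [9.0, 5.0, 4.5, 3.0]
toy_cost = [4, 2, 2, 1]
kept_channels, total_importance = knapsack_select(toy_importance, toy_cost, 5)
```

In this toy case the single most important channel is dropped, because three cheaper channels together carry more importance within the same budget. That is the trade-off between importance and computational cost the abstract describes; a real network would have far more channels, where an approximate or greedy solver may be the more practical choice.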

Results

Task	Dataset	Metric	Value	Model
Network Pruning	ImageNet	Accuracy	78	ResNet50 2.5 GFLOPS
Network Pruning	ImageNet	GFLOPs	2.5	ResNet50 2.5 GFLOPS
Network Pruning	ImageNet	Accuracy	77.7	ResNet50 2.0 GFLOPS
Network Pruning	ImageNet	GFLOPs	2	ResNet50 2.0 GFLOPS
