Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Blockwisely Supervised Neural Architecture Search with Knowledge Distillation

Changlin Li, Jiefeng Peng, Liuchun Yuan, Guangrun Wang, Xiaodan Liang, Liang Lin, Xiaojun Chang

2019-11-29 · Neural Architecture Search · Knowledge Distillation
Paper · PDF · Code (official)

Abstract

Neural Architecture Search (NAS), which aims to have machines design network architectures automatically, is expected to bring about a new revolution in machine learning. Despite these high expectations, the effectiveness and efficiency of existing NAS solutions are unclear, with some recent works going so far as to suggest that many existing NAS solutions are no better than random architecture selection. The inefficiency of NAS solutions may be attributed to inaccurate architecture evaluation. Specifically, to speed up NAS, recent works have proposed under-training different candidate architectures in a large search space concurrently by using shared network parameters; however, this has resulted in incorrect architecture ratings and furthered the ineffectiveness of NAS. In this work, we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to correct ratings of the candidates. Thanks to the block-wise search, we can also evaluate all of the candidate architectures within a block. Moreover, we find that the knowledge of a network model lies not only in the network parameters but also in the network architecture. We therefore propose to distill the neural architecture (DNA) knowledge from a teacher model as supervision to guide our block-wise architecture search, which significantly improves the effectiveness of NAS. Remarkably, the capacity of our searched architecture exceeds that of the teacher model, demonstrating the practicability and scalability of our method. Finally, our method achieves a state-of-the-art 78.4% top-1 accuracy on ImageNet in a mobile setting, about a 2.1% gain over EfficientNet-B0. All of our searched models along with the evaluation code are available online.
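
The core idea in the abstract — rating each candidate block by how well it reproduces the corresponding teacher block's output — can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the feature shapes, candidate functions, and the plain MSE objective are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher feature maps, one per block (shapes are illustrative).
teacher_block_outputs = [rng.standard_normal((8, 16)) for _ in range(3)]

def block_distill_loss(student_out, teacher_out):
    """Per-block distillation signal: MSE between a candidate block's
    output and the teacher block's output features."""
    return float(np.mean((student_out - teacher_out) ** 2))

# Two dummy candidate blocks standing in for architectures in the search
# space; each maps the previous block's features to this block's output.
def candidate_a(x):
    return x @ (rng.standard_normal((16, 16)) * 0.1)

def candidate_b(x):
    return x  # identity block

block_idx = 1
x_in = teacher_block_outputs[block_idx - 1]   # block input taken from the teacher
target = teacher_block_outputs[block_idx]     # supervision from the teacher block

# Candidates within a block are rated independently, so all of them
# can be evaluated against the same teacher supervision.
losses = {name: block_distill_loss(f(x_in), target)
          for name, f in [("cand_a", candidate_a), ("cand_b", candidate_b)]}
best = min(losses, key=losses.get)
print(best, losses[best])
```

Because each block is supervised separately, candidates never share parameters across blocks, which is what reduces the representation shift the abstract describes.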

Results

Task | Dataset | Metric | Value | Model
Neural Architecture Search | CIFAR-100 | Percentage Error | 11.7 | DNA-c
Neural Architecture Search | ImageNet | Accuracy | 78.4 | DNA-d
Neural Architecture Search | ImageNet | Top-1 Error Rate | 21.6 | DNA-d
Neural Architecture Search | ImageNet | Accuracy | 77.8 | DNA-c
Neural Architecture Search | ImageNet | Top-1 Error Rate | 22.2 | DNA-c
Neural Architecture Search | ImageNet | Accuracy | 77.5 | DNA-b
Neural Architecture Search | ImageNet | Top-1 Error Rate | 22.5 | DNA-b
Neural Architecture Search | ImageNet | Accuracy | 77.1 | DNA-a
Neural Architecture Search | ImageNet | Top-1 Error Rate | 22.9 | DNA-a
AutoML | CIFAR-100 | Percentage Error | 11.7 | DNA-c
AutoML | ImageNet | Accuracy | 78.4 | DNA-d
AutoML | ImageNet | Top-1 Error Rate | 21.6 | DNA-d
AutoML | ImageNet | Accuracy | 77.8 | DNA-c
AutoML | ImageNet | Top-1 Error Rate | 22.2 | DNA-c
AutoML | ImageNet | Accuracy | 77.5 | DNA-b
AutoML | ImageNet | Top-1 Error Rate | 22.5 | DNA-b
AutoML | ImageNet | Accuracy | 77.1 | DNA-a
AutoML | ImageNet | Top-1 Error Rate | 22.9 | DNA-a
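
The Accuracy and Top-1 Error Rate rows above are two views of the same numbers (error = 100 − accuracy), which a quick check confirms:

```python
# ImageNet results from the table: model -> (accuracy %, top-1 error %).
imagenet = {
    "DNA-a": (77.1, 22.9),
    "DNA-b": (77.5, 22.5),
    "DNA-c": (77.8, 22.2),
    "DNA-d": (78.4, 21.6),
}

# Each pair should sum to 100 (tolerance absorbs float rounding).
for model, (acc, err) in imagenet.items():
    assert abs(acc + err - 100.0) < 1e-6, model
print("all rows consistent")
```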

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
DASViT: Differentiable Architecture Search for Vision Transformer (2025-07-17)
Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training (2025-07-15)
Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning (2025-07-14)
KAT-V1: Kwai-AutoThink Technical Report (2025-07-11)
Towards Collaborative Fairness in Federated Learning Under Imbalanced Covariate Shift (2025-07-11)