Hanxiao Liu, Karen Simonyan, Yiming Yang
This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.
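
The continuous relaxation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' released implementation: the candidate operations, the squared-error loss, and the names `CANDIDATE_OPS`, `mixed_op`, `alpha_gradient`, `search`, and `discretize` are all hypothetical, chosen only for illustration. The idea shown is that the discrete choice of one operation on an edge is replaced by a softmax-weighted mixture of all candidates, so the architecture parameters `alpha` can be updated by gradient descent and then discretized by taking the highest-scoring operation. The sketch omits network weights and any training/validation split, which a real search would need.

```python
import numpy as np

# Hypothetical candidate operations for a single edge (toy stand-ins for
# operations such as convolutions or pooling).
CANDIDATE_OPS = {
    "zero": lambda x: np.zeros_like(x),
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
}


def softmax(alpha):
    """Numerically stable softmax over the architecture parameters."""
    z = np.exp(alpha - np.max(alpha))
    return z / z.sum()


def mixed_op(x, alpha):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs."""
    weights = softmax(alpha)
    outputs = np.stack([op(x) for op in CANDIDATE_OPS.values()])  # (num_ops, dim)
    return weights @ outputs


def alpha_gradient(x, target, alpha):
    """Analytic gradient of 0.5 * ||mixed_op(x, alpha) - target||^2 w.r.t. alpha."""
    weights = softmax(alpha)
    outputs = np.stack([op(x) for op in CANDIDATE_OPS.values()])
    residual = weights @ outputs - target   # (dim,)
    dloss_dw = outputs @ residual           # gradient w.r.t. mixture weights
    # Chain rule through softmax: dw_i/dalpha_j = w_i * (delta_ij - w_j)
    return weights * (dloss_dw - weights @ dloss_dw)


def search(x, target, steps=500, lr=0.5):
    """Plain gradient descent on alpha, starting from a uniform mixture."""
    alpha = np.zeros(len(CANDIDATE_OPS))
    for _ in range(steps):
        alpha -= lr * alpha_gradient(x, target, alpha)
    return alpha


def discretize(alpha):
    """Recover a discrete architecture by keeping the strongest operation."""
    return list(CANDIDATE_OPS)[int(np.argmax(alpha))]


if __name__ == "__main__":
    x = np.array([1.0, -2.0, 3.0])
    alpha = search(x, target=2.0 * x)
    print(softmax(alpha), discretize(alpha))
```

In this toy setting the target is exactly reproduced by one candidate, so the mixture weights concentrate on that operation. With real networks the mixture is optimized jointly with the operation weights, and the final discretization is only an approximation of the relaxed solution.
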
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Language Modelling | Penn Treebank (Word Level) | Test perplexity | 56.1 | Differentiable NAS |
| Language Modelling | Penn Treebank (Word Level) | Validation perplexity | 58.3 | Differentiable NAS |
| Neural Architecture Search | NAS-Bench-201, ImageNet-16-120 | Accuracy (Test) | 16.43 | DARTS (first order) |
| Neural Architecture Search | NAS-Bench-201, ImageNet-16-120 | Search time (s) | 10890 | DARTS (first order) |
| Neural Architecture Search | NAS-Bench-201, ImageNet-16-120 | Accuracy (Test) | 16.43 | DARTS (second order) |
| Neural Architecture Search | NAS-Bench-201, ImageNet-16-120 | Search time (s) | 29902 | DARTS (second order) |
| Neural Architecture Search | CIFAR-10 Image Classification | Percentage error | 2.83 | DARTS + c/o |
| Neural Architecture Search | CIFAR-10 Image Classification | Search Time (GPU days) | 4 | DARTS + c/o |
| Neural Architecture Search | CIFAR-10 | Parameters (M) | 3.3 | DARTS (second order) |
| Neural Architecture Search | CIFAR-10 | Search Time (GPU days) | 4 | DARTS (second order) |
| Neural Architecture Search | CIFAR-10 | Parameters (M) | 3.3 | DARTS (first order) |
| Neural Architecture Search | CIFAR-10 | Search Time (GPU days) | 1.5 | DARTS (first order) |
| Neural Architecture Search | ImageNet | Accuracy | 73.3 | DARTS |
| Neural Architecture Search | ImageNet | Parameters (M) | 4.9 | DARTS |
| Neural Architecture Search | ImageNet | Top-1 Error Rate | 26.7 | DARTS |
| AutoML | NAS-Bench-201, ImageNet-16-120 | Accuracy (Test) | 16.43 | DARTS (first order) |
| AutoML | NAS-Bench-201, ImageNet-16-120 | Search time (s) | 10890 | DARTS (first order) |
| AutoML | NAS-Bench-201, ImageNet-16-120 | Accuracy (Test) | 16.43 | DARTS (second order) |
| AutoML | NAS-Bench-201, ImageNet-16-120 | Search time (s) | 29902 | DARTS (second order) |
| AutoML | CIFAR-10 Image Classification | Percentage error | 2.83 | DARTS + c/o |
| AutoML | CIFAR-10 Image Classification | Search Time (GPU days) | 4 | DARTS + c/o |
| AutoML | CIFAR-10 | Parameters (M) | 3.3 | DARTS (second order) |
| AutoML | CIFAR-10 | Search Time (GPU days) | 4 | DARTS (second order) |
| AutoML | CIFAR-10 | Parameters (M) | 3.3 | DARTS (first order) |
| AutoML | CIFAR-10 | Search Time (GPU days) | 1.5 | DARTS (first order) |
| AutoML | ImageNet | Accuracy | 73.3 | DARTS |
| AutoML | ImageNet | Parameters (M) | 4.9 | DARTS |
| AutoML | ImageNet | Top-1 Error Rate | 26.7 | DARTS |