GPUNet: Searching the Deployable Convolution Neural Networks for GPUs

Linnan Wang, Chenhan Yu, Satish Salian, Slawomir Kierat, Szymon Migacz, Alex Fit Florea

2022-04-26 · Neural Architecture Search

Abstract

Customizing Convolutional Neural Networks (CNNs) for production use has been a challenging task for DL practitioners. This paper aims to expedite model customization with a model hub of models that are optimized by Neural Architecture Search (NAS) and tiered by their inference latency. To achieve this goal, we build a distributed NAS system that searches a novel search space made up of the prominent factors that impact latency and accuracy. Since we target GPUs, we name the NAS-optimized models GPUNet; they establish a new SOTA Pareto frontier in inference latency and accuracy. Within 1 ms, GPUNet is 2x faster than EfficientNet-X and FBNetV3 with even better accuracy. We also validate GPUNet on detection, where it consistently outperforms EfficientNet-X and FBNetV3 on COCO in both latency and accuracy. These results show that our NAS system is effective and general enough to handle different design tasks. With this NAS system, we expand GPUNet to cover a wide range of latency targets so that DL practitioners can deploy our models directly in different scenarios.
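The abstract's central claims rest on two ideas: a latency/accuracy Pareto frontier, and a model hub tiered by latency so a practitioner can pick a model for a given budget. The snippet is a minimal sketch of both ideas, not the authors' NAS system. It assumes each candidate network is summarized by one measured GPU latency in ms and one ImageNet top-1 accuracy in percent. The names `Candidate`, `dominates`, `pareto_frontier` and `pick_for_budget` are hypothetical, and the numbers in the usage are toy values, not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    latency_ms: float  # measured inference latency on the target GPU (lower is better)
    top1: float        # top-1 accuracy in percent (higher is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no slower and no less accurate than `b`, and strictly better in one."""
    no_worse = a.latency_ms <= b.latency_ms and a.top1 >= b.top1
    strictly_better = a.latency_ms < b.latency_ms or a.top1 > b.top1
    return no_worse and strictly_better


def pareto_frontier(cands: List[Candidate]) -> List[Candidate]:
    """Keep only non-dominated candidates, sorted by latency (O(n^2), fine for small pools)."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.latency_ms)


def pick_for_budget(front: List[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Return the most accurate frontier model whose latency fits the budget, or None."""
    fitting = [c for c in front if c.latency_ms <= budget_ms]
    return max(fitting, key=lambda c: c.top1) if fitting else None


if __name__ == "__main__":
    # Hypothetical toy numbers for illustration only; not measurements from the paper.
    pool = [
        Candidate("net-a", 0.6, 78.0),
        Candidate("net-b", 0.9, 80.5),
        Candidate("net-c", 1.0, 79.0),  # dominated by net-b: slower and less accurate
        Candidate("net-d", 1.8, 82.0),
    ]
    front = pareto_frontier(pool)
    print([c.name for c in front])           # ['net-a', 'net-b', 'net-d']
    print(pick_for_budget(front, 1.0).name)  # net-b
```

In a latency-tiered hub, each tier would correspond to one budget value passed to `pick_for_budget`, so the lookup always returns the most accurate model that still meets the deployment constraint.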

Results

Task	Dataset	Metric	Value (%)	Model
Neural Architecture Search	ImageNet	Top-1 Error Rate	16.4	GPUNet-D3
Neural Architecture Search	ImageNet	Top-1 Error Rate	17.5	GPUNet-D1
Neural Architecture Search	ImageNet	Top-1 Error Rate	20.3	GPUNet-D0
AutoML	ImageNet	Top-1 Error Rate	16.4	GPUNet-D3
AutoML	ImageNet	Top-1 Error Rate	17.5	GPUNet-D1
AutoML	ImageNet	Top-1 Error Rate	20.3	GPUNet-D0
