HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices

Lotfi Abdelkrim Mecharbat, Hadjer Benmeziane, Hamza Ouarnoughi, Smail Niar

2023-03-08Image Classification Neural Architecture Search Hardware Aware Neural Architecture Search object-detection Object Detection

Paper PDF

Abstract

Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources required, these architectures are rarely implemented on resource-constrained platforms. Current research investigates hybrid handcrafted convolution-based and attention-based models for CV tasks such as image classification and object detection. In this paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture Search (HW-NAS) including hybrid architectures targeting vision tasks on tiny devices. HyT-NAS improves state-of-the-art HW-NAS by enriching the search space and enhancing the search strategy as well as the performance predictors. Our experiments show that HyT-NAS achieves a similar hypervolume with less than ~5x training evaluations. Our resulting architecture outperforms MLPerf MobileNetV1 by 6.3% accuracy improvement with 3.5x less number of parameters on Visual Wake Words.

Results

Task	Dataset	Metric	Value	Model
Image Classification	Visual Wake Words	Accuracy	92.25	HyT-NAS-BA
Image Classification	Visual Wake Words	Accuracy	86.55	ProxylessNAS
Image Classification	Visual Wake Words	Accuracy	86.34	MobileNetV2 (x0.35)
Image Classification	Visual Wake Words	Accuracy	83.7	MobileNetV1

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations2025-07-18 Adversarial attacks to image classification systems using evolutionary algorithms2025-07-17 Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy2025-07-17 Federated Learning for Commercial Image Sources2025-07-17 MUPAX: Multidimensional Problem Agnostic eXplainable AI2025-07-17 DASViT: Differentiable Architecture Search for Vision Transformer2025-07-17 A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains2025-07-17 RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images2025-07-17