Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs

Hai Victor Habi, Roy H. Jennings, Arnon Netzer

2020-07-20 · ECCV 2020 · Quantization
Paper · PDF · Code (official) · Code

Abstract

Recent work in network quantization produced state-of-the-art results using mixed precision quantization. An imperative requirement for many efficient edge device hardware implementations is that their quantizers are uniform and with power-of-two thresholds. In this work, we introduce the Hardware Friendly Mixed Precision Quantization Block (HMQ) in order to meet this requirement. The HMQ is a mixed precision quantization block that repurposes the Gumbel-Softmax estimator into a smooth estimator of a pair of quantization parameters, namely, bit-width and threshold. HMQs use this to search over a finite space of quantization schemes. Empirically, we apply HMQs to quantize classification models trained on CIFAR10 and ImageNet. For ImageNet, we quantize four different architectures and show that, in spite of the added restrictions to our quantization scheme, we achieve competitive and, in some cases, state-of-the-art results.
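The abstract describes two ingredients: a uniform quantizer whose threshold is a power of two, and a Gumbel-Softmax distribution used as a smooth estimator over a finite grid of (bit-width, threshold) candidates. The sketch below illustrates that idea in NumPy; it is not the authors' implementation, and the candidate grid, temperature, and uniform logits are illustrative assumptions.

```python
import numpy as np

def uniform_quantize(x, threshold, bits):
    """Uniform symmetric quantizer with a power-of-two threshold.

    Maps x onto 2**bits evenly spaced levels in [-threshold, threshold).
    """
    step = (2.0 * threshold) / (2 ** bits)           # quantization step size
    x_clipped = np.clip(x, -threshold, threshold - step)
    return np.round(x_clipped / step) * step

def gumbel_softmax_weights(logits, temperature, rng):
    """Soft (differentiable) one-hot sample over candidate quantizers."""
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / temperature
    e = np.exp(y - y.max())                          # stable softmax
    return e / e.sum()

# Finite search space: power-of-two thresholds x candidate bit-widths
# (the grid here is a toy example, not the paper's configuration).
thresholds = [2.0 ** k for k in (-1, 0, 1)]          # 0.5, 1.0, 2.0
bitwidths = [2, 4, 8]
candidates = [(t, b) for t in thresholds for b in bitwidths]

rng = np.random.default_rng(0)
logits = np.zeros(len(candidates))                   # learnable in the real method
w = gumbel_softmax_weights(logits, temperature=0.5, rng=rng)

x = rng.normal(size=16)
# Smooth mixture of candidate quantizers: during training, gradients flow
# through w back to the logits, so bit-width and threshold are searched
# jointly; at inference a single candidate would be selected.
x_q = sum(wi * uniform_quantize(x, t, b) for wi, (t, b) in zip(w, candidates))
```

As the temperature is annealed toward zero, the weights `w` approach a hard one-hot choice, which is how a single hardware-friendly quantization scheme is ultimately selected.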

Results

Task          Dataset   Metric                Value   Model
Quantization  ImageNet  Activation bits       8       EfficientNet-B0-W8A8
Quantization  ImageNet  Top-1 Accuracy (%)    76.4    EfficientNet-B0-W8A8
Quantization  ImageNet  Weight bits           8       EfficientNet-B0-W8A8
Quantization  ImageNet  Activation bits       4       EfficientNet-B0-W4A4
Quantization  ImageNet  Top-1 Accuracy (%)    76      EfficientNet-B0-W4A4
Quantization  ImageNet  Weight bits           4       EfficientNet-B0-W4A4
Quantization  ImageNet  Activation bits       4       ResNet50-W3A4
Quantization  ImageNet  Top-1 Accuracy (%)    75.45   ResNet50-W3A4
Quantization  ImageNet  Weight bits           3       ResNet50-W3A4
Quantization  ImageNet  Top-1 Accuracy (%)    70.9    MobileNetV2

Related Papers

- Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
- An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
- Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
- Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
- Quantized Rank Reduction: A Communications-Efficient Federated Learning Scheme for Network-Critical Applications (2025-07-15)
- MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization (2025-07-14)
- Lightweight Federated Learning over Wireless Edge Networks (2025-07-13)
- Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation (2025-07-11)