Hai Victor Habi, Reuven Peretz, Elad Cohen, Lior Dikstein, Oranit Dror, Idit Diamant, Roy H. Jennings, Arnon Netzer
Neural network quantization enables the deployment of models on edge devices. An essential requirement for hardware efficiency is that the quantizers are hardware-friendly: uniform, symmetric, and with power-of-two thresholds. To the best of our knowledge, current post-training quantization methods do not support all of these constraints simultaneously. In this work, we introduce a hardware-friendly post-training quantization (HPTQ) framework, which addresses this problem by synergistically combining several known quantization methods. We perform a large-scale study on four tasks: classification, object detection, semantic segmentation, and pose estimation, over a wide variety of network architectures. Our extensive experiments show that competitive results can be obtained under hardware-friendly constraints.
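To make the three constraints named in the abstract concrete, the following is a minimal sketch of a quantizer that is uniform (a single fixed step size), symmetric (a signed grid centered on zero), and uses a power-of-two threshold. The function names and the threshold rule (smallest power of two that covers the largest absolute value) are assumptions chosen for illustration; they are not taken from the HPTQ implementation, which may select thresholds differently.

```python
import math


def power_of_two_threshold(max_abs: float) -> float:
    """Return the smallest power of two >= max_abs.

    Assumed selection rule for illustration only (a no-clipping choice).
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    return 2.0 ** math.ceil(math.log2(max_abs))


def quantize_symmetric(values, threshold, n_bits=8):
    """Fake-quantize values on a uniform, symmetric, signed grid.

    The grid covers [-threshold, threshold) with 2**n_bits levels, so the
    step size is threshold / 2**(n_bits - 1). With a power-of-two threshold
    the step is also a power of two, so rescaling reduces to a bit shift.
    """
    step = threshold / 2 ** (n_bits - 1)
    q_min = -(2 ** (n_bits - 1))
    q_max = 2 ** (n_bits - 1) - 1
    result = []
    for v in values:
        q = round(v / step)             # nearest grid index
        q = max(q_min, min(q_max, q))   # clip to the signed integer range
        result.append(q * step)         # map back to the real-valued grid
    return result


# Hypothetical weights, used only to exercise the functions above.
weights = [-0.9, -0.26, 0.0, 0.31, 0.75]
t = power_of_two_threshold(max(abs(w) for w in weights))
dequantized = quantize_symmetric(weights, t)
```

Because the threshold is a power of two, the step size is one as well, which is what lets integer hardware replace the rescaling multiplication with a shift.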
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Quantization | ImageNet | Activation bits | 8 | Xception W8A8 |
| Quantization | ImageNet | Top-1 Accuracy (%) | 78.972 | Xception W8A8 |
| Quantization | ImageNet | Weight bits | 8 | Xception W8A8 |
| Quantization | ImageNet | Activation bits | 8 | EfficientNet-B0 ReLU W8A8 |
| Quantization | ImageNet | Top-1 Accuracy (%) | 77.092 | EfficientNet-B0 ReLU W8A8 |
| Quantization | ImageNet | Weight bits | 8 | EfficientNet-B0 ReLU W8A8 |
| Quantization | ImageNet | Activation bits | 8 | EfficientNet-B0 W8A8 |
| Quantization | ImageNet | Top-1 Accuracy (%) | 74.216 | EfficientNet-B0 W8A8 |
| Quantization | ImageNet | Weight bits | 8 | EfficientNet-B0 W8A8 |
| Quantization | ImageNet | Activation bits | 8 | DenseNet-121 W8A8 |
| Quantization | ImageNet | Top-1 Accuracy (%) | 73.356 | DenseNet-121 W8A8 |
| Quantization | ImageNet | Weight bits | 8 | DenseNet-121 W8A8 |
| Quantization | ImageNet | Activation bits | 8 | MobileNetV2 W8A8 |
| Quantization | ImageNet | Top-1 Accuracy (%) | 71.46 | MobileNetV2 W8A8 |
| Quantization | ImageNet | Weight bits | 8 | MobileNetV2 W8A8 |
| Quantization | COCO (Common Objects in Context) | mAP | 34.3 | SSD ResNet50 V1 FPN 640x640 |