Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat
Recently, pre-trained Transformer-based language models such as BERT and GPT have shown great improvements on many Natural Language Processing (NLP) tasks. However, these models contain a large number of parameters. The emergence of even larger and more accurate models such as GPT-2 and Megatron suggests a trend toward ever-larger pre-trained Transformer models. However, deploying these large models in production environments is a complex task requiring large amounts of compute, memory, and power resources. In this work we show how to perform quantization-aware training during the fine-tuning phase of BERT in order to compress BERT by $4\times$ (storing weights as 8-bit integers instead of 32-bit floats) with minimal accuracy loss. Furthermore, the produced quantized model can accelerate inference speed when it is optimized for hardware that supports 8-bit integer operations.
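The core mechanism behind quantization-aware training is fake quantization: the forward pass simulates 8-bit symmetric linear quantization so the model learns to tolerate rounding error, while the backward pass uses a straight-through estimator so full-precision weights keep receiving gradients. The sketch below is a minimal illustration of that idea, not the paper's implementation; the function name `fake_quantize` and the simple per-tensor max scaling are assumptions made for the example (for activations, the paper tracks statistics such as a moving maximum during training).

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize-dequantize a tensor with symmetric linear quantization.

    Illustrative sketch only: per-tensor max scaling is an assumption,
    not necessarily the scheme used in the paper for every tensor.
    """
    qmax = 2 ** (num_bits - 1) - 1                       # 127 for 8 bits
    scale = qmax / x.detach().abs().max().clamp(min=1e-8)
    # Forward pass sees the 8-bit rounding error.
    x_q = torch.clamp(torch.round(x * scale), -qmax, qmax) / scale
    # Straight-through estimator: gradients flow as if quantization
    # were the identity, so the full-precision weights keep training.
    return x + (x_q - x).detach()

# Toy usage: quantized weights in the forward pass, FP32 gradient updates.
w = torch.randn(4, 4, requires_grad=True)
loss = fake_quantize(w).sum()
loss.backward()                                          # grad reaches w
```

After fine-tuning this way, the learned weights can be stored and executed as true 8-bit integers, which is where the $4\times$ compression and the inference speedup on 8-bit-capable hardware come from.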
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Natural Language Inference | QNLI | Accuracy | 93 | Q8BERT (Zafrir et al., 2019) |
| Natural Language Inference | RTE | Accuracy | 84.8 | Q8BERT (Zafrir et al., 2019) |
| Natural Language Inference | MultiNLI | Matched Accuracy | 85.6 | Q8BERT (Zafrir et al., 2019) |
| Semantic Textual Similarity | MRPC | Accuracy | 89.7 | Q8BERT (Zafrir et al., 2019) |
| Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.911 | Q8BERT (Zafrir et al., 2019) |
| Sentiment Analysis | SST-2 | Accuracy | 94.7 | Q8BERT (Zafrir et al., 2019) |
| Linguistic Acceptability | CoLA | Accuracy | 65 | Q8BERT (Zafrir et al., 2019) |