Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Q8BERT: Quantized 8Bit BERT

Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat

2019-10-14 | Tasks: Sentiment Analysis, Quantization, Natural Language Inference, Semantic Textual Similarity, Linguistic Acceptability

Abstract

Recently, pre-trained Transformer-based language models such as BERT and GPT have shown great improvements on many Natural Language Processing (NLP) tasks. However, these models contain a large number of parameters. The emergence of even larger and more accurate models such as GPT-2 and Megatron suggests a trend toward ever-larger pre-trained Transformer models. Using such models in production environments is complex and requires substantial compute, memory, and power resources. In this work we show how to perform quantization-aware training during the fine-tuning phase of BERT in order to compress BERT by 4× with minimal accuracy loss. Furthermore, the resulting quantized model can run inference faster if it is optimized for hardware that supports 8-bit integer arithmetic.
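The core mechanism named in the abstract, quantization-aware training toward 8-bit integers, can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal pure-Python example under our own assumptions: symmetric, per-tensor linear quantization with scale S = M / max|x| and M = 127, fake quantization (quantize then dequantize) in the forward pass, and a straight-through estimator in the backward pass. In practice activation ranges are usually estimated over many batches during training; this sketch simply uses the max of each tensor. All function names (`scale_for`, `quantize`, `fake_quantize`, `ste_grad`, `int8_dot`, `storage_bytes`) are hypothetical, and the ~110M parameter count is the commonly cited size of BERT-base, used only for the 4× arithmetic.

```python
# Minimal sketch of symmetric 8-bit linear quantization as used in
# quantization-aware training (QAT). Assumptions (ours, not the paper's code):
# per-tensor scale S = M / max|x|, M = 127, pure-Python lists for clarity.

BITS = 8
M = 2 ** (BITS - 1) - 1  # 127: largest magnitude kept for signed 8-bit values


def scale_for(values):
    """Per-tensor symmetric scale S = M / max|x| (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    return M / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Map floats to integers in [-M, M]: clamp(round(x * S), -M, M).
    Note: Python's round() rounds exact halves to the nearest even integer."""
    return [max(-M, min(M, round(v * scale))) for v in values]


def dequantize(q_values, scale):
    """Map integers back to floats: q / S."""
    return [q / scale for q in q_values]


def fake_quantize(values, scale):
    """QAT forward pass: quantize then dequantize, so the float model sees
    the rounding and clipping error it will have at 8-bit inference time."""
    return dequantize(quantize(values, scale), scale)


def ste_grad(upstream_grad, values, scale):
    """Straight-through estimator for fake_quantize: treat round() as the
    identity in the backward pass and zero the gradient where x * S was
    clipped to the [-M, M] range."""
    return [g if -M <= v * scale <= M else 0.0
            for g, v in zip(upstream_grad, values)]


def int8_dot(x, w):
    """Integer-only dot product: 8-bit operands, integer accumulation
    (a 32-bit accumulator on typical int8 hardware), one float rescale."""
    sx, sw = scale_for(x), scale_for(w)
    qx, qw = quantize(x, sx), quantize(w, sw)
    acc = sum(a * b for a, b in zip(qx, qw))
    return acc / (sx * sw)


def storage_bytes(num_params, bits):
    """Bytes needed to store num_params values at the given bit width."""
    return num_params * bits // 8


if __name__ == "__main__":
    x = [0.3, -0.7, 0.9]
    w = [0.2, 0.4, -0.1]
    exact = sum(a * b for a, b in zip(x, w))
    print(f"float dot: {exact:.4f}, int8 dot: {int8_dot(x, w):.4f}")

    # 4x compression: 32-bit floats -> 8-bit integers for ~110M parameters.
    fp32 = storage_bytes(110_000_000, 32)
    int8 = storage_bytes(110_000_000, 8)
    print(f"FP32: {fp32 / 1e6:.0f} MB, INT8: {int8 / 1e6:.0f} MB, "
          f"ratio {fp32 // int8}x")
```

Storing each weight in 8 bits instead of 32 is where the 4× figure comes from; any tensors that remain in 32-bit floating point would lower the overall ratio. The `int8_dot` function shows why integer hardware helps: the inner loop uses only integer multiply-adds, and a single float rescale by 1 / (Sx · Sw) recovers an approximation of the original dot product.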

Results

Task | Dataset | Metric | Value | Model
Natural Language Inference | QNLI | Accuracy | 93 | Q8BERT (Zafrir et al., 2019)
Natural Language Inference | RTE | Accuracy | 84.8 | Q8BERT (Zafrir et al., 2019)
Natural Language Inference | MultiNLI | Matched | 85.6 | Q8BERT (Zafrir et al., 2019)
Semantic Textual Similarity | MRPC | Accuracy | 89.7 | Q8BERT (Zafrir et al., 2019)
Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.911 | Q8BERT (Zafrir et al., 2019)
Sentiment Analysis | SST-2 Binary classification | Accuracy | 94.7 | Q8BERT (Zafrir et al., 2019)
Linguistic Acceptability | CoLA | Accuracy | 65 | Q8BERT (Zafrir et al., 2019)

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles (2025-07-15)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)