
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

2019-09-12 · Sentiment Analysis · Quantization · Natural Language Inference · Semantic Textual Similarity · Linguistic Acceptability

Abstract

Transformer-based architectures have become the de facto models for a range of Natural Language Processing tasks. In particular, BERT-based models achieve significant accuracy gains on GLUE tasks, CoNLL-03, and SQuAD. However, BERT-based models have a prohibitive memory footprint and latency, which makes deploying them in resource-constrained environments challenging. In this work, we perform an extensive analysis of fine-tuned BERT models using second-order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra-low precision. In particular, we propose a new group-wise quantization scheme, and we use a Hessian-based mixed-precision method to compress the model further. We extensively test our proposed method on the BERT downstream tasks of SST-2, MNLI, CoNLL-03, and SQuAD. We achieve performance comparable to the baseline with at most $2.3\%$ degradation, even with ultra-low precision quantization down to 2 bits, corresponding to up to $13\times$ compression of the model parameters and up to $4\times$ compression of the embedding table as well as the activations. Among all tasks, we observe the highest performance loss for BERT fine-tuned on SQuAD. Through Hessian-based analysis and visualization, we show that this is related to the fact that the current training/fine-tuning strategy for BERT does not converge on SQuAD.
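
The abstract names two ingredients: a group-wise quantization scheme and a Hessian-based sensitivity measure for mixed-precision bit assignment. The sketch below is an illustrative approximation, not the authors' implementation: the group layout (even splits of output rows), the fake-quantization scheme, the `quantize_groupwise` and `top_hessian_eigenvalue` names, and all hyperparameters are assumptions standing in for the paper's exact choices.

```python
import torch

def quantize_groupwise(weight: torch.Tensor, num_bits: int = 2, num_groups: int = 128) -> torch.Tensor:
    """Fake-quantize a weight matrix with one symmetric scale per row group.

    Illustrative only: the idea is that a single per-matrix range is too coarse
    at 2-4 bits, so the matrix is split into groups that are quantized with
    independent ranges. Here the output rows are simply split evenly.
    """
    out_features = weight.shape[0]
    assert out_features % num_groups == 0, "rows must divide evenly into groups"
    rows = out_features // num_groups
    levels = max(2 ** (num_bits - 1) - 1, 1)   # e.g. 2-bit symmetric -> {-1, 0, 1}

    result = torch.empty_like(weight)
    for g in range(num_groups):
        block = weight[g * rows:(g + 1) * rows]
        scale = block.abs().max().clamp(min=1e-8) / levels      # per-group range
        q = torch.clamp(torch.round(block / scale), -levels, levels)
        result[g * rows:(g + 1) * rows] = q * scale             # dequantized values
    return result

def top_hessian_eigenvalue(loss: torch.Tensor, params, iters: int = 20) -> float:
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    with power iteration over Hessian-vector products (no explicit Hessian).

    A rough stand-in for the second-order sensitivity signal used to decide
    which layers tolerate lower bit-widths.
    """
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eigenvalue = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]
        grad_dot_v = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(grad_dot_v, params, retain_graph=True)
        eigenvalue = float(sum((h * x).sum() for h, x in zip(hv, v)))  # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eigenvalue
```

The intuition, per the abstract, is that layers with larger top Hessian eigenvalues (sharper loss landscapes) are kept at higher precision, while flatter layers can be pushed toward ultra-low bit-widths.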

Results

Task                        | Dataset                      | Metric              | Value | Model
Natural Language Inference  | QNLI                         | Accuracy            | 93    | Q-BERT (Shen et al., 2020)
Natural Language Inference  | RTE                          | Accuracy            | 84.7  | Q-BERT (Shen et al., 2020)
Natural Language Inference  | MultiNLI                     | Matched             | 87.8  | Q-BERT (Shen et al., 2020)
Semantic Textual Similarity | MRPC                         | Accuracy            | 88.2  | Q-BERT (Shen et al., 2020)
Semantic Textual Similarity | STS Benchmark                | Pearson Correlation | 0.911 | Q-BERT (Shen et al., 2020)
Sentiment Analysis          | SST-2 Binary classification  | Accuracy            | 94.8  | Q-BERT (Shen et al., 2020)
Linguistic Acceptability    | CoLA                         | Accuracy            | 65.1  | Q-BERT (Shen et al., 2020)

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles (2025-07-15)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)