TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/SqueezeBERT: What can computer vision teach NLP about effi...

SqueezeBERT: What can computer vision teach NLP about efficient neural networks?

Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer

2020-06-19EMNLP (sustainlp) 2020 11Text ClassificationQuestion AnsweringSentiment AnalysisNatural Language InferenceTransfer LearningSemantic Textual SimilarityLinguistic Acceptability
PaperPDFCodeCode(official)CodeCodeCodeCode

Abstract

Humans read and write hundreds of billions of messages every day. Further, due to the availability of large datasets, large computing systems, and better neural network models, natural language processing (NLP) technology has made significant strides in understanding, proofreading, and organizing these messages. Thus, there is a significant opportunity to deploy NLP in myriad applications to help web users, social networks, and businesses. In particular, we consider smartphones and other mobile devices as crucial platforms for deploying NLP models at scale. However, today's highly-accurate NLP neural network models such as BERT and RoBERTa are extremely computationally expensive, with BERT-base taking 1.7 seconds to classify a text snippet on a Pixel 3 smartphone. In this work, we observe that methods such as grouped convolutions have yielded significant speedups for computer vision networks, but many of these techniques have not been adopted by NLP neural network designers. We demonstrate how to replace several operations in self-attention layers with grouped convolutions, and we use this technique in a novel network architecture called SqueezeBERT, which runs 4.3x faster than BERT-base on the Pixel 3 while achieving competitive accuracy on the GLUE test set. The SqueezeBERT code will be released.

Results

TaskDatasetMetricValueModel
Natural Language InferenceWNLIAccuracy65.1SqueezeBERT
Natural Language InferenceMultiNLIMatched82SqueezeBERT
Natural Language InferenceMultiNLIMismatched81.1SqueezeBERT
Sentiment AnalysisSST-2 Binary classificationAccuracy91.4SqueezeBERT

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction2025-07-18Making Language Model a Hierarchical Classifier and Generator2025-07-17From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering2025-07-17Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It2025-07-17City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning2025-07-17AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis2025-07-17Disentangling coincident cell events using deep transfer learning and compressive sensing2025-07-17