Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


How to Train BERT with an Academic Budget

Peter Izsak, Moshe Berchansky, Omer Levy

Published 2021-04-15 · EMNLP 2021
Tasks: Question Answering · Sentiment Analysis · Natural Language Inference · Semantic Textual Similarity · Linguistic Acceptability · Language Modelling
Links: Paper · PDF · Code (official and community implementations)

Abstract

While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-base on GLUE tasks at a fraction of the original pretraining cost.
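The recipe pretrains with the standard masked-language-modelling objective. As a minimal sketch of that objective (not the paper's actual training code), the snippet below implements BERT-style token masking: roughly 15% of positions are selected; of those, 80% become a `[MASK]` token, 10% become a random token, and 10% stay unchanged. The `MASK_ID` and `VOCAB_SIZE` values are assumptions standing in for a real tokenizer's vocabulary.

```python
import random

MASK_ID = 103       # assumption: [MASK] token id in a BERT-style vocabulary
VOCAB_SIZE = 30522  # assumption: BERT-base WordPiece vocabulary size

def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    """BERT-style MLM masking.

    Selects ~mask_prob of positions as prediction targets; of those,
    80% are replaced with [MASK], 10% with a random token, 10% kept.
    Returns (inputs, labels); labels are -100 at unselected positions
    so a cross-entropy loss can ignore them.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)          # predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_ID)  # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                inputs.append(tok)      # 10%: keep the original token
        else:
            inputs.append(tok)
            labels.append(-100)         # ignored by the loss
    return inputs, labels
```

The paper's speedups come from elsewhere (large batches, short sequences, mixed precision, schedule tuning); the objective itself is this unmodified MLM loss.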

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Question Answering | Quora Question Pairs | Accuracy | 70.7 | 24hBERT |
| Natural Language Inference | QNLI | Accuracy | 90.6 | 24hBERT |
| Natural Language Inference | MultiNLI | Matched | 84.4 | 24hBERT |
| Natural Language Inference | MultiNLI | Mismatched | 83.8 | 24hBERT |
| Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.82 | 24hBERT |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 93 | 24hBERT |
| Linguistic Acceptability | CoLA | Accuracy | 57.1 | 24hBERT |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)