Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

2019-09-26 · ICLR 2020

Tasks: Question Answering, Multi-task Language Understanding, Natural Language Inference, Common Sense Reasoning, Self-Supervised Learning, Multimodal Intent Recognition, Semantic Textual Similarity, Linguistic Acceptability

Abstract

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. The code and the pretrained models are available at https://github.com/google-research/ALBERT.
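One of the paper's two parameter-reduction techniques is a factorized embedding parameterization: instead of tying the vocabulary embedding size to the hidden size (one V×H matrix, as in BERT), ALBERT decomposes it into a V×E lookup followed by an E×H projection, which shrinks the parameter count dramatically when E is much smaller than H. A minimal sketch of the parameter arithmetic, using illustrative sizes in the spirit of ALBERT's configurations (the exact vocabulary and layer sizes here are assumptions, not taken from the paper's tables):

```python
# Illustrative sketch (not from the paper's codebase) of ALBERT's
# factorized embedding parameterization. All sizes are assumptions.

def bert_style_embedding_params(vocab_size: int, hidden_size: int) -> int:
    """BERT ties embedding size to hidden size: one V x H matrix."""
    return vocab_size * hidden_size

def albert_factorized_params(vocab_size: int, embed_size: int,
                             hidden_size: int) -> int:
    """ALBERT factorization: a V x E lookup plus an E x H projection."""
    return vocab_size * embed_size + embed_size * hidden_size

V, E, H = 30000, 128, 4096  # vocab, embedding, hidden sizes (illustrative)
print(bert_style_embedding_params(V, H))   # 122,880,000 parameters
print(albert_factorized_params(V, E, H))   #   4,364,288 parameters
```

The second technique, cross-layer parameter sharing, compounds this saving: one set of transformer-layer weights is reused across all layers, so layer parameters no longer grow with depth.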

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Reading Comprehension | PhotoChat | F1 | 52.2 | ALBERT-base |
| Reading Comprehension | PhotoChat | Precision | 44.8 | ALBERT-base |
| Reading Comprehension | PhotoChat | Recall | 62.7 | ALBERT-base |
| Question Answering | MultiTQ | Hits@1 | 10.8 | ALBERT |
| Question Answering | MultiTQ | Hits@10 | 45.9 | ALBERT |
| Question Answering | SQuAD2.0 dev | EM | 85.1 | ALBERT xxlarge |
| Question Answering | SQuAD2.0 dev | F1 | 88.1 | ALBERT xxlarge |
| Question Answering | SQuAD2.0 dev | EM | 83.1 | ALBERT xlarge |
| Question Answering | SQuAD2.0 dev | F1 | 85.9 | ALBERT xlarge |
| Question Answering | SQuAD2.0 dev | EM | 79 | ALBERT large |
| Question Answering | SQuAD2.0 dev | F1 | 82.1 | ALBERT large |
| Question Answering | SQuAD2.0 dev | EM | 76.1 | ALBERT base |
| Question Answering | SQuAD2.0 dev | F1 | 79.1 | ALBERT base |
| Question Answering | SQuAD2.0 | EM | 89.731 | ALBERT (ensemble model) |
| Question Answering | SQuAD2.0 | F1 | 92.215 | ALBERT (ensemble model) |
| Question Answering | SQuAD2.0 | EM | 88.107 | ALBERT (single model) |
| Question Answering | SQuAD2.0 | F1 | 90.902 | ALBERT (single model) |
| Common Sense Reasoning | CommonsenseQA | Accuracy | 76.5 | ALBERT, Lan et al. (2020) (ensemble) |
| Natural Language Inference | WNLI | Accuracy | 91.8 | ALBERT |
| Natural Language Inference | MultiNLI | Matched | 91.3 | ALBERT |
| Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.925 | ALBERT |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 97.1 | ALBERT |
| Intent Recognition | PhotoChat | F1 | 52.2 | ALBERT-base |
| Intent Recognition | PhotoChat | Precision | 44.8 | ALBERT-base |
| Intent Recognition | PhotoChat | Recall | 62.7 | ALBERT-base |

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)