Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

2019-09-26 · ICLR 2020

Tasks: Question Answering, Multi-task Language Understanding, Natural Language Inference, Common Sense Reasoning, Self-Supervised Learning, Multimodal Intent Recognition, Semantic Textual Similarity, Linguistic Acceptability

Abstract

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. The code and the pretrained models are available at https://github.com/google-research/ALBERT.
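One of the paper's two parameter-reduction techniques is a factorized embedding parameterization: instead of tying the vocabulary embedding size to the hidden size (one V×H matrix, as in BERT), ALBERT decomposes it into a V×E lookup followed by an E×H projection, which shrinks the parameter count dramatically when E is much smaller than H. A minimal sketch of the parameter arithmetic, using illustrative sizes in the spirit of ALBERT's configurations (the exact vocabulary and layer sizes here are assumptions, not taken from the paper's tables):

```python
# Illustrative sketch (not from the paper's codebase) of ALBERT's
# factorized embedding parameterization. All sizes are assumptions.

def bert_style_embedding_params(vocab_size: int, hidden_size: int) -> int:
    """BERT ties embedding size to hidden size: one V x H matrix."""
    return vocab_size * hidden_size

def albert_factorized_params(vocab_size: int, embed_size: int,
                             hidden_size: int) -> int:
    """ALBERT factorization: a V x E lookup plus an E x H projection."""
    return vocab_size * embed_size + embed_size * hidden_size

V, E, H = 30000, 128, 4096  # vocab, embedding, hidden sizes (illustrative)
print(bert_style_embedding_params(V, H))   # 122,880,000 parameters
print(albert_factorized_params(V, E, H))   #   4,364,288 parameters
```

The second technique, cross-layer parameter sharing, compounds this saving: one set of transformer-layer weights is reused across all layers, so layer parameters no longer grow with depth.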

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Reading Comprehension | PhotoChat | F1 | 52.2 | ALBERT-base |
| Reading Comprehension | PhotoChat | Precision | 44.8 | ALBERT-base |
| Reading Comprehension | PhotoChat | Recall | 62.7 | ALBERT-base |
| Question Answering | MultiTQ | Hits@1 | 10.8 | ALBERT |
| Question Answering | MultiTQ | Hits@10 | 45.9 | ALBERT |
| Question Answering | SQuAD2.0 dev | EM | 85.1 | ALBERT xxlarge |
| Question Answering | SQuAD2.0 dev | F1 | 88.1 | ALBERT xxlarge |
| Question Answering | SQuAD2.0 dev | EM | 83.1 | ALBERT xlarge |
| Question Answering | SQuAD2.0 dev | F1 | 85.9 | ALBERT xlarge |
| Question Answering | SQuAD2.0 dev | EM | 79 | ALBERT large |
| Question Answering | SQuAD2.0 dev | F1 | 82.1 | ALBERT large |
| Question Answering | SQuAD2.0 dev | EM | 76.1 | ALBERT base |
| Question Answering | SQuAD2.0 dev | F1 | 79.1 | ALBERT base |
| Question Answering | SQuAD2.0 | EM | 89.731 | ALBERT (ensemble model) |
| Question Answering | SQuAD2.0 | F1 | 92.215 | ALBERT (ensemble model) |
| Question Answering | SQuAD2.0 | EM | 88.107 | ALBERT (single model) |
| Question Answering | SQuAD2.0 | F1 | 90.902 | ALBERT (single model) |
| Common Sense Reasoning | CommonsenseQA | Accuracy | 76.5 | ALBERT, Lan et al. (2020) (ensemble) |
| Natural Language Inference | WNLI | Accuracy | 91.8 | ALBERT |
| Natural Language Inference | MultiNLI | Matched | 91.3 | ALBERT |
| Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.925 | ALBERT |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 97.1 | ALBERT |
| Intent Recognition | PhotoChat | F1 | 52.2 | ALBERT-base |
| Intent Recognition | PhotoChat | Precision | 44.8 | ALBERT-base |
| Intent Recognition | PhotoChat | Recall | 62.7 | ALBERT-base |

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)