XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

2019-06-19, NeurIPS 2019

Tasks (12): Text Classification, Reading Comprehension, Question Answering, Chinese Reading Comprehension, Paraphrase Identification, Sentiment Analysis, Natural Language Inference, Humor Detection, Audio Question Answering, Semantic Textual Similarity, Language Modelling, Document Ranking

Abstract

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
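The permutation objective described above maximizes, for a length-T sequence x, the expectation over factorization orders z drawn from the set Z_T of all permutations of Σ_t log p_θ(x_{z_t} | x_{z_<t}). The Python below is a minimal, hypothetical sketch of that idea, not the authors' code. All function names are illustrative, the "model" is a toy callable, and the sketch omits the two-stream attention, the Transformer-XL memory, and the rest of the real XLNet architecture. It shows two things. First, the input order is never shuffled; only the attention mask depends on z. Second, averaged over all orders, every position can see every other position in exactly half of them. This is how an autoregressive objective gains bidirectional context.

```python
import itertools
import math
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical toy-model signature (an assumption, not an XLNet API):
# (target position, target token, {visible position: token}) -> log-probability
CondLogProb = Callable[[int, int, Dict[int, int]], float]


def permutation_mask(order: Sequence[int]) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j, i.e. j comes
    strictly before i in the factorization order. Tokens keep their original
    positions; only this mask changes with the order."""
    n = len(order)
    rank = [0] * n
    for r, pos in enumerate(order):
        rank[pos] = r
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


def order_log_likelihood(tokens: Sequence[int], order: Sequence[int],
                         cond_logprob: CondLogProb) -> float:
    """Sum of log p(x_{z_t} | x_{z_<t}) for a single factorization order z."""
    total = 0.0
    for t, pos in enumerate(order):
        visible = {p: tokens[p] for p in order[:t]}  # context = earlier positions in z
        total += cond_logprob(pos, tokens[pos], visible)
    return total


def expected_log_likelihood(tokens: Sequence[int], cond_logprob: CondLogProb,
                            num_samples: int = 0, seed: int = 0) -> float:
    """Average over all T! orders (num_samples == 0) or over sampled orders."""
    n = len(tokens)
    if num_samples == 0:
        orders = [list(p) for p in itertools.permutations(range(n))]
    else:
        rng = random.Random(seed)
        orders = [rng.sample(range(n), n) for _ in range(num_samples)]
    return sum(order_log_likelihood(tokens, o, cond_logprob) for o in orders) / len(orders)


def visibility_rate(n: int) -> List[List[float]]:
    """Fraction of all n! orders in which position i can see position j."""
    counts = [[0] * n for _ in range(n)]
    perms = list(itertools.permutations(range(n)))
    for order in perms:
        mask = permutation_mask(order)
        for i in range(n):
            for j in range(n):
                counts[i][j] += mask[i][j]
    return [[c / len(perms) for c in row] for row in counts]


if __name__ == "__main__":
    # Toy "model": uniform over a 10-token vocabulary, ignoring context.
    uniform = lambda pos, tok, visible: math.log(1 / 10)
    print(expected_log_likelihood([3, 1, 4], uniform))  # approx. -6.9078, i.e. 3 * log(1/10)
    # Order z = (2, 0, 1): the middle token 1 sees both its left and right neighbors.
    print(permutation_mask([2, 0, 1]))
    print(visibility_rate(3))  # 0.5 off the diagonal, 0.0 on it
```

Exhaustive enumeration costs T! evaluations, so for realistic lengths one would sample orders (`num_samples > 0`). This gives a Monte Carlo estimate of the same expectation.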

Results

Task	Dataset	Metric	Value	Model
Reading Comprehension	RACE	Accuracy (High)	84	XLNet
Reading Comprehension	RACE	Accuracy (Middle)	88.6	XLNet
Question Answering	SQuAD1.1 dev	EM	89.7	XLNet (single model)
Question Answering	SQuAD1.1 dev	F1	95.1	XLNet (single model)
Question Answering	RACE	RACE	81.75	XLNet
Question Answering	RACE	RACE-m	85.45	XLNet
Question Answering	SQuAD1.1	EM	89.898	XLNet (single model)
Question Answering	SQuAD1.1	F1	95.08	XLNet (single model)
Question Answering	SQuAD2.0 dev	EM	87.9	XLNet (single model)
Question Answering	SQuAD2.0 dev	F1	90.6	XLNet (single model)
Question Answering	SQuAD2.0	EM	87.926	XLNet (single model)
Question Answering	SQuAD2.0	F1	90.689	XLNet (single model)
Natural Language Inference	WNLI	Accuracy	92.5	XLNet
Natural Language Inference	ANLI test	A1	70.3	XLNet (Large)
Natural Language Inference	ANLI test	A2	50.9	XLNet (Large)
Natural Language Inference	ANLI test	A3	49.4	XLNet (Large)
Natural Language Inference	MultiNLI	Matched	90.8	XLNet (single model)
Semantic Textual Similarity	STS Benchmark	Pearson Correlation	0.925	XLNet (single model)
Semantic Textual Similarity	Quora Question Pairs	Accuracy	90.3	XLNet-Large (ensemble)
Semantic Textual Similarity	Quora Question Pairs	F1	74.2	XLNet-Large (ensemble)
Sentiment Analysis	Yelp Fine-grained classification	Error	27.05	XLNet
Sentiment Analysis	SST-2 Binary classification	Accuracy	97	XLNet (single model)
Sentiment Analysis	SST-2 Binary classification	Accuracy	96.8	XLNet-Large (ensemble)
Sentiment Analysis	Yelp Binary classification	Error	1.37	XLNet
Sentiment Analysis	IMDb	Accuracy	96.21	XLNet
Ad-Hoc Information Retrieval	ClueWeb09-B	ERR@20	20.28	XLNet
Ad-Hoc Information Retrieval	ClueWeb09-B	nDCG@20	31.1	XLNet
Paraphrase Identification	Quora Question Pairs	Accuracy	90.3	XLNet-Large (ensemble)
Paraphrase Identification	Quora Question Pairs	F1	74.2	XLNet-Large (ensemble)
Text Classification	DBpedia	Error	0.62	XLNet
Text Classification	Amazon-5	Error	31.67	XLNet
Text Classification	AG News	Error	4.45	XLNet
Text Classification	Amazon-2	Error	2.11	XLNet
Humor Detection	200k Short Texts for Humor Detection	F1-score	0.92	XLNet Large Cased
Document Ranking	ClueWeb09-B	ERR@20	20.28	XLNet
Document Ranking	ClueWeb09-B	nDCG@20	31.1	XLNet
Classification	DBpedia	Error	0.62	XLNet
Classification	Amazon-5	Error	31.67	XLNet
Classification	AG News	Error	4.45	XLNet
Classification	Amazon-2	Error	2.11	XLNet