Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

Published: 2019-06-19
Venue: NeurIPS 2019
Tasks: Text Classification, Reading Comprehension, Question Answering, Chinese Reading Comprehension, Paraphrase Identification, Sentiment Analysis, Natural Language Inference, Humor Detection, Audio Question Answering, Semantic Textual Similarity, Language Modelling, Document Ranking

Abstract

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
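
The permutation idea in point (1) can be illustrated with a small sketch. This is not the authors' implementation; it only shows, under simplifying assumptions, how a factorization order determines which positions each token may condition on. The function names `permutation_mask` and `sample_order` are hypothetical and chosen for this example.

```python
import random


def permutation_mask(order):
    """Build a visibility mask for one factorization order.

    `order` is a permutation of range(n): order[t] is the position
    predicted at step t. mask[i][j] is True when position j comes
    earlier than position i in that order, so position i may
    condition on position j.
    """
    n = len(order)
    rank = [0] * n
    for step, pos in enumerate(order):
        rank[pos] = step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


def sample_order(n, seed=None):
    """Sample a uniformly random factorization order over n positions."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return order


# Example: predict position 2 first, then 0, then 3, then 1.
order = [2, 0, 3, 1]
mask = permutation_mask(order)
for row in mask:
    print(["x" if visible else "." for visible in row])
```

With the order `[2, 0, 3, 1]`, position 1 is predicted last and can see positions 0, 2, and 3, which lie on both sides of it. Averaging the training objective over many sampled orders is what lets an autoregressive model see context from both directions without inserting mask tokens into the input.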

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Reading Comprehension | RACE | Accuracy (High) | 84 | XLNet |
| Reading Comprehension | RACE | Accuracy (Middle) | 88.6 | XLNet |
| Question Answering | SQuAD1.1 dev | EM | 89.7 | XLNet (single model) |
| Question Answering | SQuAD1.1 dev | F1 | 95.1 | XLNet (single model) |
| Question Answering | RACE | RACE | 81.75 | XLNet |
| Question Answering | RACE | RACE-m | 85.45 | XLNet |
| Question Answering | SQuAD1.1 | EM | 89.898 | XLNet (single model) |
| Question Answering | SQuAD1.1 | F1 | 95.08 | XLNet (single model) |
| Question Answering | SQuAD2.0 dev | EM | 87.9 | XLNet (single model) |
| Question Answering | SQuAD2.0 dev | F1 | 90.6 | XLNet (single model) |
| Question Answering | SQuAD2.0 | EM | 87.926 | XLNet (single model) |
| Question Answering | SQuAD2.0 | F1 | 90.689 | XLNet (single model) |
| Natural Language Inference | WNLI | Accuracy | 92.5 | XLNet |
| Natural Language Inference | ANLI test | A1 | 70.3 | XLNet (Large) |
| Natural Language Inference | ANLI test | A2 | 50.9 | XLNet (Large) |
| Natural Language Inference | ANLI test | A3 | 49.4 | XLNet (Large) |
| Natural Language Inference | MultiNLI | Matched | 90.8 | XLNet (single model) |
| Semantic Textual Similarity | STS Benchmark | Pearson Correlation | 0.925 | XLNet (single model) |
| Semantic Textual Similarity | Quora Question Pairs | Accuracy | 90.3 | XLNet-Large (ensemble) |
| Semantic Textual Similarity | Quora Question Pairs | F1 | 74.2 | XLNet-Large (ensemble) |
| Sentiment Analysis | Yelp Fine-grained classification | Error | 27.05 | XLNet |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 97 | XLNet (single model) |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 96.8 | XLNet-Large (ensemble) |
| Sentiment Analysis | Yelp Binary classification | Error | 1.37 | XLNet |
| Sentiment Analysis | IMDb | Accuracy | 96.21 | XLNet |
| Ad-Hoc Information Retrieval | ClueWeb09-B | ERR@20 | 20.28 | XLNet |
| Ad-Hoc Information Retrieval | ClueWeb09-B | nDCG@20 | 31.1 | XLNet |
| Paraphrase Identification | Quora Question Pairs | Accuracy | 90.3 | XLNet-Large (ensemble) |
| Paraphrase Identification | Quora Question Pairs | F1 | 74.2 | XLNet-Large (ensemble) |
| Text Classification | DBpedia | Error | 0.62 | XLNet |
| Text Classification | Amazon-5 | Error | 31.67 | XLNet |
| Text Classification | AG News | Error | 4.45 | XLNet |
| Text Classification | Amazon-2 | Error | 2.11 | XLNet |
| Humor Detection | 200k Short Texts for Humor Detection | F1-score | 0.92 | XLNet Large Cased |
| Document Ranking | ClueWeb09-B | ERR@20 | 20.28 | XLNet |
| Document Ranking | ClueWeb09-B | nDCG@20 | 31.1 | XLNet |
| Classification | DBpedia | Error | 0.62 | XLNet |
| Classification | Amazon-5 | Error | 31.67 | XLNet |
| Classification | AG News | Error | 4.45 | XLNet |
| Classification | Amazon-2 | Error | 2.11 | XLNet |

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)