Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

XLNet

Reported on 55 benchmarks across 14 tasks · 8 papers · 32 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
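As a rough illustration of what exact-name matching implies, the sketch below filters a small list of records by string equality. The record layout and the `results_for` helper are assumptions made for this example, not the site's actual implementation; the values are copied from entries on this page.

```python
# Hypothetical record layout: (model name, benchmark, metric, value, paper).
# Values are taken from this page; the layout itself is an assumption.
records = [
    ("XLNet", "RACE", "Accuracy (High)", 84.0, "arXiv:1906.08237"),
    ("XLNet", "CoNLL 2003 (English)", "F1", 93.28, "arXiv:2112.08033"),
    # Source paper for this row is not stated on this page.
    ("Space-XLNet", "HateXplain", "Accuracy (2 classes)", 0.8798, None),
]


def results_for(model_name, rows):
    """Return rows whose model name equals model_name exactly (case-sensitive)."""
    return [row for row in rows if row[0] == model_name]


xlnet_rows = results_for("XLNet", records)

# Two different papers end up under the same name, even though they may
# have evaluated different XLNet variants.
xlnet_papers = {row[4] for row in xlnet_rows}

# A related but differently named model is not matched at all.
space_rows = results_for("Space-XLNet", records)
```

Under this assumption, rows from different papers are pooled whenever the name string is identical, while a variant such as Space-XLNet is listed separately.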

Natural Language Processing · 36 results

Text Classification on IMDb Movie Reviews
Accuracy (2 classes) · 2024-01-30
0.9387
SOTA
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs arXiv:2401.16638
Text Classification on Civil Comments
Recall · 2023-01-26
0.9254
SOTA
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Binary text classification on TweepFake
Accuracy (%) · 2020-07-31
87.7
best: 94.3 (GigaCheck (Mistral-7B))
SOTA
TweepFake: about Detecting Deepfake Tweets arXiv:2008.00036
Binary text classification on TweepFake
F1 score · 2020-07-31
0.882
best: 0.942 (GigaCheck (Mistral-7B))
SOTA
TweepFake: about Detecting Deepfake Tweets arXiv:2008.00036
Negation Detection on BioScope : Full Papers
F1 · uses extra data · 2020-01-09
94.4
SOTA
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures arXiv:2001.02885
Negation Detection on SFU Review Corpus
F1 · uses extra data · 2020-01-09
91.25
SOTA
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures arXiv:2001.02885
Negation Detection on BioScope : Abstracts
F1 · uses extra data · 2020-01-09
95.74
best: 98.94 (NegBioELECTRA)
SOTA
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures arXiv:2001.02885
Speculation Detection on SFU Review Corpus
F1 · uses extra data · 2020-01-09
91
SOTA
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures arXiv:2001.02885
Speculation Detection on BioScope : Full Papers
F1 · uses extra data · 2020-01-09
96.91
SOTA
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures arXiv:2001.02885
Speculation Detection on BioScope : Abstracts
F1 · uses extra data · 2020-01-09
97.87
best: 98.37 (NegBioELECTRA)
SOTA
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures arXiv:2001.02885
Reading Comprehension on RACE
Accuracy (High) · 2019-06-19
84
best: 92.6 (ALBERTxxlarge+DUMA(ensemble))
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Reading Comprehension on RACE
Accuracy (Middle) · 2019-06-19
88.6
best: 93.1 (Megatron-BERT (ensemble))
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Question Answering on RACE
RACE · 2019-06-19
81.75
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Question Answering on RACE
RACE-m · 2019-06-19
85.45
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Natural Language Inference on WNLI
Accuracy · 2019-06-19
92.5
best: 95.9 (Turing NLR v5 XXL 5.4B (fine-tuned))
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Sentiment Analysis on Yelp Fine-grained classification
Error · 2019-06-19
27.05
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Sentiment Analysis on Yelp Binary classification
Error · 2019-06-19
1.37
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Sentiment Analysis on IMDb
Accuracy · uses extra data · 2019-06-19
96.21
best: 96.68 (RoBERTa-large with LlamBERT)
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Ad-Hoc Information Retrieval on ClueWeb09-B
ERR@20 · 2019-06-19
20.28
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Ad-Hoc Information Retrieval on ClueWeb09-B
nDCG@20 · 2019-06-19
31.1
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Text Classification on DBpedia
Error · 2019-06-19
0.62
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Text Classification on Amazon-5
Error · 2019-06-19
31.67
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Text Classification on AG News
Error · 2019-06-19
4.45
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Text Classification on Amazon-2
Error · 2019-06-19
2.11
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Document Ranking on ClueWeb09-B
ERR@20 · 2019-06-19
20.28
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Document Ranking on ClueWeb09-B
nDCG@20 · 2019-06-19
31.1
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Text Classification on UK Key Stage Readability
F1 · 2024-11-26
74
best: 99.6 (ELECTRA + ANN)
What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics arXiv:2411.17593
Text Classification on HateXplain
Accuracy (2 classes) · 2024-01-30
0.816
best: 0.8798 (Space-XLNet)
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs arXiv:2401.16638
Text Classification on HateXplain
F1 Macro · 2024-01-30
0.8156
best: 0.8797 (Space-XLNet)
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs arXiv:2401.16638
Text Classification on Civil Comments
GMB BNSP · 2023-01-26
0.9597
best: 0.9644 (DistilBERT)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Text Classification on Civil Comments
GMB BPSN · 2023-01-26
0.8834
best: 0.901 (RoBERTa Focal Loss)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Text Classification on Civil Comments
GMB Subgroup · 2023-01-26
0.8689
best: 0.8807 (RoBERTa Focal Loss)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Text Classification on Civil Comments
Macro F1 · 2023-01-26
0.3336
best: 0.4749 (RoBERTa BCE)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Text Classification on Civil Comments
Micro F1 · 2023-01-26
0.4586
best: 0.5958 (Unfreeze Glove ResNet 44)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Text Classification on Civil Comments
Precision · 2023-01-26
0.3045
best: 0.4835 (Unfreeze Glove ResNet 44)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Named Entity Recognition (NER) on CoNLL 2003 (English)
F1 · 2021-12-15
93.28
best: 94.6 (ACE + document-context)
Named entity recognition architecture combining contextual and global features arXiv:2112.08033

Methodology · 19 results

Classification on IMDb Movie Reviews
Accuracy (2 classes) · 2024-01-30
0.9387
SOTA
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs arXiv:2401.16638
Classification on Civil Comments
Recall · 2023-01-26
0.9254
SOTA
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Classification on DBpedia
Error · 2019-06-19
0.62
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Classification on Amazon-5
Error · 2019-06-19
31.67
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Classification on AG News
Error · 2019-06-19
4.45
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Classification on Amazon-2
Error · 2019-06-19
2.11
SOTA
XLNet: Generalized Autoregressive Pretraining for Language Understanding arXiv:1906.08237
Classification on UK Key Stage Readability
F1 · 2024-11-26
74
best: 99.6 (ELECTRA + ANN)
What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics arXiv:2411.17593
Classification on HateXplain
Accuracy (2 classes) · 2024-01-30
0.816
best: 0.8798 (Space-XLNet)
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs arXiv:2401.16638
Classification on HateXplain
F1 Macro · 2024-01-30
0.8156
best: 0.8797 (Space-XLNet)
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs arXiv:2401.16638
Data Mining on IMDb Movie Reviews
Accuracy · 2023-08-07
94.8
best: 95.6 (ELECTRA)
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
Data Mining on IMDb Movie Reviews
F1 · 2023-08-07
94.9
best: 95.6 (ELECTRA)
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
Interpretable Machine Learning on IMDb Movie Reviews
Accuracy · 2023-08-07
94.8
best: 95.6 (ELECTRA)
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
Interpretable Machine Learning on IMDb Movie Reviews
F1 · 2023-08-07
94.9
best: 95.6 (ELECTRA)
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining arXiv:2308.03235
Classification on Civil Comments
GMB BNSP · 2023-01-26
0.9597
best: 0.9644 (DistilBERT)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Classification on Civil Comments
GMB BPSN · 2023-01-26
0.8834
best: 0.901 (RoBERTa Focal Loss)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Classification on Civil Comments
GMB Subgroup · 2023-01-26
0.8689
best: 0.8807 (RoBERTa Focal Loss)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Classification on Civil Comments
Macro F1 · 2023-01-26
0.3336
best: 0.4749 (RoBERTa BCE)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Classification on Civil Comments
Micro F1 · 2023-01-26
0.4586
best: 0.5958 (Unfreeze Glove ResNet 44)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125
Classification on Civil Comments
Precision · 2023-01-26
0.3045
best: 0.4835 (Unfreeze Glove ResNet 44)
A benchmark for toxic comment classification on Civil Comments dataset arXiv:2301.11125