Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

XLNet

Reported on 55 benchmarks across 14 tasks · 8 papers · 32 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
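
To illustrate what exact-name matching implies, here is a minimal sketch. The `Result` record and the `results_for` helper are hypothetical names made up for this example, not the site's actual code; the three sample values are taken from the listing on this page. The point is only that a lookup for "XLNet" picks up rows labelled exactly "XLNet" and skips differently named variants such as "Space-XLNet".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical record for one leaderboard row (illustrative only)."""
    model: str       # model name exactly as reported by the paper
    benchmark: str
    metric: str
    value: float


def results_for(model_name, results):
    """Return rows whose model name equals model_name exactly (case-sensitive)."""
    return [r for r in results if r.model == model_name]


# Sample rows copied from the listing on this page.
RESULTS = [
    Result("XLNet", "HateXplain", "Accuracy (2 classes)", 0.816),
    Result("Space-XLNet", "HateXplain", "Accuracy (2 classes)", 0.8798),
    Result("XLNet", "IMDb Movie Reviews", "Accuracy (2 classes)", 0.9387),
]

matched = results_for("XLNet", RESULTS)
for row in matched:
    print(row.benchmark, row.metric, row.value)
```

Under this assumption, two papers that both label a model "XLNet" would be pooled on one page even if they trained different variants, which is the caveat the note describes.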

Natural Language Processing · 36 results

  • Text Classification on IMDb Movie Reviews
    Accuracy (2 classes) · 2024-01-30
    0.9387
    SOTA
    Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs · arXiv:2401.16638
  • Text Classification on Civil Comments
    Recall · 2023-01-26
    0.9254
    SOTA
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Binary text classification on TweepFake
    Accuracy (%) · 2020-07-31
    87.7
    best: 94.3 (GigaCheck (Mistral-7B))
    SOTA
    TweepFake: about Detecting Deepfake Tweets · arXiv:2008.00036
  • Binary text classification on TweepFake
    F1 score · 2020-07-31
    0.882
    best: 0.942 (GigaCheck (Mistral-7B))
    SOTA
    TweepFake: about Detecting Deepfake Tweets · arXiv:2008.00036
  • Negation Detection on BioScope : Full Papers
    F1 · uses extra data · 2020-01-09
    94.4
    SOTA
    Resolving the Scope of Speculation and Negation using Transformer-Based Architectures · arXiv:2001.02885
  • Negation Detection on SFU Review Corpus
    F1 · uses extra data · 2020-01-09
    91.25
    SOTA
    Resolving the Scope of Speculation and Negation using Transformer-Based Architectures · arXiv:2001.02885
  • Negation Detection on BioScope : Abstracts
    F1 · uses extra data · 2020-01-09
    95.74
    best: 98.94 (NegBioELECTRA)
    SOTA
    Resolving the Scope of Speculation and Negation using Transformer-Based Architectures · arXiv:2001.02885
  • Speculation Detection on SFU Review Corpus
    F1 · uses extra data · 2020-01-09
    91
    SOTA
    Resolving the Scope of Speculation and Negation using Transformer-Based Architectures · arXiv:2001.02885
  • Speculation Detection on BioScope : Full Papers
    F1 · uses extra data · 2020-01-09
    96.91
    SOTA
    Resolving the Scope of Speculation and Negation using Transformer-Based Architectures · arXiv:2001.02885
  • Speculation Detection on BioScope : Abstracts
    F1 · uses extra data · 2020-01-09
    97.87
    best: 98.37 (NegBioELECTRA)
    SOTA
    Resolving the Scope of Speculation and Negation using Transformer-Based Architectures · arXiv:2001.02885
  • Reading Comprehension on RACE
    Accuracy (High) · 2019-06-19
    84
    best: 92.6 (ALBERTxxlarge+DUMA(ensemble))
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Reading Comprehension on RACE
    Accuracy (Middle) · 2019-06-19
    88.6
    best: 93.1 (Megatron-BERT (ensemble))
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Question Answering on RACE
    RACE · 2019-06-19
    81.75
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Question Answering on RACE
    RACE-m · 2019-06-19
    85.45
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Natural Language Inference on WNLI
    Accuracy · 2019-06-19
    92.5
    best: 95.9 (Turing NLR v5 XXL 5.4B (fine-tuned))
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Sentiment Analysis on Yelp Fine-grained classification
    Error · 2019-06-19
    27.05
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Sentiment Analysis on Yelp Binary classification
    Error · 2019-06-19
    1.37
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Sentiment Analysis on IMDb
    Accuracy · uses extra data · 2019-06-19
    96.21
    best: 96.68 (RoBERTa-large with LlamBERT)
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Ad-Hoc Information Retrieval on ClueWeb09-B
    ERR@20 · 2019-06-19
    20.28
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Ad-Hoc Information Retrieval on ClueWeb09-B
    nDCG@20 · 2019-06-19
    31.1
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Text Classification on DBpedia
    Error · 2019-06-19
    0.62
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Text Classification on Amazon-5
    Error · 2019-06-19
    31.67
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Text Classification on AG News
    Error · 2019-06-19
    4.45
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Text Classification on Amazon-2
    Error · 2019-06-19
    2.11
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Document Ranking on ClueWeb09-B
    ERR@20 · 2019-06-19
    20.28
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Document Ranking on ClueWeb09-B
    nDCG@20 · 2019-06-19
    31.1
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Text Classification on UK Key Stage Readability
    F1 · 2024-11-26
    74
    best: 99.6 (ELECTRA + ANN)
    What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics · arXiv:2411.17593
  • Text Classification on HateXplain
    Accuracy (2 classes) · 2024-01-30
    0.816
    best: 0.8798 (Space-XLNet)
    Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs · arXiv:2401.16638
  • Text Classification on HateXplain
    F1 Macro · 2024-01-30
    0.8156
    best: 0.8797 (Space-XLNet)
    Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs · arXiv:2401.16638
  • Text Classification on Civil Comments
    GMB BNSP · 2023-01-26
    0.9597
    best: 0.9644 (DistilBERT)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Text Classification on Civil Comments
    GMB BPSN · 2023-01-26
    0.8834
    best: 0.901 (RoBERTa Focal Loss)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Text Classification on Civil Comments
    GMB Subgroup · 2023-01-26
    0.8689
    best: 0.8807 (RoBERTa Focal Loss)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Text Classification on Civil Comments
    Macro F1 · 2023-01-26
    0.3336
    best: 0.4749 (RoBERTa BCE)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Text Classification on Civil Comments
    Micro F1 · 2023-01-26
    0.4586
    best: 0.5958 (Unfreeze Glove ResNet 44)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Text Classification on Civil Comments
    Precision · 2023-01-26
    0.3045
    best: 0.4835 (Unfreeze Glove ResNet 44)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Named Entity Recognition (NER) on CoNLL 2003 (English)
    F1 · 2021-12-15
    93.28
    best: 94.6 (ACE + document-context)
    Named entity recognition architecture combining contextual and global features · arXiv:2112.08033

Methodology · 19 results

  • Classification on IMDb Movie Reviews
    Accuracy (2 classes) · 2024-01-30
    0.9387
    SOTA
    Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs · arXiv:2401.16638
  • Classification on Civil Comments
    Recall · 2023-01-26
    0.9254
    SOTA
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Classification on DBpedia
    Error · 2019-06-19
    0.62
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Classification on Amazon-5
    Error · 2019-06-19
    31.67
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Classification on AG News
    Error · 2019-06-19
    4.45
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Classification on Amazon-2
    Error · 2019-06-19
    2.11
    SOTA
    XLNet: Generalized Autoregressive Pretraining for Language Understanding · arXiv:1906.08237
  • Classification on UK Key Stage Readability
    F1 · 2024-11-26
    74
    best: 99.6 (ELECTRA + ANN)
    What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics · arXiv:2411.17593
  • Classification on HateXplain
    Accuracy (2 classes) · 2024-01-30
    0.816
    best: 0.8798 (Space-XLNet)
    Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs · arXiv:2401.16638
  • Classification on HateXplain
    F1 Macro · 2024-01-30
    0.8156
    best: 0.8797 (Space-XLNet)
    Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs · arXiv:2401.16638
  • Data Mining on IMDb Movie Reviews
    Accuracy · 2023-08-07
    94.8
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining · arXiv:2308.03235
  • Data Mining on IMDb Movie Reviews
    F1 · 2023-08-07
    94.9
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining · arXiv:2308.03235
  • Interpretable Machine Learning on IMDb Movie Reviews
    Accuracy · 2023-08-07
    94.8
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining · arXiv:2308.03235
  • Interpretable Machine Learning on IMDb Movie Reviews
    F1 · 2023-08-07
    94.9
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining · arXiv:2308.03235
  • Classification on Civil Comments
    GMB BNSP · 2023-01-26
    0.9597
    best: 0.9644 (DistilBERT)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Classification on Civil Comments
    GMB BPSN · 2023-01-26
    0.8834
    best: 0.901 (RoBERTa Focal Loss)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Classification on Civil Comments
    GMB Subgroup · 2023-01-26
    0.8689
    best: 0.8807 (RoBERTa Focal Loss)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Classification on Civil Comments
    Macro F1 · 2023-01-26
    0.3336
    best: 0.4749 (RoBERTa BCE)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Classification on Civil Comments
    Micro F1 · 2023-01-26
    0.4586
    best: 0.5958 (Unfreeze Glove ResNet 44)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125
  • Classification on Civil Comments
    Precision · 2023-01-26
    0.3045
    best: 0.4835 (Unfreeze Glove ResNet 44)
    A benchmark for toxic comment classification on Civil Comments dataset · arXiv:2301.11125