Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GPT-2

Reported on 57 benchmarks across 12 tasks · 2 papers

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
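The exact-name matching described in the note can be illustrated with a minimal sketch. It assumes results are plain rows keyed by the model name as reported; the row layout, the function name `group_by_exact_name`, and the two "Example benchmark" rows are hypothetical and are not taken from the site's actual pipeline.

```python
from collections import defaultdict

# Hypothetical result rows: (model name as reported, benchmark, metric, value).
# The first two values appear on this page; the last two rows are made up
# purely to show how differently spelled names behave.
rows = [
    ("GPT-2", "RAFT", "Avg", 0.458),
    ("GPT-2", "WikiText-2", "Test perplexity", 18.34),
    ("gpt-2", "Example benchmark", "Accuracy", 1.0),          # made-up: different casing
    ("GPT-2 (small)", "Example benchmark", "Accuracy", 2.0),  # made-up: variant suffix
]


def group_by_exact_name(rows):
    """Group result rows by the exact, case-sensitive model name string."""
    grouped = defaultdict(list)
    for name, benchmark, metric, value in rows:
        # No normalization: "GPT-2", "gpt-2" and "GPT-2 (small)" stay separate.
        grouped[name].append((benchmark, metric, value))
    return dict(grouped)


by_model = group_by_exact_name(rows)
for name, results in by_model.items():
    print(name, len(results))
```

Under this assumption, two papers that both report a model as "GPT-2" land in the same group even if they evaluated different variants, while a differently spelled name forms its own group.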

Natural Language Processing · 31 results

  • Text Classification on RAFT
    Over · 2021-09-28
    0.498
    best: 0.95 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    ADE · 2021-09-28
    0.6
    best: 0.83 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    Avg · 2021-09-28
    0.458
    best: 0.758 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    B77 · 2021-09-28
    0.121
    best: 0.695 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    NIS · 2021-09-28
    0.561
    best: 0.857 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    OSE · 2021-09-28
    0.245
    best: 0.676 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    SOT · 2021-09-28
    0.38
    best: 0.915 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    SRI · 2021-09-28
    0.492
    best: 0.516 (GPT-3)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    TAI · 2021-09-28
    0.612
    best: 0.736 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    TC · 2021-09-28
    0.723
    best: 0.897 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    TEH · 2021-09-28
    0.311
    best: 0.722 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Text Classification on RAFT
    ToS · 2021-09-28
    0.498
    best: 0.75 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    Over · 2021-09-28
    0.498
    best: 0.95 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    ADE · 2021-09-28
    0.6
    best: 0.83 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    Avg · 2021-09-28
    0.458
    best: 0.758 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    B77 · 2021-09-28
    0.121
    best: 0.695 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    NIS · 2021-09-28
    0.561
    best: 0.857 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    OSE · 2021-09-28
    0.245
    best: 0.676 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    SOT · 2021-09-28
    0.38
    best: 0.915 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    SRI · 2021-09-28
    0.492
    best: 0.516 (GPT-3)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    TAI · 2021-09-28
    0.612
    best: 0.736 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    TC · 2021-09-28
    0.723
    best: 0.897 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    TEH · 2021-09-28
    0.311
    best: 0.722 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Few-Shot Text Classification on RAFT
    ToS · 2021-09-28
    0.498
    best: 0.75 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Cross-Lingual on Reddit Ideological and Extreme Bias Dataset
    weighted-F1 score
    76.43
    best: 79.1 (SVM)
  • Text Classification on ThreatGram 101 - Extreme Telegram Data
    weighted-F1 score
    66.2
  • Cross-Lingual Document Classification on Reddit Ideological and Extreme Bias Dataset
    weighted-F1 score
    76.43
    best: 79.1 (SVM)
  • Document Summarization on CNN / Daily Mail
    ROUGE-1 · uses extra data
    29.34
    best: 48.18 (Scrambled code + broken (alter))
  • Document Summarization on CNN / Daily Mail
    ROUGE-2 · uses extra data
    8.27
    best: 22.55 (PEGASUS + SummaReranker)
  • Document Summarization on CNN / Daily Mail
    ROUGE-L · uses extra data
    26.58
    best: 45.35 (Scrambled code + broken (alter))
  • Response Generation on SIMMC2.0
    BLEU
    19.2
    best: 34.1 (PaCE)

Methodology · 17 results

  • Data Mining on IMDb Movie Reviews
    Accuracy · 2023-08-07
    54.5
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining · arXiv:2308.03235
  • Data Mining on IMDb Movie Reviews
    F1 · 2023-08-07
    52.9
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining · arXiv:2308.03235
  • Interpretable Machine Learning on IMDb Movie Reviews
    Accuracy · 2023-08-07
    54.5
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining · arXiv:2308.03235
  • Interpretable Machine Learning on IMDb Movie Reviews
    F1 · 2023-08-07
    52.9
    best: 95.6 (ELECTRA)
    Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining · arXiv:2308.03235
  • Classification on RAFT
    Over · 2021-09-28
    0.498
    best: 0.95 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    ADE · 2021-09-28
    0.6
    best: 0.83 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    Avg · 2021-09-28
    0.458
    best: 0.758 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    B77 · 2021-09-28
    0.121
    best: 0.695 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    NIS · 2021-09-28
    0.561
    best: 0.857 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    OSE · 2021-09-28
    0.245
    best: 0.676 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    SOT · 2021-09-28
    0.38
    best: 0.915 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    SRI · 2021-09-28
    0.492
    best: 0.516 (GPT-3)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    TAI · 2021-09-28
    0.612
    best: 0.736 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    TC · 2021-09-28
    0.723
    best: 0.897 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    TEH · 2021-09-28
    0.311
    best: 0.722 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on RAFT
    ToS · 2021-09-28
    0.498
    best: 0.75 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark · arXiv:2109.14076
  • Classification on ThreatGram 101 - Extreme Telegram Data
    weighted-F1 score
    66.2

Medical · 4 results

  • Language Modelling on Penn Treebank (Word Level)
    Test perplexity · uses extra data
    35.76
    best: 20.5 (GPT-3 (Zero-Shot))
  • Language Modelling on Text8
    Bit per Character (BPC) · uses extra data
    0.98
    best: 1.63 (td-LSTM (Zhang et al., 2016))
  • Language Modelling on One Billion Word
    PPL · uses extra data
    42.16
    best: 20.09 (MDLM (AR baseline))
  • Language Modelling on WikiText-2
    Test perplexity · uses extra data
    18.34
    best: 8.21 (SparseGPT (175B, 50% Sparsity))

Knowledge Base · 3 results

  • Text Summarization on CNN / Daily Mail
    ROUGE-1 · uses extra data
    29.34
    best: 48.18 (Scrambled code + broken (alter))
  • Text Summarization on CNN / Daily Mail
    ROUGE-2 · uses extra data
    8.27
    best: 24.02 (Pegasus)
  • Text Summarization on CNN / Daily Mail
    ROUGE-L · uses extra data
    26.58
    best: 45.35 (Scrambled code + broken (alter))

Speech · 2 results

  • Dialogue on SIMMC2.0
    Act F1
    94.5
    best: 97.1 (PaCE)
  • Dialogue on SIMMC2.0
    Slot F1
    81.7
    best: 88.3 (BART-large)