Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GPT-3

Reported on 50 benchmarks across 6 tasks · 4 papers · 10 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
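
As a rough illustration of what matching by exact model name implies, the sketch below groups result rows by the literal name string. The row layout and the `group_by_exact_name` helper are assumptions made for this example, not the site's actual code, and the two "Example benchmark" rows are invented placeholders.

```python
from collections import defaultdict

# Hypothetical rows: (model name as reported, benchmark, score).
# The first two scores appear on this page; the last two rows are
# made-up placeholders that only demonstrate name mismatches.
rows = [
    ("GPT-3", "ANLI test A1", 36.8),
    ("GPT-3", "CrowS-Pairs Overall", 67.2),
    ("GPT-3 175B", "Example benchmark", 0.0),
    ("gpt-3", "Example benchmark", 0.0),
]


def group_by_exact_name(rows):
    """Group rows by the exact name string, with no normalization."""
    grouped = defaultdict(list)
    for name, benchmark, score in rows:
        grouped[name].append((benchmark, score))
    return dict(grouped)


pages = group_by_exact_name(rows)
print(sorted(pages))        # three distinct keys, one per spelling
print(len(pages["GPT-3"]))  # only rows reported under exactly "GPT-3"
```

Under this assumption, "GPT-3 175B" and "gpt-3" would land on separate pages, while two different variants that papers both call "GPT-3" would be merged into one.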

Natural Language Processing · 37 results

  • Stereotypical Bias Analysis on CrowS-Pairs
    Disability · 2022-05-02
    76.7
    SOTA
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Text Classification on RAFT
    Over · 2021-09-28
    0.937
    best: 0.95 (T-Few)
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    SRI · 2021-09-28
    0.516
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    TAI · 2021-09-28
    0.656
    best: 0.736 (T-Few)
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    Over · 2021-09-28
    0.937
    best: 0.95 (T-Few)
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    SRI · 2021-09-28
    0.516
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    TAI · 2021-09-28
    0.656
    best: 0.736 (T-Few)
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Age · 2022-05-02
    64.4
    best: 70.1 (LLaMA 65B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Gender · 2022-05-02
    62.6
    best: 70.6 (LLaMA 65B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Nationality · 2022-05-02
    61.6
    best: 64.2 (LLaMA 65B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Overall · 2022-05-02
    67.2
    best: 69.5 (OPT-175B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Physical Appearance · 2022-05-02
    74.6
    best: 77.8 (LLaMA 65B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Race/Color · 2022-05-02
    64.7
    best: 68.6 (OPT-175B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Religion · 2022-05-02
    62.6
    best: 70.6 (LLaMA 65B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Sexual Orientation · 2022-05-02
    76.2
    best: 81 (LLaMA 65B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Stereotypical Bias Analysis on CrowS-Pairs
    Socioeconomic status · 2022-05-02
    73.8
    best: 76.2 (OPT-175B)
    OPT: Open Pre-trained Transformer Language Models (arXiv:2205.01068)
  • Text Classification on RAFT
    ADE · 2021-09-28
    0.686
    best: 0.83 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    Avg · 2021-09-28
    0.627
    best: 0.758 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    B77 · 2021-09-28
    0.299
    best: 0.695 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    NIS · 2021-09-28
    0.679
    best: 0.857 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    OSE · 2021-09-28
    0.431
    best: 0.676 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    SOT · 2021-09-28
    0.769
    best: 0.915 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    TC · 2021-09-28
    0.821
    best: 0.897 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    TEH · 2021-09-28
    0.526
    best: 0.722 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Text Classification on RAFT
    ToS · 2021-09-28
    0.574
    best: 0.75 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    ADE · 2021-09-28
    0.686
    best: 0.83 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    Avg · 2021-09-28
    0.627
    best: 0.758 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    B77 · 2021-09-28
    0.299
    best: 0.695 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    NIS · 2021-09-28
    0.679
    best: 0.857 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    OSE · 2021-09-28
    0.431
    best: 0.676 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    SOT · 2021-09-28
    0.769
    best: 0.915 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    TC · 2021-09-28
    0.821
    best: 0.897 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    TEH · 2021-09-28
    0.526
    best: 0.722 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Few-Shot Text Classification on RAFT
    ToS · 2021-09-28
    0.574
    best: 0.75 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Natural Language Inference on ANLI test
    A1 · uses extra data · 2020-05-28
    36.8
    best: 81.8 (T5-3B (explanation prompting))
    Language Models are Few-Shot Learners (arXiv:2005.14165)
  • Natural Language Inference on ANLI test
    A2 · uses extra data · 2020-05-28
    34
    best: 72.5 (T5-3B (explanation prompting))
    Language Models are Few-Shot Learners (arXiv:2005.14165)
  • Natural Language Inference on ANLI test
    A3 · uses extra data · 2020-05-28
    40.2
    best: 74.8 (T5-3B (explanation prompting))
    Language Models are Few-Shot Learners (arXiv:2005.14165)

Methodology · 12 results

  • Classification on RAFT
    Over · 2021-09-28
    0.937
    best: 0.95 (T-Few)
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    SRI · 2021-09-28
    0.516
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    TAI · 2021-09-28
    0.656
    best: 0.736 (T-Few)
    SOTA
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    ADE · 2021-09-28
    0.686
    best: 0.83 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    Avg · 2021-09-28
    0.627
    best: 0.758 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    B77 · 2021-09-28
    0.299
    best: 0.695 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    NIS · 2021-09-28
    0.679
    best: 0.857 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    OSE · 2021-09-28
    0.431
    best: 0.676 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    SOT · 2021-09-28
    0.769
    best: 0.915 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    TC · 2021-09-28
    0.821
    best: 0.897 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    TEH · 2021-09-28
    0.526
    best: 0.722 (Human (crowdsourced))
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)
  • Classification on RAFT
    ToS · 2021-09-28
    0.574
    best: 0.75 (T-Few)
    RAFT: A Real-World Few-Shot Text Classification Benchmark (arXiv:2109.14076)

Medical · 1 result

  • Language Modelling on The Pile
    Bits per byte · 2022-10-05
    0.742
    best: 1.2253 (GPT-2 Small 124M (pre-trained))
    GLM-130B: An Open Bilingual Pre-trained Model (arXiv:2210.02414)