Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ChatGPT

Reported on 23 benchmarks across 5 tasks · 6 papers · 6 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
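To illustrate what exact-name matching implies, here is a minimal sketch. It is not the site's actual implementation: the record layout, the `group_by_exact_name` helper, and the two "Example Benchmark" rows are hypothetical, added only to show that differently spelled names end up on separate model pages.

```python
from collections import defaultdict

# Hypothetical result records: (model name as reported, benchmark, score).
# The first two scores appear on this page; the last two rows are made up
# purely to show how name variants behave under exact matching.
results = [
    ("ChatGPT", "KQA Pro", 47.93),
    ("ChatGPT", "VNHSGE-English", 79.2),
    ("chatgpt", "Example Benchmark", 1.0),
    ("ChatGPT (gpt-3.5-turbo)", "Example Benchmark", 2.0),
]


def group_by_exact_name(records):
    """Group results by the model name string, with no normalization."""
    grouped = defaultdict(list)
    for name, benchmark, score in records:
        # Exact string comparison: case and suffixes are significant.
        grouped[name].append((benchmark, score))
    return dict(grouped)


pages = group_by_exact_name(results)
```

Under this assumption, `pages["ChatGPT"]` holds only the two rows reported under that exact string, while `"chatgpt"` and `"ChatGPT (gpt-3.5-turbo)"` each get their own entry. The converse also holds: two papers that both write "ChatGPT" are merged even if they evaluated different underlying model versions.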

Natural Language Processing (23 results)

  • Machine Translation on Multi Lingual Bug Reports
    BERTScore · 2025-02-20
    79
    SOTA
    English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports (arXiv:2502.14338)
  • Question Answering on VNHSGE-Literature
    Accuracy · 2023-05-20
    68
    SOTA
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on KQA Pro
    Accuracy · 2023-03-14
    47.93
    SOTA
    Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family (arXiv:2303.07992)
  • Question Answering on GraphQuestions
    Accuracy · 2023-03-14
    53.1
    SOTA
    Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family (arXiv:2303.07992)
  • Question Answering on WebQuestionsSP
    Accuracy · 2023-03-14
    83.7
    SOTA
    Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family (arXiv:2303.07992)
  • Code Generation on NLC2CMD
    Accuracy · 2023-02-15
    0.806
    SOTA
    NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation (arXiv:2302.07845)
  • Natural Language Inference on ANLI test
    A1 · 2023-05-29
    62.3
    best: 81.8 (T5-3B (explanation prompting))
    A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets (arXiv:2305.18486)
  • Natural Language Inference on ANLI test
    A2 · 2023-05-29
    52.6
    best: 72.5 (T5-3B (explanation prompting))
    A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets (arXiv:2305.18486)
  • Natural Language Inference on ANLI test
    A3 · 2023-05-29
    54.1
    best: 74.8 (T5-3B (explanation prompting))
    A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets (arXiv:2305.18486)
  • Named Entity Recognition (NER) on CrossNER
    AI · uses extra data · 2023-05-24
    40.7
    best: 61.7 (NuNERZero span)
    PromptNER: Prompting For Named Entity Recognition (arXiv:2305.15444)
  • Named Entity Recognition (NER) on CrossNER
    Literature · uses extra data · 2023-05-24
    21.3
    best: 64.9 (NuNERZero span)
    PromptNER: Prompting For Named Entity Recognition (arXiv:2305.15444)
  • Named Entity Recognition (NER) on CrossNER
    Music · uses extra data · 2023-05-24
    24.5
    best: 69.9 (NuNERZero span)
    PromptNER: Prompting For Named Entity Recognition (arXiv:2305.15444)
  • Named Entity Recognition (NER) on CrossNER
    Politics · uses extra data · 2023-05-24
    20.3
    best: 71.7 (NuNERZero span)
    PromptNER: Prompting For Named Entity Recognition (arXiv:2305.15444)
  • Named Entity Recognition (NER) on CrossNER
    Science · uses extra data · 2023-05-24
    40.6
    best: 65.4 (NuNERZero span)
    PromptNER: Prompting For Named Entity Recognition (arXiv:2305.15444)
  • Question Answering on VNHSGE-English
    Accuracy · 2023-05-20
    79.2
    best: 92.4 (Bing Chat)
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on VNHSGE-History
    Accuracy · 2023-05-20
    56.5
    best: 88.5 (Bing Chat)
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on VNHSGE-Biology
    Accuracy · 2023-05-20
    58
    best: 69 (Bing Chat)
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on VNHSGE Mathematics
    Accuracy · 2023-05-20
    58.8
    best: 60 (Bing Chat)
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on VNHSGE-Civic
    Accuracy · 2023-05-20
    70.5
    best: 85.5 (Bing Chat)
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on VNHSGE-Physics
    Accuracy · 2023-05-20
    61
    best: 66 (Bing Chat)
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on VNHSGE-Geography
    Accuracy · 2023-05-20
    61.5
    best: 85.5 (Bing Chat)
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on VNHSGE-Chemistry
    Accuracy · 2023-05-20
    48
    best: 52.5 (Bing Chat)
    VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models (arXiv:2305.12199)
  • Question Answering on MultiTQ
    Hits@1
    10.2
    best: 79.7 (Prog-TQA)