Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ChatGPT

Reported on 23 benchmarks across 5 tasks · 6 papers · 6 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
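As a rough illustration of what exact-name matching implies, the sketch below groups result rows by the model name string as reported. It is not the site's actual code: the `group_by_exact_name` function, the row fields, and the second model name are all hypothetical, and case-sensitive string equality is an assumption about what "exact" means here.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group result rows by model name using plain string equality (assumed)."""
    grouped = defaultdict(list)
    for row in rows:
        # No normalization: a differently spelled variant lands in its own group.
        grouped[row["model"]].append(row)
    return dict(grouped)


rows = [
    {"model": "ChatGPT", "benchmark": "KQA Pro", "score": 47.93},
    {"model": "ChatGPT", "benchmark": "GraphQuestions", "score": 53.1},
    # Hypothetical row, included only to show that a variant name is not merged.
    {"model": "ChatGPT (hypothetical variant)", "benchmark": "KQA Pro", "score": None},
]

grouped = group_by_exact_name(rows)
for name, items in grouped.items():
    print(name, len(items))
```

Under this assumption, two papers that both report a model as "ChatGPT" are listed together even if they evaluated different underlying versions, which is why scores on this page are best compared with the source paper in hand.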

Natural Language Processing · 23 results

Machine Translation on Multi Lingual Bug Reports
BERTScore · 2025-02-20
79
SOTA
English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports arXiv:2502.14338
Question Answering on VNHSGE-Literature
Accuracy · 2023-05-20
68
SOTA
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on KQA Pro
Accuracy · 2023-03-14
47.93
SOTA
Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family arXiv:2303.07992
Question Answering on GraphQuestions
Accuracy · 2023-03-14
53.1
SOTA
Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family arXiv:2303.07992
Question Answering on WebQuestionsSP
Accuracy · 2023-03-14
83.7
SOTA
Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family arXiv:2303.07992
Code Generation on NLC2CMD
Accuracy · 2023-02-15
0.806
SOTA
NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation arXiv:2302.07845
Natural Language Inference on ANLI test
A1 · 2023-05-29
62.3
best: 81.8 (T5-3B (explanation prompting))
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets arXiv:2305.18486
Natural Language Inference on ANLI test
A2 · 2023-05-29
52.6
best: 72.5 (T5-3B (explanation prompting))
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets arXiv:2305.18486
Natural Language Inference on ANLI test
A3 · 2023-05-29
54.1
best: 74.8 (T5-3B (explanation prompting))
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets arXiv:2305.18486
Named Entity Recognition (NER) on CrossNER
AI · uses extra data · 2023-05-24
40.7
best: 61.7 (NuNERZero span)
PromptNER: Prompting For Named Entity Recognition arXiv:2305.15444
Named Entity Recognition (NER) on CrossNER
Literature · uses extra data · 2023-05-24
21.3
best: 64.9 (NuNERZero span)
PromptNER: Prompting For Named Entity Recognition arXiv:2305.15444
Named Entity Recognition (NER) on CrossNER
Music · uses extra data · 2023-05-24
24.5
best: 69.9 (NuNERZero span)
PromptNER: Prompting For Named Entity Recognition arXiv:2305.15444
Named Entity Recognition (NER) on CrossNER
Politics · uses extra data · 2023-05-24
20.3
best: 71.7 (NuNERZero span)
PromptNER: Prompting For Named Entity Recognition arXiv:2305.15444
Named Entity Recognition (NER) on CrossNER
Science · uses extra data · 2023-05-24
40.6
best: 65.4 (NuNERZero span)
PromptNER: Prompting For Named Entity Recognition arXiv:2305.15444
Question Answering on VNHSGE-English
Accuracy · 2023-05-20
79.2
best: 92.4 (Bing Chat)
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on VNHSGE-History
Accuracy · 2023-05-20
56.5
best: 88.5 (Bing Chat)
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on VNHSGE-Biology
Accuracy · 2023-05-20
58
best: 69 (Bing Chat)
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on VNHSGE Mathematics
Accuracy · 2023-05-20
58.8
best: 60 (Bing Chat)
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on VNHSGE-Civic
Accuracy · 2023-05-20
70.5
best: 85.5 (Bing Chat)
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on VNHSGE-Physics
Accuracy · 2023-05-20
61
best: 66 (Bing Chat)
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on VNHSGE-Geography
Accuracy · 2023-05-20
61.5
best: 85.5 (Bing Chat)
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on VNHSGE-Chemistry
Accuracy · 2023-05-20
48
best: 52.5 (Bing Chat)
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models arXiv:2305.12199
Question Answering on MultiTQ
Hits@1
10.2
best: 79.7 (Prog-TQA)