Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GPT-2 1.5B

Reported on 8 benchmarks across 1 task · 1 paper · 2 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
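
As a rough illustration of what exact-name matching implies, the sketch below groups result records by the literal model-name string. The record layout and the alternative spellings are hypothetical, invented for this example; only the two "GPT-2 1.5B" scores come from this page.

```python
from collections import defaultdict

# Hypothetical result records. Field names are assumptions for illustration.
# Only the two "GPT-2 1.5B" scores are taken from this page; the other
# spellings are made-up examples and carry no score.
records = [
    {"model": "GPT-2 1.5B", "metric": "MC1", "score": 0.22},
    {"model": "GPT-2 1.5B", "metric": "MC2", "score": 0.39},
    {"model": "gpt-2 1.5b", "metric": "MC1", "score": None},    # different casing
    {"model": "GPT-2 (1.5B)", "metric": "MC1", "score": None},  # different punctuation
]


def group_by_exact_name(rows):
    """Group records by the model name string, with no normalization."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["model"]].append(row)
    return dict(groups)


groups = group_by_exact_name(records)
for name, rows in groups.items():
    print(f"{name!r}: {len(rows)} result(s)")
```

Under exact matching, the three spellings end up in three separate groups. The reverse also holds: two papers that use the identical string for different variants would be merged into one group.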

Natural Language Processing · 8 results

Question Answering on TruthfulQA
Source for all results: TruthfulQA: Measuring How Models Mimic Human Falsehoods, arXiv:2109.07958

MC1 · 2021-09-08 · 0.22 · best: 0.59 (GPT-4 (RLHF)) · SOTA
MC2 · 2021-09-08 · 0.39 · best: 0.75 (Mistral-7B-Instruct-v0.2 + TruthX) · SOTA
% info · 2021-09-08 · 89.84 · best: 97.7 (Alpaca 7B + Inference Time Intervention (ITI))
% true · 2021-09-08 · 29.5 · best: 88.6 (Vicuna 7B + Inference Time Intervention (ITI))
% true (GPT-judge) · 2021-09-08 · 29.87 · best: 53.24 (UnifiedQA 3B)
BLEU · 2021-09-08 · -4.91 · best: -0.16 (UnifiedQA 3B)
BLEURT · 2021-09-08 · -0.25 · best: 0.08 (UnifiedQA 3B)
ROUGE · 2021-09-08 · -9.41 · best: 1.76 (UnifiedQA 3B)
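
The distance between each GPT-2 1.5B score and the best listed score can be computed directly from the numbers above. The following is a minimal sketch, assuming higher is better for every metric; the page does not state this, so treat it as an assumption. The function and variable names are illustrative only.

```python
# (metric, GPT-2 1.5B score, best listed score), copied from the results above.
ROWS = [
    ("MC1", 0.22, 0.59),
    ("MC2", 0.39, 0.75),
    ("% info", 89.84, 97.7),
    ("% true", 29.5, 88.6),
    ("% true (GPT-judge)", 29.87, 53.24),
    ("BLEU", -4.91, -0.16),
    ("BLEURT", -0.25, 0.08),
    ("ROUGE", -9.41, 1.76),
]


def gaps_to_best(rows):
    """Return best minus score per metric, rounded to two decimals.

    Assumption: higher is better for every metric listed.
    """
    return {metric: round(best - score, 2) for metric, score, best in rows}


gaps = gaps_to_best(ROWS)
for metric, gap in gaps.items():
    print(f"{metric}: {gap:+.2f}")
```

The metrics use different scales (fractions, percentages, and signed similarity scores), so the gaps are only meaningful within a single metric, not across metrics.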