Question Answering on TruthfulQA
Metric: MC1 (higher is better)
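As a rough illustration of what the MC1 column measures: in the TruthfulQA multiple-choice setting, each question comes with several answer choices, exactly one of which is correct. The model is credited when the choice it scores highest is the correct one, and MC1 is the resulting accuracy. The sketch below assumes per-choice log-probabilities have already been computed. The function name `mc1_score` and the toy numbers are hypothetical, used only for illustration, and are not taken from any entry on this page.

```python
from typing import Sequence


def mc1_score(
    choice_log_probs: Sequence[Sequence[float]],
    correct_indices: Sequence[int],
) -> float:
    """Return the fraction of questions whose top-scoring choice is the correct one.

    choice_log_probs[i][j] is the (assumed precomputed) log-probability the model
    assigns to choice j of question i; correct_indices[i] is the index of the
    single correct choice for question i.
    """
    if len(choice_log_probs) != len(correct_indices):
        raise ValueError("need one correct index per question")
    if not choice_log_probs:
        raise ValueError("need at least one question")

    hits = 0
    for scores, gold in zip(choice_log_probs, correct_indices):
        # Pick the choice with the highest log-probability.
        # On exact ties, max() keeps the lowest index.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        hits += int(predicted == gold)
    return hits / len(choice_log_probs)


# Toy data (hypothetical): three questions with 3, 3, and 4 choices.
toy_log_probs = [
    [-1.2, -3.4, -2.0],        # top choice is index 0
    [-2.5, -0.7, -1.9],        # top choice is index 1
    [-4.0, -3.1, -0.3, -2.2],  # top choice is index 2
]
toy_correct = [0, 2, 2]

score = mc1_score(toy_log_probs, toy_correct)
print(f"MC1 = {score:.3f}")
```

On this toy input the model picks the correct choice for the first and third questions but not the second, so the score is 2/3. How the log-probabilities are obtained (prompt format, length normalization, and so on) varies between submissions and is not specified on this page.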
Results
| # | Model | MC1 | Extra Data | Paper | Date | Code | Badge |
|---|-------|-----|------------|-------|------|------|-------|
| 1 | GPT-4 (RLHF) | 0.59 | No | GPT-4 Technical Report | 2023-03-15 | Code | SOTA |
| 2 | Mistral-7B-Instruct-v0.2 + TruthX | 0.56 | No | TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | 2024-02-27 | Code | |
| 3 | LLaMa-2-7B-Chat + TruthX | 0.54 | No | TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | 2024-02-27 | Code | |
| 4 | LLaMA-2-Chat-13B + Representation Control (Contrast Vector) | 0.54 | No | Representation Engineering: A Top-Down Approach to AI Transparency | 2023-10-02 | Code | |
| 5 | LLaMA-2-Chat-7B + Representation Control (Contrast Vector) | 0.48 | No | Representation Engineering: A Top-Down Approach to AI Transparency | 2023-10-02 | Code | |
| 6 | Vicuna 7B + Inference Time Intervention (ITI) | 0.389 | No | - | - | - | |
| 7 | Alpaca 7B + Inference Time Intervention (ITI) | 0.319 | No | - | - | - | |
| 8 | Gopher 280B (zero-shot, Our Prompt + Choices) | 0.295 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Code | SOTA |
| 9 | LLaMA 7B + Inference Time Intervention (ITI) | 0.288 | No | - | - | - | |
| 10 | GAL 120B | 0.26 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | |
| 11 | Gopher 7.1B (zero-shot, QA prompts) | 0.25 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Code | |
| 12 | GAL 30B | 0.24 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | |
| 13 | Gopher 7.1B (zero-shot, Our Prompt + Choices) | 0.23 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Code | |
| 14 | Gopher 1.4B (zero-shot, QA prompts) | 0.23 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Code | |
| 15 | GPT-2 1.5B | 0.22 | No | TruthfulQA: Measuring How Models Mimic Human Falsehoods | 2021-09-08 | Code | SOTA |
| 16 | Gopher 1.4B (zero-shot, Our Prompt + Choices) | 0.217 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Code | |
| 17 | GPT-3 175B | 0.21 | No | TruthfulQA: Measuring How Models Mimic Human Falsehoods | 2021-09-08 | Code | |
| 18 | OPT 175B | 0.21 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | |
| 19 | GPT-J 6B | 0.2 | No | TruthfulQA: Measuring How Models Mimic Human Falsehoods | 2021-09-08 | Code | |
| 20 | UnifiedQA 3B | 0.19 | No | TruthfulQA: Measuring How Models Mimic Human Falsehoods | 2021-09-08 | Code | |
| 21 | GAL 125M | 0.19 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | |
| 22 | GAL 1.3B | 0.19 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | |
| 23 | GAL 6.7B | 0.19 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | |