Question Answering on TruthfulQA
Metric: MC1 (higher is better)
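As a rough illustration of what the MC1 column measures: in TruthfulQA's single-true multiple-choice setting, each question has exactly one correct answer choice, and a model is counted as right when it assigns its highest likelihood to that choice. MC1 is the resulting accuracy. The sketch below assumes the per-choice log-probabilities have already been computed elsewhere; `mc1_score`, `log_probs`, and `correct_index` are hypothetical names, and the sample numbers are made up rather than taken from any model listed here.

```python
from typing import Sequence


def mc1_score(
    log_probs: Sequence[Sequence[float]],
    correct_index: Sequence[int],
) -> float:
    """Fraction of questions where the top-scoring choice is the correct one.

    log_probs[i][j] is the (assumed precomputed) log-probability the model
    assigns to choice j of question i; correct_index[i] is the position of
    the single correct choice for question i.
    """
    if len(log_probs) != len(correct_index):
        raise ValueError("one correct index is required per question")
    if not log_probs:
        raise ValueError("at least one question is required")

    hits = 0
    for scores, gold in zip(log_probs, correct_index):
        # The model's pick is the choice with the highest log-probability.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        hits += int(predicted == gold)
    return hits / len(log_probs)


# Hypothetical scores for three questions (illustrative only).
example_scores = [
    [-1.2, -3.4, -2.8],  # highest score at index 0
    [-2.5, -0.9, -4.1],  # highest score at index 1
    [-0.7, -1.5, -2.2],  # highest score at index 0
]
example_gold = [0, 1, 2]  # the third question is answered incorrectly

print(mc1_score(example_scores, example_gold))
```

Because the score is a plain accuracy over questions, a value such as 0.59 means the correct choice was ranked first on 59% of the questions.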
Results
| # | Model | MC1 | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | GPT-4 (RLHF) | 0.59 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 2 | Mistral-7B-Instruct-v0.2 + TruthX | 0.56 | No | TruthX: Alleviating Hallucinations by Editing La... | 2024-02-27 | Code |
| 3 | LLaMa-2-7B-Chat + TruthX | 0.54 | No | TruthX: Alleviating Hallucinations by Editing La... | 2024-02-27 | Code |
| 4 | LLaMA-2-Chat-13B + Representation Control (Contrast Vector) | 0.54 | No | Representation Engineering: A Top-Down Approach ... | 2023-10-02 | Code |
| 5 | LLaMA-2-Chat-7B + Representation Control (Contrast Vector) | 0.48 | No | Representation Engineering: A Top-Down Approach ... | 2023-10-02 | Code |
| 6 | Vicuna 7B + Inference Time Intervention (ITI) | 0.389 | No | - | - | - |
| 7 | Alpaca 7B + Inference Time Intervention (ITI) | 0.319 | No | - | - | - |
| 8 | Gopher 280B (zero-shot, Our Prompt + Choices) | 0.295 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 9 | LLaMA 7B + Inference Time Intervention (ITI) | 0.288 | No | - | - | - |
| 10 | GAL 120B | 0.26 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 11 | Gopher 7.1B (zero-shot, QA prompts) | 0.25 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 12 | GAL 30B | 0.24 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 13 | Gopher 7.1B (zero-shot, Our Prompt + Choices) | 0.23 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 14 | Gopher 1.4B (zero-shot, QA prompts) | 0.23 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 15 | GPT-2 1.5B | 0.22 | No | TruthfulQA: Measuring How Models Mimic Human Fal... | 2021-09-08 | Code |
| 16 | Gopher 1.4B (zero-shot, Our Prompt + Choices) | 0.217 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 17 | GPT-3 175B | 0.21 | No | TruthfulQA: Measuring How Models Mimic Human Fal... | 2021-09-08 | Code |
| 18 | OPT 175B | 0.21 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 19 | GPT-J 6B | 0.2 | No | TruthfulQA: Measuring How Models Mimic Human Fal... | 2021-09-08 | Code |
| 20 | UnifiedQA 3B | 0.19 | No | TruthfulQA: Measuring How Models Mimic Human Fal... | 2021-09-08 | Code |
| 21 | GAL 125M | 0.19 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 22 | GAL 1.3B | 0.19 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 23 | GAL 6.7B | 0.19 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |