Question Answering on TruthfulQA

Metric: MC2 (higher is better)
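
The page does not define MC2. As a minimal sketch, assuming the usual TruthfulQA multiple-choice setup, the per-question MC2 score is the share of probability mass the model assigns to the true reference answers, out of the mass assigned to all reference answers, and the reported number is the mean over questions. The function names and the log-probabilities below are hypothetical and used only for illustration.

```python
import math


def mc2_question(true_logprobs, false_logprobs):
    """Share of probability mass on the true reference answers for one question."""
    true_mass = sum(math.exp(lp) for lp in true_logprobs)
    false_mass = sum(math.exp(lp) for lp in false_logprobs)
    return true_mass / (true_mass + false_mass)


def mc2_benchmark(questions):
    """Mean per-question MC2 over (true_logprobs, false_logprobs) pairs."""
    scores = [mc2_question(t, f) for t, f in questions]
    return sum(scores) / len(scores)


# Hypothetical log-probabilities, not taken from any model on this leaderboard.
q1 = ([math.log(0.3), math.log(0.1)], [math.log(0.4), math.log(0.2)])
q2 = ([math.log(0.6)], [math.log(0.2), math.log(0.2)])

score_q1 = mc2_question(*q1)
score_all = mc2_benchmark([q1, q2])
```

Under this reading, a score of 0.5 means the model splits its probability mass evenly between true and false reference answers, so higher is better.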

Results

#	Model	MC2	Extra Data	Paper	Date	Code
1	Mistral-7B-Instruct-v0.2 + TruthX	0.75	No	TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space	2024-02-27	Code
2	LLaMa-2-7B-Chat + TruthX	0.74	No	TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space	2024-02-27	Code
3	GPT-2 1.5B	0.39	No	TruthfulQA: Measuring How Models Mimic Human Falsehoods	2021-09-08	Code
4	GPT-J 6B	0.36	No	TruthfulQA: Measuring How Models Mimic Human Falsehoods	2021-09-08	Code
5	UnifiedQA 3B	0.35	No	TruthfulQA: Measuring How Models Mimic Human Falsehoods	2021-09-08	Code
6	GPT-3 175B	0.33	No	TruthfulQA: Measuring How Models Mimic Human Falsehoods	2021-09-08	Code