Question Answering on TruthfulQA

Metric: BLEURT (higher is better)

Results

#	Model	BLEURT	Extra Data	Paper	Date	Code
1	UnifiedQA 3B	0.08	No	TruthfulQA: Measuring How Models Mimic Human Falsehoods	2021-09-08	Code
2	GPT-2 1.5B	-0.25	No	TruthfulQA: Measuring How Models Mimic Human Falsehoods	2021-09-08	Code
3	GPT-J 6B	-0.31	No	TruthfulQA: Measuring How Models Mimic Human Falsehoods	2021-09-08	Code
4	GPT-3 175B	-0.56	No	TruthfulQA: Measuring How Models Mimic Human Falsehoods	2021-09-08	Code
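
The ranking above follows directly from the "higher is better" rule for BLEURT. As a minimal sketch, the snippet below reproduces that ordering from the four reported scores; the `results` list and the `rank_by_bleurt` helper are illustrative names, not part of the leaderboard or of any TruthfulQA tooling.

```python
# Scores copied from the results table: (model, BLEURT).
# The list is deliberately unordered to show the ranking step.
results = [
    ("GPT-3 175B", -0.56),
    ("UnifiedQA 3B", 0.08),
    ("GPT-J 6B", -0.31),
    ("GPT-2 1.5B", -0.25),
]


def rank_by_bleurt(rows):
    """Return rows sorted best-first, since higher BLEURT is better."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_bleurt(results)

# Print one tab-separated line per model: rank, name, signed score.
for position, (model, score) in enumerate(ranked, start=1):
    print(f"{position}\t{model}\t{score:+.2f}")
```

Sorting with `reverse=True` puts the largest score first, so a metric where lower is better would only need that flag flipped.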