Question Answering on TruthfulQA

Metric: BLEU (higher is better)

Results

Rank	Model	BLEU	Extra Data	Paper	Date
1	UnifiedQA 3B	-0.16	No	TruthfulQA: Measuring How Models Mimic Human Fal...	2021-09-08
2	GPT-2 1.5B	-4.91	No	TruthfulQA: Measuring How Models Mimic Human Fal...	2021-09-08
3	GPT-J 6B	-7.58	No	TruthfulQA: Measuring How Models Mimic Human Fal...	2021-09-08
4	GPT-3 175B	-17.38	No	TruthfulQA: Measuring How Models Mimic Human Fal...	2021-09-08

All four results are reported in "TruthfulQA: Measuring How Models Mimic Human Falsehoods" (2021-09-08). UnifiedQA 3B holds the top (SOTA) score.