Metric: % true (GPT-judge) (higher is better)
| # | Model | % true (GPT-judge) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | UnifiedQA 3B | 53.24 | No | TruthfulQA: Measuring How Models Mimic Human Fal... | 2021-09-08 | Code |
| 2 | GPT-2 1.5B | 29.87 | No | TruthfulQA: Measuring How Models Mimic Human Fal... | 2021-09-08 | Code |
| 3 | GPT-J 6B | 27.17 | No | TruthfulQA: Measuring How Models Mimic Human Fal... | 2021-09-08 | Code |
| 4 | GPT-3 175B | 20.56 | No | TruthfulQA: Measuring How Models Mimic Human Fal... | 2021-09-08 | Code |