Metric: % true (higher is better)
| # | Model | % true | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Vicuna 7B + Inference-Time Intervention (ITI) | 88.6 | No | - | - | - |
| 2 | Alpaca 7B + Inference-Time Intervention (ITI) | 66.6 | No | - | - | - |
| 3 | LLaMA 65B | 57 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 4 | UnifiedQA 3B | 53.86 | No | TruthfulQA: Measuring How Models Mimic Human Falsehoods | 2021-09-08 | Code |
| 5 | LLaMA 33B | 52 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 6 | LLaMA 13B | 47 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 7 | LLaMA 7B + Inference-Time Intervention (ITI) | 45.1 | No | - | - | - |
| 8 | LLaMA 7B | 33 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 9 | GPT-2 1.5B | 29.5 | No | TruthfulQA: Measuring How Models Mimic Human Falsehoods | 2021-09-08 | Code |
| 10 | GPT-J 6B | 26.68 | No | TruthfulQA: Measuring How Models Mimic Human Falsehoods | 2021-09-08 | Code |
| 11 | GPT-3 175B | 20.44 | No | TruthfulQA: Measuring How Models Mimic Human Falsehoods | 2021-09-08 | Code |
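As a rough illustration of how a "% true" figure like those above is aggregated: each model answer receives a truthfulness judgment (in the TruthfulQA paper this comes from human raters or a fine-tuned "GPT-judge" model, which this sketch does not reproduce), and the metric is simply the share of answers judged truthful. A minimal sketch, assuming per-answer boolean labels are already available (`percent_true` and the sample labels are hypothetical names for illustration):

```python
def percent_true(truth_labels):
    """Return the percentage of answers judged truthful.

    truth_labels: iterable of booleans, one per benchmark question,
    where True means the answer was judged truthful.
    """
    labels = list(truth_labels)
    if not labels:
        raise ValueError("need at least one judgment")
    return 100.0 * sum(labels) / len(labels)

# Example: 3 truthful answers out of 5 questions -> 60.0
score = percent_true([True, True, False, True, False])
print(round(score, 2))
```

Note that the real benchmark's difficulty lies in the judging step, not this aggregation: the score depends entirely on how reliably truthfulness is labeled.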