
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


UnifiedQA 3B

Reported in 10 benchmark results across 1 task (3 datasets) · 2 papers · 7 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 10 results

Question Answering on TruthfulQA
% true · 2021-09-08
53.86
best: 88.6 (Vicuna 7B + Inference Time Intervention (ITI))
SOTA
TruthfulQA: Measuring How Models Mimic Human Falsehoods arXiv:2109.07958

Question Answering on TruthfulQA
% true (GPT-judge) · 2021-09-08
53.24
SOTA
TruthfulQA: Measuring How Models Mimic Human Falsehoods arXiv:2109.07958

Question Answering on TruthfulQA
BLEU · 2021-09-08
-0.16
SOTA
TruthfulQA: Measuring How Models Mimic Human Falsehoods arXiv:2109.07958

Question Answering on TruthfulQA
BLEURT · 2021-09-08
0.08
SOTA
TruthfulQA: Measuring How Models Mimic Human Falsehoods arXiv:2109.07958

Question Answering on TruthfulQA
ROUGE · 2021-09-08
1.76
SOTA
TruthfulQA: Measuring How Models Mimic Human Falsehoods arXiv:2109.07958

Question Answering on SIQA
Accuracy · 2020-05-02
79.8
best: 83.2 (Unicorn 11B (fine-tuned))
SOTA
UnifiedQA: Crossing Format Boundaries With a Single QA System arXiv:2005.00700

Question Answering on PIQA
Accuracy · 2020-05-02
85.3
best: 90.1 (Unicorn 11B (fine-tuned))
SOTA
UnifiedQA: Crossing Format Boundaries With a Single QA System arXiv:2005.00700

Question Answering on TruthfulQA
% info · 2021-09-08
64.5
best: 97.7 (Alpaca 7B + Inference Time Intervention (ITI))
TruthfulQA: Measuring How Models Mimic Human Falsehoods arXiv:2109.07958

Question Answering on TruthfulQA
MC1 · 2021-09-08
0.19
best: 0.59 (GPT-4 (RLHF))
TruthfulQA: Measuring How Models Mimic Human Falsehoods arXiv:2109.07958

Question Answering on TruthfulQA
MC2 · 2021-09-08
0.35
best: 0.75 (Mistral-7B-Instruct-v0.2 + TruthX)
TruthfulQA: Measuring How Models Mimic Human Falsehoods arXiv:2109.07958