BiDAF++ (single model)

Reported on 7 benchmarks across 1 task · 1 paper · 3 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 7 results

Question Answering on CoQA
In-domain · 2018-09-27
69.4
best: 82.5 (BERT Large Augmented (single model))
SOTA
A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC · arXiv:1809.10735
Question Answering on CoQA
Out-of-domain · 2018-09-27
63.8
best: 77.6 (BERT Large Augmented (single model))
SOTA
A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC · arXiv:1809.10735
Question Answering on CoQA
Overall · 2018-09-27
67.8
best: 85 (GPT-3 175B (few-shot, k=32))
SOTA
A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC · arXiv:1809.10735
Question Answering on SQuAD1.1
EM
77.573
best: 90.622 (ANNA (single model))
Question Answering on SQuAD1.1
F1
84.858
best: 95.719 (ANNA (single model))
Question Answering on SQuAD2.0
EM
65.651
best: 90.939 (IE-Net (ensemble))
Question Answering on SQuAD2.0
F1
68.866
best: 93.214 (IE-Net (ensemble))