MultiQA
Reported on 8 benchmarks across 1 task · 1 paper · 1 SOTA
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing · 8 results
- Hits@1 · 2020-11-04 · SOTA · 29.3 · best: 79.7 (Prog-TQA)
- Hits@10 · 2020-11-04 · 44.1 · best: 91 (Prog-TQA)
- ANS-EM · 0.307 · best: 0.727 (Beam Retrieval)
- ANS-F1 · 0.402 · best: 0.85 (Beam Retrieval)
- JOINT-EM · 0 · best: 0.505 (Beam Retrieval)
- JOINT-F1 · 0 · best: 0.775 (Beam Retrieval)
- SUP-EM · 0 · best: 0.663 (Beam Retrieval)
- SUP-F1 · 0 · best: 0.901 (Beam Retrieval)