Mistral multi hop with very large sources

Reported on 6 benchmarks across 1 task

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing: 6 results

Question Answering on HotpotQA

Metric      Score    Best (model)
ANS-EM      0.08     0.727 (Beam Retrieval)
ANS-F1      0.221    0.85 (Beam Retrieval)
JOINT-EM    0        0.505 (Beam Retrieval)
JOINT-F1    0        0.775 (Beam Retrieval)
SUP-EM      0        0.663 (Beam Retrieval)
SUP-F1      0        0.901 (Beam Retrieval)
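
For readers unfamiliar with the ANS-EM and ANS-F1 columns, the following is a minimal sketch of how answer exact match and token-level F1 are commonly computed in SQuAD-style question answering evaluation. It is an illustration under stated assumptions, not the scoring code behind the numbers above: the function names (normalize, exact_match, token_f1) and the normalization steps (lowercasing, stripping punctuation and English articles) are assumptions, and the evaluation used for any given result listed here may differ in detail.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumed SQuAD-style normalization; a given benchmark may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both the prediction and the gold answer.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A verbose prediction can miss exact match yet still earn partial F1.
    print(exact_match("Barack Obama", "Obama"))
    print(token_f1("Barack Obama", "Obama"))
```

Under this definition a model that produces long or loosely phrased answers can score far higher on F1 than on exact match, which is one plausible reading of the gap between 0.08 and 0.221 above.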