BERT (Devlin et al., 2019)-Base

Reported on 2 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 2 results

Question Answering on MedMCQA
Dev Set (Acc-%) · 2022-03-27
0.35
best: 66 (Meditron-70B (CoT + SC))
MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering arXiv:2203.14371
Question Answering on MedMCQA
Test Set (Acc-%) · 2022-03-27
0.33
best: 0.723 (Med-PaLM 2 (ER))
MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering arXiv:2203.14371