
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Med-PaLM 2 (CoT + SC)

Reported on 7 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
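To illustrate what exact-name matching means, here is a minimal sketch. It assumes a hypothetical record layout of (model name, benchmark, accuracy) and uses only values shown on this page. Grouping is keyed on the raw name string, so "Med-PaLM 2 (CoT + SC)" and "Med-PaLM 2 (ER)" land in separate groups even though both come from the same paper. For the same reason, any two results that share an identical name string are merged, whichever variant they actually describe.

```python
from collections import defaultdict

# Hypothetical record layout: (model name as reported, benchmark, accuracy in %).
# The scores are copied from this page; the tuple structure is an assumption.
RESULTS = [
    ("Med-PaLM 2 (CoT + SC)", "MMLU (Clinical Knowledge)", 88.3),
    ("Med-PaLM 2 (ER)", "MMLU (Clinical Knowledge)", 88.7),
    ("Med-PaLM 2 (CoT + SC)", "MMLU (Anatomy)", 80.0),
    ("Med-PaLM 2 (ER)", "MMLU (Anatomy)", 84.4),
]


def group_by_exact_name(records):
    """Group (benchmark, score) pairs by the model name string, compared exactly.

    No normalization is applied: case, spacing and suffixes such as
    "(CoT + SC)" all make names distinct keys.
    """
    groups = defaultdict(list)
    for name, benchmark, score in records:
        groups[name].append((benchmark, score))
    return dict(groups)


groups = group_by_exact_name(RESULTS)
for name in sorted(groups):
    print(name, groups[name])
```

Because the key is the literal string, a variant spelled slightly differently (for example without spaces around "+") would form its own group. This is why the note warns that name-based matching can both split and conflate results.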

Natural Language Processing · 7 results

Question Answering on PubMedQA
Accuracy (%) · 2023-05-16
74
best: 81.6 (Meditron-70B (CoT + SC))
Towards Expert-Level Medical Question Answering with Large Language Models arXiv:2305.09617
Question Answering on MedQA
Accuracy (%) · 2023-05-16
83.7
best: 91.1 (Med-Gemini)
Towards Expert-Level Medical Question Answering with Large Language Models arXiv:2305.09617
Question Answering on MMLU (Clinical Knowledge)
Accuracy (%) · 2023-05-16
88.3
best: 88.7 (Med-PaLM 2 (ER))
Towards Expert-Level Medical Question Answering with Large Language Models arXiv:2305.09617
Question Answering on MMLU (College Biology)
Accuracy (%) · 2023-05-16
95.1
best: 95.8 (Med-PaLM 2 (ER))
Towards Expert-Level Medical Question Answering with Large Language Models arXiv:2305.09617
Question Answering on MMLU (Professional Medicine)
Accuracy (%) · 2023-05-16
93.4
best: 95.2 (Med-PaLM 2 (5-shot))
Towards Expert-Level Medical Question Answering with Large Language Models arXiv:2305.09617
Question Answering on MMLU (Medical Genetics)
Accuracy (%) · 2023-05-16
89
best: 92 (Med-PaLM 2 (ER))
Towards Expert-Level Medical Question Answering with Large Language Models arXiv:2305.09617
Question Answering on MMLU (Anatomy)
Accuracy (%) · 2023-05-16
80
best: 84.4 (Med-PaLM 2 (ER))
Towards Expert-Level Medical Question Answering with Large Language Models arXiv:2305.09617
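A quick way to read the seven results together is to compute how far Med-PaLM 2 (CoT + SC) sits below the best listed score on each benchmark. The sketch below does that arithmetic using only the numbers on this page. The row layout and function name are illustrative assumptions. Gaps are absolute accuracy points rather than relative percentages, and each "best" entry may come from a different model and evaluation setup.

```python
from statistics import mean

# (benchmark, Med-PaLM 2 (CoT + SC) accuracy, best listed accuracy, best model),
# copied from this page. The tuple layout itself is an illustrative assumption.
ROWS = [
    ("PubMedQA", 74.0, 81.6, "Meditron-70B (CoT + SC)"),
    ("MedQA", 83.7, 91.1, "Med-Gemini"),
    ("MMLU (Clinical Knowledge)", 88.3, 88.7, "Med-PaLM 2 (ER)"),
    ("MMLU (College Biology)", 95.1, 95.8, "Med-PaLM 2 (ER)"),
    ("MMLU (Professional Medicine)", 93.4, 95.2, "Med-PaLM 2 (5-shot)"),
    ("MMLU (Medical Genetics)", 89.0, 92.0, "Med-PaLM 2 (ER)"),
    ("MMLU (Anatomy)", 80.0, 84.4, "Med-PaLM 2 (ER)"),
]


def gaps_to_best(rows):
    """Return {benchmark: best - score}, in absolute accuracy points."""
    return {bench: best - score for bench, score, best, _ in rows}


gaps = gaps_to_best(ROWS)
mean_score = mean(score for _, score, _, _ in ROWS)

# Largest gap first; floats are formatted to one decimal to hide rounding noise.
for bench, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{bench:30s} -{gap:.1f}")
print(f"Mean accuracy over the 7 benchmarks: {mean_score:.2f}")
```

On these figures, the gap is under one point on MMLU Clinical Knowledge and College Biology. It is largest on PubMedQA and MedQA, where the listed best comes from a different model family.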