Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Question Answering on MedQA

Metric: Accuracy (higher is better)
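
As a minimal sketch of how an accuracy figure of this kind could be computed, assuming each multiple-choice answer is scored by exact match on the option letter. The function name and sample data are hypothetical and are not taken from any of the listed papers, whose evaluation protocols may differ.

```python
def accuracy(predictions, references):
    """Return the percentage of predictions that exactly match the references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted option equals the gold option.
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical option letters for four multiple-choice questions.
preds = ["A", "C", "B", "D"]
golds = ["A", "C", "D", "D"]
print(accuracy(preds, golds))  # 3 of 4 answers match
```

The result is expressed on a 0-100 scale to match the values reported on the leaderboard.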

Results

| Rank | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Med-Gemini | 91.1 | Yes | Capabilities of Gemini Models in Medicine | 2024-04-29 | - |
| 2 | GPT-4 | 90.2 | Yes | Can Generalist Foundation Models Outcompete Spec... | 2023-11-28 | Code |
| 3 | Med-PaLM 2 | 85.4 | No | Towards Expert-Level Medical Question Answering ... | 2023-05-16 | Code |
| 4 | Med-PaLM 2 (CoT + SC) | 83.7 | No | Towards Expert-Level Medical Question Answering ... | 2023-05-16 | Code |
| 5 | Med-PaLM 2 (5-shot) | 79.7 | No | Towards Expert-Level Medical Question Answering ... | 2023-05-16 | Code |
| 6 | MedMobile (3.8B) | 75.7 | Yes | MedMobile: A mobile-sized language model with ex... | 2024-10-11 | Code |
| 7 | Meerkat-7B | 74.3 | Yes | Small Language Models Learn Enhanced Reasoning S... | 2024-03-30 | - |
| 8 | Meerkat-7B (Single) | 70.6 | Yes | Small Language Models Learn Enhanced Reasoning S... | 2024-03-30 | - |
| 9 | Meditron-70B (CoT + SC) | 70.2 | No | MEDITRON-70B: Scaling Medical Pretraining for La... | 2023-11-27 | Code |
| 10 | Flan-PaLM (540 B) | 67.6 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 11 | LLAMA-2 (70B SC CoT) | 61.5 | Yes | MEDITRON-70B: Scaling Medical Pretraining for La... | 2023-11-27 | Code |
| 12 | Shakti-LLM (2.5B) | 60.3 | No | SHAKTI: A 2.5 Billion Parameter Small Language M... | 2024-10-15 | - |
| 13 | Codex 5-shot CoT | 60.2 | No | Can large language models reason about medical q... | 2022-07-17 | Code |
| 14 | LLAMA-2 (70B) | 59.2 | Yes | MEDITRON-70B: Scaling Medical Pretraining for La... | 2023-11-27 | Code |
| 15 | VOD (BioLinkBERT) | 55 | No | Variational Open-Domain Question Answering | 2022-09-23 | Code |
| 16 | BioMedGPT-10B | 50.4 | No | BioMedGPT: Open Multimodal Generative Pre-traine... | 2023-08-18 | Code |
| 17 | PubMedGPT (2.7 B) | 50.3 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 18 | DRAGON + BioLinkBERT | 47.5 | No | Deep Bidirectional Language-Knowledge Graph Pret... | 2022-10-17 | Code |
| 19 | BioLinkBERT (340 M) | 45.1 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 20 | GAL 120B (zero-shot) | 44.4 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 21 | BioLinkBERT (base) | 40 | No | LinkBERT: Pretraining Language Models with Docum... | 2022-03-29 | Code |
| 22 | GrapeQA: PEGA | 39.51 | No | GrapeQA: GRaph Augmentation and Pruning to Enhan... | 2023-03-22 | - |
| 23 | BioBERT (large) | 36.7 | No | BioBERT: a pre-trained biomedical language repre... | 2019-01-25 | Code |
| 24 | BioBERT (base) | 34.1 | No | BioBERT: a pre-trained biomedical language repre... | 2019-01-25 | Code |
| 25 | GPT-Neo (2.7 B) | 33.3 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 26 | BLOOM (few-shot, k=5) | 23.3 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 27 | OPT (few-shot, k=5) | 22.8 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |