Question Answering on MedQA
Metric: Accuracy (higher is better)
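As a rough illustration of the metric, accuracy here can be read as the percentage of questions for which the model's chosen answer matches the answer key. The sketch below assumes a multiple-choice setup with one answer label per question. The function name `accuracy` and the sample data are hypothetical and are not taken from any of the listed papers.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between each predicted label and its reference label.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical example: option letters chosen by a model vs. the answer key.
predicted = ["A", "C", "B", "D"]
gold = ["A", "C", "D", "D"]
print(round(accuracy(predicted, gold), 1))  # 75.0
```

Individual papers may differ in prompting, answer extraction, and which test split they evaluate on, so scores computed this way are only comparable when those details match.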
Results
| # | Model | Accuracy | Extra Data | SOTA | Paper | Date | Code |
|---|-------|----------|------------|------|-------|------|------|
| 1 | Med-Gemini | 91.1 | Yes | Yes | Capabilities of Gemini Models in Medicine | 2024-04-29 | - |
| 2 | GPT-4 | 90.2 | Yes | Yes | Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | 2023-11-28 | Code |
| 3 | Med-PaLM 2 | 85.4 | No | Yes | Towards Expert-Level Medical Question Answering with Large Language Models | 2023-05-16 | Code |
| 4 | Med-PaLM 2 (CoT + SC) | 83.7 | No | - | Towards Expert-Level Medical Question Answering with Large Language Models | 2023-05-16 | Code |
| 5 | Med-PaLM 2 (5-shot) | 79.7 | No | - | Towards Expert-Level Medical Question Answering with Large Language Models | 2023-05-16 | Code |
| 6 | MedMobile (3.8B) | 75.7 | Yes | - | MedMobile: A mobile-sized language model with expert-level clinical capabilities | 2024-10-11 | Code |
| 7 | Meerkat-7B | 74.3 | Yes | - | Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks | 2024-03-30 | - |
| 8 | Meerkat-7B (Single) | 70.6 | Yes | - | Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks | 2024-03-30 | - |
| 9 | Meditron-70B (CoT + SC) | 70.2 | No | - | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | 2023-11-27 | Code |
| 10 | Flan-PaLM (540 B) | 67.6 | No | Yes | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 11 | LLAMA-2 (70B SC CoT) | 61.5 | Yes | - | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | 2023-11-27 | Code |
| 12 | Shakti-LLM (2.5B) | 60.3 | No | - | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | 2024-10-15 | - |
| 13 | Codex 5-shot CoT | 60.2 | No | Yes | Can large language models reason about medical questions? | 2022-07-17 | Code |
| 14 | LLAMA-2 (70B) | 59.2 | Yes | - | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | 2023-11-27 | Code |
| 15 | VOD (BioLinkBERT) | 55 | No | - | Variational Open-Domain Question Answering | 2022-09-23 | Code |
| 16 | BioMedGPT-10B | 50.4 | No | - | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | 2023-08-18 | Code |
| 17 | PubMedGPT (2.7 B) | 50.3 | No | - | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 18 | DRAGON + BioLinkBERT | 47.5 | No | - | Deep Bidirectional Language-Knowledge Graph Pretraining | 2022-10-17 | Code |
| 19 | BioLinkBERT (340 M) | 45.1 | No | - | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 20 | GAL 120B (zero-shot) | 44.4 | No | - | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 21 | BioLinkBERT (base) | 40 | No | Yes | LinkBERT: Pretraining Language Models with Document Links | 2022-03-29 | Code |
| 22 | GrapeQA: PEGA | 39.51 | No | - | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | 2023-03-22 | - |
| 23 | BioBERT (large) | 36.7 | No | Yes | BioBERT: a pre-trained biomedical language representation model for biomedical text mining | 2019-01-25 | Code |
| 24 | BioBERT (base) | 34.1 | No | - | BioBERT: a pre-trained biomedical language representation model for biomedical text mining | 2019-01-25 | Code |
| 25 | GPT-Neo (2.7 B) | 33.3 | No | - | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 26 | BLOOM (few-shot, k=5) | 23.3 | No | - | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 27 | OPT (few-shot, k=5) | 22.8 | No | - | Galactica: A Large Language Model for Science | 2022-11-16 | Code |