Question Answering on PubMedQA
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy | Extra Data | Paper | Date | Code | SOTA |
|---|-------|----------|------------|-------|------|------|------|
| 1 | Meditron-70B (CoT + SC) | 81.6 | No | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | 2023-11-27 | Code | SOTA |
| 2 | BioGPT-Large(1.5B) | 81 | No | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | 2022-10-19 | Code | SOTA |
| 3 | RankRAG-llama3-70B (Zero-Shot) | 79.8 | No | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 2024-07-02 | - | - |
| 4 | Med-PaLM 2 (5-shot) | 79.2 | No | Towards Expert-Level Medical Question Answering with Large Language Models | 2023-05-16 | Code | - |
| 5 | Flan-PaLM (540B, Few-shot) | 79 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code | - |
| 6 | BioGPT(345M) | 78.2 | No | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | 2022-10-19 | Code | - |
| 7 | Codex 5-shot CoT | 78.2 | No | Can large language models reason about medical questions? | 2022-07-17 | Code | SOTA |
| 8 | Human Performance (single annotator) | 78 | No | PubMedQA: A Dataset for Biomedical Research Question Answering | 2019-09-13 | Code | SOTA |
| 9 | MetaGen Blended RAG (zero-shot) | 77.9 | No | MetaGen Blended RAG: Higher Accuracy for Domain-Specific Q&A Without Fine-Tuning | 2025-05-23 | Code | - |
| 10 | GAL 120B (zero-shot) | 77.6 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | - |
| 11 | Flan-PaLM (62B, Few-shot) | 77.2 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code | - |
| 12 | MediSwift-XL | 76.8 | No | MediSwift: Efficient Sparse Pre-trained Biomedical Language Models | 2024-03-01 | - | - |
| 13 | Flan-T5-XXL | 76.8 | No | - | - | - | - |
| 14 | BioMedGPT-10B | 76.1 | No | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | 2023-08-18 | Code | - |
| 15 | Claude 3 Opus (5-shot) | 75.8 | No | - | - | - | - |
| 16 | Flan-PaLM (540B, SC) | 75.2 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code | - |
| 17 | Med-PaLM 2 (ER) | 75 | No | Towards Expert-Level Medical Question Answering with Large Language Models | 2023-05-16 | Code | - |
| 18 | Claude 3 Opus (zero-shot) | 74.9 | No | - | - | - | - |
| 19 | Med-PaLM 2 (CoT + SC) | 74 | No | Towards Expert-Level Medical Question Answering with Large Language Models | 2023-05-16 | Code | - |
| 20 | BLOOM (zero-shot) | 73.6 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | - |
| 21 | CoT-T5-11B (1024 Shot) | 73.42 | No | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | 2023-05-23 | Code | - |
| 22 | BioLinkBERT (large) | 72.2 | No | LinkBERT: Pretraining Language Models with Document Links | 2022-03-29 | Code | - |
| 23 | BioLinkBERT (base) | 70.2 | No | LinkBERT: Pretraining Language Models with Document Links | 2022-03-29 | Code | - |
| 24 | OPT (zero-shot) | 70.2 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code | - |
| 25 | Flan-PaLM (8B, Few-shot) | 67.6 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code | - |
| 26 | BioELECTRA uncased | 64.2 | No | - | - | Code | - |
| 27 | PaLM (62B, Few-shot) | 57.8 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code | - |
| 28 | PubMedBERT uncased | 55.84 | No | Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing | 2020-07-31 | Code | - |
| 29 | PaLM (540B, Few-shot) | 55 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code | - |
| 30 | PaLM (8B, Few-shot) | 34 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code | - |