Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Question Answering on PubMedQA

Metric: Accuracy (higher is better)
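Accuracy here is the share of questions whose predicted answer matches the gold answer, reported as a percentage. The following is a minimal sketch, assuming that model outputs have already been mapped to PubMedQA's three answer labels (yes, no, maybe). The function name `accuracy` and the label-validation step are illustrative and do not come from any official PubMedQA evaluation script.

```python
from typing import Sequence

# Assumed answer space: each PubMedQA question is answered with one of these labels.
LABELS = {"yes", "no", "maybe"}


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return accuracy in percent: correct predictions / total questions * 100."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Reject anything outside the assumed label set so formatting errors are not silently scored.
    for label in list(predictions) + list(gold):
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Toy example with made-up labels (not PubMedQA data): 4 of 5 answers match.
print(accuracy(["yes", "no", "maybe", "yes", "no"], ["yes", "no", "maybe", "no", "no"]))  # 80.0
```

Under this definition a higher score is strictly better, which is why the leaderboard below is sorted in descending order of accuracy.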


Results

| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Meditron-70B (CoT + SC) | 81.6 | No | MEDITRON-70B: Scaling Medical Pretraining for La... | 2023-11-27 | Code |
| 2 | BioGPT-Large(1.5B) | 81 | No | BioGPT: Generative Pre-trained Transformer for B... | 2022-10-19 | Code |
| 3 | RankRAG-llama3-70B (Zero-Shot) | 79.8 | No | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 4 | Med-PaLM 2 (5-shot) | 79.2 | No | Towards Expert-Level Medical Question Answering ... | 2023-05-16 | Code |
| 5 | Flan-PaLM (540B, Few-shot) | 79 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 6 | BioGPT(345M) | 78.2 | No | BioGPT: Generative Pre-trained Transformer for B... | 2022-10-19 | Code |
| 7 | Codex 5-shot CoT | 78.2 | No | Can large language models reason about medical q... | 2022-07-17 | Code |
| 8 | Human Performance (single annotator) | 78 | No | PubMedQA: A Dataset for Biomedical Research Ques... | 2019-09-13 | Code |
| 9 | MetaGen Blended RAG (zero-shot) | 77.9 | No | MetaGen Blended RAG: Higher Accuracy for Domain-... | 2025-05-23 | Code |
| 10 | GAL 120B (zero-shot) | 77.6 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 11 | Flan-PaLM (62B, Few-shot) | 77.2 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 12 | MediSwift-XL | 76.8 | No | MediSwift: Efficient Sparse Pre-trained Biomedic... | 2024-03-01 | - |
| 13 | Flan-T5-XXL | 76.8 | No | - | - | - |
| 14 | BioMedGPT-10B | 76.1 | No | BioMedGPT: Open Multimodal Generative Pre-traine... | 2023-08-18 | Code |
| 15 | Claude 3 Opus (5-shot) | 75.8 | No | - | - | - |
| 16 | Flan-PaLM (540B, SC) | 75.2 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 17 | Med-PaLM 2 (ER) | 75 | No | Towards Expert-Level Medical Question Answering ... | 2023-05-16 | Code |
| 18 | Claude 3 Opus (zero-shot) | 74.9 | No | - | - | - |
| 19 | Med-PaLM 2 (CoT + SC) | 74 | No | Towards Expert-Level Medical Question Answering ... | 2023-05-16 | Code |
| 20 | BLOOM (zero-shot) | 73.6 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 21 | CoT-T5-11B (1024 Shot) | 73.42 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code |
| 22 | BioLinkBERT (large) | 72.2 | No | LinkBERT: Pretraining Language Models with Docum... | 2022-03-29 | Code |
| 23 | BioLinkBERT (base) | 70.2 | No | LinkBERT: Pretraining Language Models with Docum... | 2022-03-29 | Code |
| 24 | OPT (zero-shot) | 70.2 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 25 | Flan-PaLM (8B, Few-shot) | 67.6 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 26 | BioELECTRA uncased | 64.2 | No | - | - | Code |
| 27 | PaLM (62B, Few-shot) | 57.8 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 28 | PubMedBERT uncased | 55.84 | No | Domain-Specific Language Model Pretraining for B... | 2020-07-31 | Code |
| 29 | PaLM (540B, Few-shot) | 55 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
| 30 | PaLM (8B, Few-shot) | 34 | No | Large Language Models Encode Clinical Knowledge | 2022-12-26 | Code |
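The table includes a single-annotator human baseline (78.0) alongside the models, so one natural reading is how far each entry sits above it. The sketch below copies the top ten rows from the table verbatim and lists the entries that beat that baseline, with their margins in accuracy points. The helper `margins_over_baseline` and the constant names are hypothetical, chosen only for illustration.

```python
# Top ten rows of the leaderboard above: (model, accuracy in %), copied from the table.
LEADERBOARD = [
    ("Meditron-70B (CoT + SC)", 81.6),
    ("BioGPT-Large(1.5B)", 81.0),
    ("RankRAG-llama3-70B (Zero-Shot)", 79.8),
    ("Med-PaLM 2 (5-shot)", 79.2),
    ("Flan-PaLM (540B, Few-shot)", 79.0),
    ("BioGPT(345M)", 78.2),
    ("Codex 5-shot CoT", 78.2),
    ("Human Performance (single annotator)", 78.0),
    ("MetaGen Blended RAG (zero-shot)", 77.9),
    ("GAL 120B (zero-shot)", 77.6),
]

HUMAN_BASELINE = "Human Performance (single annotator)"


def margins_over_baseline(rows, baseline_name):
    """Return (model, margin) for every row scoring strictly above the named baseline row."""
    baseline = dict(rows)[baseline_name]
    # Round to two decimals to hide floating-point noise such as 3.5999999999999943.
    return [
        (name, round(acc - baseline, 2))
        for name, acc in rows
        if name != baseline_name and acc > baseline
    ]


for name, margin in margins_over_baseline(LEADERBOARD, HUMAN_BASELINE):
    print(f"{name}: +{margin:.1f} points")
```

Run on these rows, it reports seven entries above the baseline, from Meditron-70B (CoT + SC) at +3.6 points down to BioGPT(345M) and Codex 5-shot CoT at +0.2 points each. Note that the table numbers tied scores (for example, positions 6 and 7 at 78.2) sequentially rather than giving them a shared rank.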