Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Question Answering on DROP Test

Metric: F1 (higher is better)
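
For readers unfamiliar with the metric, the following is a minimal sketch of a token-overlap F1 between a predicted answer and a gold answer, averaged over questions and scaled to 0-100 like the scores listed on this page. It is a simplification under stated assumptions: tokens are lowercased and split on whitespace, and each question has a single gold answer. The official DROP evaluator applies further answer normalization and handles multi-span answers, none of which is reproduced here. The function names `token_f1` and `leaderboard_f1` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Simplified token-overlap F1 between one prediction and one gold answer.

    Assumption: lowercase + whitespace tokenization only. This is NOT the
    official DROP scoring script.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def leaderboard_f1(pairs: list[tuple[str, str]]) -> float:
    """Mean per-question F1 over (prediction, gold) pairs, scaled to 0-100."""
    if not pairs:
        raise ValueError("need at least one (prediction, gold) pair")
    return 100.0 * sum(token_f1(p, g) for p, g in pairs) / len(pairs)
```

Under this sketch, a prediction of "4 touchdowns" against a gold answer of "4" has precision 1/2 and recall 1, so it receives partial credit rather than zero.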

Results

| # | Model | F1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | QDGAT (ensemble) | 88.38 | No | Question Directed Graph Attention Network for Nu... | 2020-09-16 | - |
| 2 | POET | 87.6 | Yes | Reasoning Like Program Executors | 2022-01-27 | Code |
| 3 | PaLM 2 (few-shot) | 85 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 4 | BERT+Calculator (ensemble) | 81.78 | No | Giving BERT a Calculator: Finding Operations and... | 2019-08-31 | - |
| 5 | NeRd | 81.71 | No | - | - | - |
| 6 | GPT-4 (few-shot, k=3) | 80.9 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 7 | TASE-BERT | 80.7 | No | A Simple and Effective Model for Answering Multi... | 2019-09-29 | Code |
| 8 | MTMSN Large | 79.88 | No | A Multi-Type Multi-Span Network for Reading Comp... | 2019-08-15 | Code |
| 9 | GenBERT (+ND+TD) | 72.4 | No | Injecting Numerical Reasoning Skills into Langua... | 2020-04-09 | Code |
| 10 | NumNet | 67.97 | No | NumNet: Machine Reading Comprehension with Numer... | 2019-10-15 | Code |
| 11 | GPT 3.5 (few-shot, k=3) | 64.1 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 12 | Orca 2-7B | 60.26 | No | Orca 2: Teaching Small Language Models How to Re... | 2023-11-18 | - |
| 13 | Orca 2-13B | 57.97 | No | Orca 2: Teaching Small Language Models How to Re... | 2023-11-18 | - |
| 14 | NAQA Net | 47.01 | No | DROP: A Reading Comprehension Benchmark Requirin... | 2019-03-01 | Code |
| 15 | GPT-3 175B (few-shot, k=32) | 36.5 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 16 | BERT | 32.7 | No | DROP: A Reading Comprehension Benchmark Requirin... | 2019-03-01 | Code |