Question Answering on DROP Test
Metric: F1 (higher is better)
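F1 here is the harmonic mean of token-level precision and recall between a predicted answer and a gold answer. The leaderboard figures are such per-question scores averaged over the test questions and expressed as percentages.

The sketch below is a minimal SQuAD-style illustration in Python. It is an assumption for illustration, not the official DROP evaluator. That script also normalizes numbers and aligns multi-span answers to gold spans, and this sketch omits both. The names `normalize_tokens`, `token_f1` and `max_over_golds` are hypothetical.

```python
import re
import string
from collections import Counter


def normalize_tokens(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Simplified normalization (hypothetical); the official DROP script
    treats numbers specially, which this sketch does not.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between one prediction and one gold answer."""
    pred = normalize_tokens(prediction)
    ref = normalize_tokens(gold)
    if not pred and not ref:
        return 1.0  # both empty after normalization: treat as a match
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def max_over_golds(prediction: str, golds: list[str]) -> float:
    """Score against every acceptable gold answer and keep the best."""
    return max(token_f1(prediction, g) for g in golds)
```

As a worked example, `token_f1("the Denver Broncos", "Broncos")` gives 2/3. After normalization the prediction has two tokens and one of them overlaps, so precision is 1/2, recall is 1, and F1 is 2 · (1/2) · 1 / (3/2) = 2/3.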
Results
# | Model | F1 | Extra data | Paper | Date | Code
--- | --- | --- | --- | --- | --- | ---
1 | QDGAT (ensemble) * | 88.38 | No | Question Directed Graph Attention Network for Numerical Reasoning over Text | 2020-09-16 | No
2 | POET | 87.6 | Yes | Reasoning Like Program Executors | 2022-01-27 | Yes
3 | PaLM 2 (few-shot) | 85 | No | PaLM 2 Technical Report | 2023-05-17 | Yes
4 | BERT+Calculator (ensemble) * | 81.78 | No | Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension | 2019-08-31 | No
5 | NeRd | 81.71 | No | No paper | - | No
6 | GPT-4 (few-shot, k=3) | 80.9 | No | GPT-4 Technical Report | 2023-03-15 | Yes
7 | TASE-BERT | 80.7 | No | A Simple and Effective Model for Answering Multi-span Questions | 2019-09-29 | Yes
8 | MTMSN Large * | 79.88 | No | A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning | 2019-08-15 | Yes
9 | GenBERT (+ND+TD) | 72.4 | No | Injecting Numerical Reasoning Skills into Language Models | 2020-04-09 | Yes
10 | NumNet | 67.97 | No | NumNet: Machine Reading Comprehension with Numerical Reasoning | 2019-10-15 | Yes
11 | GPT 3.5 (few-shot, k=3) | 64.1 | No | GPT-4 Technical Report | 2023-03-15 | Yes
12 | Orca 2-7B | 60.26 | No | Orca 2: Teaching Small Language Models How to Reason | 2023-11-18 | No
13 | Orca 2-13B | 57.97 | No | Orca 2: Teaching Small Language Models How to Reason | 2023-11-18 | No
14 | NAQA Net * | 47.01 | No | DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs | 2019-03-01 | Yes
15 | GPT-3 175B (few-shot, k=32) | 36.5 | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes
16 | BERT | 32.7 | No | DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs | 2019-03-01 | Yes

* Marked SOTA on the leaderboard. Each of these entries had the highest F1 of any dated entry up to its publication date.
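The SOTA markers can be reproduced from the dates and scores alone: sort the entries by date and keep each one that beats the best F1 seen so far. The Python sketch below assumes this is how the badges were assigned, since the leaderboard does not document its rule. `RESULTS` copies the dated rows of the table (NeRd is left out because it has no date), and `sota_progression` is a hypothetical helper name.

```python
from datetime import date

# (model, F1, publication date) copied from the results table.
# NeRd is omitted because the leaderboard lists no date for it.
RESULTS = [
    ("QDGAT (ensemble)", 88.38, date(2020, 9, 16)),
    ("POET", 87.6, date(2022, 1, 27)),
    ("PaLM 2 (few-shot)", 85.0, date(2023, 5, 17)),
    ("BERT+Calculator (ensemble)", 81.78, date(2019, 8, 31)),
    ("GPT-4 (few-shot, k=3)", 80.9, date(2023, 3, 15)),
    ("TASE-BERT", 80.7, date(2019, 9, 29)),
    ("MTMSN Large", 79.88, date(2019, 8, 15)),
    ("GenBERT (+ND+TD)", 72.4, date(2020, 4, 9)),
    ("NumNet", 67.97, date(2019, 10, 15)),
    ("GPT 3.5 (few-shot, k=3)", 64.1, date(2023, 3, 15)),
    ("Orca 2-7B", 60.26, date(2023, 11, 18)),
    ("Orca 2-13B", 57.97, date(2023, 11, 18)),
    ("NAQA Net", 47.01, date(2019, 3, 1)),
    ("GPT-3 175B (few-shot, k=32)", 36.5, date(2020, 5, 28)),
    ("BERT", 32.7, date(2019, 3, 1)),
]


def sota_progression(results):
    """Return the entries that raised the running best F1, in date order.

    Entries sharing a date are ordered best-first, so only the strongest
    result published on a given day can count as a new best.
    """
    best = float("-inf")
    progression = []
    for model, f1, day in sorted(results, key=lambda r: (r[2], -r[1])):
        if f1 > best:
            best = f1
            progression.append((model, f1, day))
    return progression


for model, f1, day in sota_progression(RESULTS):
    print(day.isoformat(), model, f1)
```

Under this rule the progression is NAQA Net, MTMSN Large, BERT+Calculator (ensemble) and QDGAT (ensemble). These are the four entries that carry the SOTA marker.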