Question Answering on NewsQA

Metric: F1 (higher is better)
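
The page does not say how F1 is computed. The sketch below assumes the SQuAD-style token-overlap F1 that is commonly used for extractive QA; the normalization steps and the helper names (`normalize`, `token_f1`, `corpus_f1`) are illustrative assumptions, not this leaderboard's official scorer.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: SQuAD-style normalization; the leaderboard may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall for one answer pair."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions: list[str], references: list[list[str]]) -> float:
    """Mean over questions of the best F1 against any reference, on a 0-100 scale."""
    scores = [
        max(token_f1(pred, ref) for ref in refs)
        for pred, refs in zip(predictions, references)
    ]
    return 100.0 * sum(scores) / len(scores)
```

Under these assumptions, a prediction of "in Paris France" against the reference "Paris" has precision 1/3 and recall 1, giving an F1 of 0.5.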

Results

#	Model	F1	Extra Data	Paper	Date	Code
1	Riple/Saanvi-v0.5-DeepAnalysis	94.01	Yes	DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing	2016-11-07	Code
2	OpenAI/o3-2025-01-31-high	93.13	Yes	o3-mini vs DeepSeek-R1: Which One is Safer?	2025-01-30	Code
3	OpenAI/o4-mini-2025-05-01-high	91.31	Yes	Thinking Like Transformers	2021-06-13	Code
4	OpenAI/o1-2024-12-17-high	88.72	Yes	0/1 Deep Neural Networks via Block Coordinate Descent	2022-06-19	-
5	xAI/grok-3-1212	88.24	Yes	XAI for Transformers: Better Explanations through Conservative Propagation	2022-02-15	Code
6	deepseek-r1	86.13	Yes	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning	2025-01-22	Code
7	Riple/Saanvi-v0.1	85.44	No	Time-series Transformer Generative Adversarial Networks	2022-05-23	Code
8	Anthropic/claude-3-7-sonnet	82.3	No	-	-	-
9	OpenAI/GPT-4o	81.74	Yes	GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data	2024-10-03	-
10	Google/Gemini 2.5 Pro	79.91	Yes	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	2024-03-08	Code
11	SpanBERT	73.6	No	SpanBERT: Improving Pre-training by Representing and Predicting Spans	2019-07-24	Code
12	LinkBERT (large)	72.6	Yes	LinkBERT: Pretraining Language Models with Document Links	2022-03-29	Code
13	DyREX	68.53	Yes	DyREx: Dynamic Query Representation for Extractive Question Answering	2022-10-26	Code
14	DecaProp	66.3	No	Densely Connected Attention Propagation for Reading Comprehension	2018-11-10	Code
15	BERT+ASGen	64.5	No	-	-	-
16	AMANDA	63.7	No	A Question-Focused Multi-Factor Attention Network for Question Answering	2018-01-25	Code
17	MINIMAL(Dyn)	63.2	Yes	Efficient and Robust Question Answering from Minimal Context over Documents	2018-05-21	Code
18	FastQAExt	56.1	Yes	Making Neural QA as Simple as Possible but not Simpler	2017-03-14	Code