Visual Question Answering (VQA) on ChartQA

Metric: 1:1 Accuracy (higher is better)

Results

#	Model	1:1 Accuracy	Extra Data	Paper	Date	Code
1	ChartPaLI-5B + PaLM 2-S	81.3	Yes	Chart-based Reasoning: Transferring Capabilities...	2024-03-19	-
2	Gemini Ultra	80.8	No	Gemini: A Family of Highly Capable Multimodal Mo...	2023-12-19	Code
3	DePlot+FlanPaLM+Codex (PoT Self-Consistency)	79.3	No	DePlot: One-shot visual language reasoning by pl...	2022-12-20	Code
4	ChartPaLI-5B	77.3	Yes	Chart-based Reasoning: Transferring Capabilities...	2024-03-19	-
5	DePlot+Codex (PoT Self-Consistency)	76.7	No	DePlot: One-shot visual language reasoning by pl...	2022-12-20	Code
6	ScreenAI 5B (4.62 B params, w/ OCR)	76.7	Yes	ScreenAI: A Vision-Language Model for UI and Inf...	2024-02-07	Code
7	SMoLA-PaLI-X Specialist Model	74.6	Yes	Omni-SMoLA: Boosting Generalist Multimodal Model...	2023-12-01	-
8	SMoLA-PaLI-X Generalist Model	73.8	Yes	Omni-SMoLA: Boosting Generalist Multimodal Model...	2023-12-01	-
9	MatCha4096 + LaMenDa	72.64	Yes	-	-	-
10	PaLI-X (Single-task FT w/ OCR)	72.3	Yes	PaLI-X: On Scaling up a Multilingual Vision and ...	2023-05-29	Code
11	PaLI-X (Single-task FT)	70.9	Yes	PaLI-X: On Scaling up a Multilingual Vision and ...	2023-05-29	Code
12	PaLI-X (Multi-task FT)	70.6	Yes	PaLI-X: On Scaling up a Multilingual Vision and ...	2023-05-29	Code
13	DePlot+FlanPaLM (Self-Consistency)	70.5	No	DePlot: One-shot visual language reasoning by pl...	2022-12-20	Code
14	PaLI-3	70	No	PaLI-3 Vision Language Models: Smaller, Faster, ...	2023-10-13	Code
15	PaLI-3 (w/ OCR)	69.5	No	PaLI-3 Vision Language Models: Smaller, Faster, ...	2023-10-13	Code
16	DePlot+FlanPaLM (CoT)	67.3	No	DePlot: One-shot visual language reasoning by pl...	2022-12-20	Code
17	Qwen-VL-Chat	66.3	Yes	Qwen-VL: A Versatile Vision-Language Model for U...	2023-08-24	Code
18	UniChart	66.24	Yes	UniChart: A Universal Vision-language Pretrained...	2023-05-24	Code
19	Qwen-VL	65.7	Yes	Qwen-VL: A Versatile Vision-Language Model for U...	2023-08-24	Code
20	StructChart+GPT3.5 (STR ChartQA+SimChart9K)	65.3	Yes	StructChart: On the Schema, Metric, and Augmenta...	2023-09-20	Code
21	MatCha	64.2	No	MatCha: Enhancing Visual Language Pretraining wi...	2022-12-19	Code
22	StructChart+GPT3.5 (STR)	60.7	No	StructChart: On the Schema, Metric, and Augmenta...	2023-09-20	Code
23	Pix2Struct-large	58.6	No	Pix2Struct: Screenshot Parsing as Pretraining fo...	2022-10-07	Code
24	Pix2Struct-base	56	No	Pix2Struct: Screenshot Parsing as Pretraining fo...	2022-10-07	Code
25	VisionTapas-OCR	45.5	No	ChartQA: A Benchmark for Question Answering abou...	2022-03-19	Code
26	DePlot+GPT3 (Self-Consistency)	42.3	No	DePlot: One-shot visual language reasoning by pl...	2022-12-20	Code
27	DePlot+GPT3 (CoT)	36.9	No	DePlot: One-shot visual language reasoning by pl...	2022-12-20	Code

#1 ChartPaLI-5B + PaLM 2-S (SOTA)
81.3
1:1 Accuracy · Extra Data · 2024-03-19
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

#2 Gemini Ultra (SOTA)
80.8
1:1 Accuracy · 2023-12-19
Gemini: A Family of Highly Capable Multimodal Models · Code

#3 DePlot+FlanPaLM+Codex (PoT Self-Consistency) (SOTA)
79.3
1:1 Accuracy · 2022-12-20
DePlot: One-shot visual language reasoning by plot-to-table translation · Code

#4 ChartPaLI-5B
77.3
1:1 Accuracy · Extra Data · 2024-03-19
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

#5 DePlot+Codex (PoT Self-Consistency)
76.7
1:1 Accuracy · 2022-12-20
DePlot: One-shot visual language reasoning by plot-to-table translation · Code

#6 ScreenAI 5B (4.62 B params, w/ OCR)
76.7
1:1 Accuracy · Extra Data · 2024-02-07
ScreenAI: A Vision-Language Model for UI and Infographics Understanding · Code

#7 SMoLA-PaLI-X Specialist Model
74.6
1:1 Accuracy · Extra Data · 2023-12-01
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

#8 SMoLA-PaLI-X Generalist Model
73.8
1:1 Accuracy · Extra Data · 2023-12-01
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

#9 MatCha4096 + LaMenDa
72.64
1:1 Accuracy · Extra Data
No paper

#10 PaLI-X (Single-task FT w/ OCR)
72.3
1:1 Accuracy · Extra Data · 2023-05-29
PaLI-X: On Scaling up a Multilingual Vision and Language Model · Code

#11 PaLI-X (Single-task FT)
70.9
1:1 Accuracy · Extra Data · 2023-05-29
PaLI-X: On Scaling up a Multilingual Vision and Language Model · Code

#12 PaLI-X (Multi-task FT)
70.6
1:1 Accuracy · Extra Data · 2023-05-29
PaLI-X: On Scaling up a Multilingual Vision and Language Model · Code

#13 DePlot+FlanPaLM (Self-Consistency)
70.5
1:1 Accuracy · 2022-12-20
DePlot: One-shot visual language reasoning by plot-to-table translation · Code

#14 PaLI-3
70
1:1 Accuracy · 2023-10-13
PaLI-3 Vision Language Models: Smaller, Faster, Stronger · Code

#15 PaLI-3 (w/ OCR)
69.5
1:1 Accuracy · 2023-10-13
PaLI-3 Vision Language Models: Smaller, Faster, Stronger · Code

#16 DePlot+FlanPaLM (CoT)
67.3
1:1 Accuracy · 2022-12-20
DePlot: One-shot visual language reasoning by plot-to-table translation · Code

#17 Qwen-VL-Chat
66.3
1:1 Accuracy · Extra Data · 2023-08-24
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · Code

#18 UniChart
66.24
1:1 Accuracy · Extra Data · 2023-05-24
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning · Code

#19 Qwen-VL
65.7
1:1 Accuracy · Extra Data · 2023-08-24
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · Code

#20 StructChart+GPT3.5 (STR ChartQA+SimChart9K)
65.3
1:1 Accuracy · Extra Data · 2023-09-20
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding · Code

#21 MatCha (SOTA)
64.2
1:1 Accuracy · 2022-12-19
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering · Code

#22 StructChart+GPT3.5 (STR)
60.7
1:1 Accuracy · 2023-09-20
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding · Code

#23 Pix2Struct-large (SOTA)
58.6
1:1 Accuracy · 2022-10-07
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding · Code

#24 Pix2Struct-base
56
1:1 Accuracy · 2022-10-07
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding · Code

#25 VisionTapas-OCR (SOTA)
45.5
1:1 Accuracy · 2022-03-19
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning · Code

#26 DePlot+GPT3 (Self-Consistency)
42.3
1:1 Accuracy · 2022-12-20
DePlot: One-shot visual language reasoning by plot-to-table translation · Code

#27 DePlot+GPT3 (CoT)
36.9
1:1 Accuracy · 2022-12-20
DePlot: One-shot visual language reasoning by plot-to-table translation · Code