Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DePlot: One-shot visual language reasoning by plot-to-table translation

Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

2022-12-20 · Chart Question Answering · Translation · Large Language Model · Factual Inconsistency Detection in Chart Captioning · Visual Question Answering (VQA) · Language Modelling

Paper · PDF · Code

Abstract

Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key component of this method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over the finetuned SOTA on human-written queries from the chart QA task.
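The two-step pipeline described in the abstract can be sketched in a few lines. The linearized-table format below (columns joined by " | ", records separated by newlines) follows the paper's description of DePlot's output; the prompt template, the worked example inside it, and the helper function names are illustrative assumptions, not the authors' published code.

```python
# Sketch of the DePlot+LLM pipeline: (1) a plot-to-table model produces a
# linearized table from a chart image, (2) that table plus the question is
# used to prompt a pretrained LLM. Step (1) is represented here only by its
# output format; no real model is called.

def linearize_table(header, rows):
    """Serialize a table the way DePlot's output is described:
    columns joined by ' | ', records separated by newlines."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

def build_one_shot_prompt(table, question):
    """One-shot prompt: a single worked example, then the new table and
    question. The example content here is hypothetical."""
    example = (
        "Table:\nYear | Sales\n2020 | 10\n2021 | 15\n"
        "Question: How much did sales grow from 2020 to 2021?\n"
        "Answer: 15 - 10 = 5\n\n"
    )
    return example + f"Table:\n{table}\nQuestion: {question}\nAnswer:"

table = linearize_table(["Country", "GDP"], [["A", 3.1], ["B", 2.4]])
prompt = build_one_shot_prompt(table, "Which country has the higher GDP?")
```

The prompt string would then be sent to any pretrained LLM; this plug-and-play separation is what lets DePlot work with FlanPaLM, Codex, or GPT-3 without retraining either component.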

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Visual Question Answering (VQA) | PlotQA | 1:1 Accuracy | 66.6 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 79.3 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 76.7 | DePlot+Codex (PoT Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 70.5 | DePlot+FlanPaLM (Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 67.3 | DePlot+FlanPaLM (CoT) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 42.3 | DePlot+GPT3 (Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 36.9 | DePlot+GPT3 (CoT) |
| Chart Question Answering | PlotQA | 1:1 Accuracy | 66.6 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 79.3 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 76.7 | DePlot+Codex (PoT Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 70.5 | DePlot+FlanPaLM (Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 67.3 | DePlot+FlanPaLM (CoT) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 42.3 | DePlot+GPT3 (Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 36.9 | DePlot+GPT3 (CoT) |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-LVLM | Kendall's Tau-c | 0.129 | DePlot + GPT-4 |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-FT | Kendall's Tau-c | 0.109 | DePlot + GPT-4 |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-LLM | Kendall's Tau-c | 0.117 | DePlot + GPT-4 |
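Several of the strongest entries in the table use self-consistency decoding: sample multiple reasoning paths from the LLM and take a majority vote over the final answers. A minimal sketch of the voting step, where the sampler below is a toy stand-in for stochastic LLM sampling:

```python
from collections import Counter

def self_consistency(sample_answer, n_samples=5):
    """Self-consistency: draw several stochastic answers and return the
    most common one, which tends to be more reliable than a single sample."""
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for LLM sampling: a noisy answerer that is right 3 out of 5 times.
samples = iter(["42", "41", "42", "42", "40"])
result = self_consistency(lambda: next(samples), n_samples=5)
# result == "42": the majority answer wins despite two wrong samples
```

The PoT (Program-of-Thoughts) variants in the table apply the same vote, but each sample is a generated program whose execution result is the answer.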

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits (2025-07-18)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
- Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)