Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DePlot: One-shot visual language reasoning by plot-to-table translation

Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

2022-12-20 · Chart Question Answering · Translation · Large Language Model · Factual Inconsistency Detection in Chart Captioning · Visual Question Answering (VQA) · Language Modelling

Paper · PDF · Code

Abstract

Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key component of this method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over the finetuned SOTA on human-written queries from the chart QA task.
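The two-step pipeline described in the abstract can be sketched in a few lines. The linearized-table format below (columns joined by " | ", records separated by newlines) follows the paper's description of DePlot's output; the prompt template, the worked example inside it, and the helper function names are illustrative assumptions, not the authors' published code.

```python
# Sketch of the DePlot+LLM pipeline: (1) a plot-to-table model produces a
# linearized table from a chart image, (2) that table plus the question is
# used to prompt a pretrained LLM. Step (1) is represented here only by its
# output format; no real model is called.

def linearize_table(header, rows):
    """Serialize a table the way DePlot's output is described:
    columns joined by ' | ', records separated by newlines."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

def build_one_shot_prompt(table, question):
    """One-shot prompt: a single worked example, then the new table and
    question. The example content here is hypothetical."""
    example = (
        "Table:\nYear | Sales\n2020 | 10\n2021 | 15\n"
        "Question: How much did sales grow from 2020 to 2021?\n"
        "Answer: 15 - 10 = 5\n\n"
    )
    return example + f"Table:\n{table}\nQuestion: {question}\nAnswer:"

table = linearize_table(["Country", "GDP"], [["A", 3.1], ["B", 2.4]])
prompt = build_one_shot_prompt(table, "Which country has the higher GDP?")
```

The prompt string would then be sent to any pretrained LLM; this plug-and-play separation is what lets DePlot work with FlanPaLM, Codex, or GPT-3 without retraining either component.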

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Visual Question Answering (VQA) | PlotQA | 1:1 Accuracy | 66.6 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 79.3 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 76.7 | DePlot+Codex (PoT Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 70.5 | DePlot+FlanPaLM (Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 67.3 | DePlot+FlanPaLM (CoT) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 42.3 | DePlot+GPT3 (Self-Consistency) |
| Visual Question Answering (VQA) | ChartQA | 1:1 Accuracy | 36.9 | DePlot+GPT3 (CoT) |
| Chart Question Answering | PlotQA | 1:1 Accuracy | 66.6 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 79.3 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 76.7 | DePlot+Codex (PoT Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 70.5 | DePlot+FlanPaLM (Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 67.3 | DePlot+FlanPaLM (CoT) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 42.3 | DePlot+GPT3 (Self-Consistency) |
| Chart Question Answering | ChartQA | 1:1 Accuracy | 36.9 | DePlot+GPT3 (CoT) |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-LVLM | Kendall's Tau-c | 0.129 | DePlot + GPT-4 |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-FT | Kendall's Tau-c | 0.109 | DePlot + GPT-4 |
| Factual Inconsistency Detection in Chart Captioning | CHOCOLATE-LLM | Kendall's Tau-c | 0.117 | DePlot + GPT-4 |
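Several of the strongest entries in the table use self-consistency decoding: sample multiple reasoning paths from the LLM and take a majority vote over the final answers. A minimal sketch of the voting step, where the sampler below is a toy stand-in for stochastic LLM sampling:

```python
from collections import Counter

def self_consistency(sample_answer, n_samples=5):
    """Self-consistency: draw several stochastic answers and return the
    most common one, which tends to be more reliable than a single sample."""
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for LLM sampling: a noisy answerer that is right 3 out of 5 times.
samples = iter(["42", "41", "42", "42", "40"])
result = self_consistency(lambda: next(samples), n_samples=5)
# result == "42": the majority answer wins despite two wrong samples
```

The PoT (Program-of-Thoughts) variants in the table apply the same vote, but each sample is a generated program whose execution result is the answer.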

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits (2025-07-18)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
- Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)