Understanding tables with intermediate pre-training

Julian Martin Eisenschlos, Syrine Krichene, Thomas Müller

Published: 2020-10-01
Venue: Findings of the Association for Computational Linguistics 2020
Tasks: Table-based Fact Verification, Binary Classification, Natural Language Inference, Data Augmentation

Abstract

Table entailment, the binary classification task of finding if a sentence is supported or refuted by the content of a table, requires parsing language and table structure as well as numerical and discrete reasoning. While there is extensive work on textual entailment, table entailment is less well studied. We adapt TAPAS (Herzig et al., 2020), a table-based BERT model, to recognize entailment. Motivated by the benefits of data augmentation, we create a balanced dataset of millions of automatically created training examples which are learned in an intermediate step prior to fine-tuning. This new data is not only useful for table entailment, but also for SQA (Iyyer et al., 2017), a sequential table QA task. To be able to use long examples as input of BERT models, we evaluate table pruning techniques as a pre-processing step to drastically improve the training and prediction efficiency at a moderate drop in accuracy. The different methods set the new state-of-the-art on the TabFact (Chen et al., 2020) and SQA datasets.
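The abstract mentions table pruning as a pre-processing step but does not spell out a specific technique here. As a rough illustration only, the sketch below assumes one simple heuristic: score each column by how many tokens it shares with the statement, then keep the top-scoring columns. The function names `tokenize` and `prune_columns` and the `max_columns` parameter are hypothetical and are not taken from the paper or the TAPAS codebase.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def prune_columns(header, rows, statement, max_columns):
    """Keep the columns that share the most tokens with the statement.

    Assumption: token overlap is used as the relevance signal. This is an
    illustrative heuristic, not necessarily the method evaluated in the paper.
    """
    query = tokenize(statement)
    scores = []
    for idx, name in enumerate(header):
        # Collect tokens from the column header and every cell in the column.
        column_tokens = tokenize(name)
        for row in rows:
            column_tokens |= tokenize(row[idx])
        scores.append((len(query & column_tokens), idx))

    # Rank by overlap (highest first); break ties by original column position.
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    # Restore the original column order among the kept columns.
    keep = sorted(idx for _, idx in ranked[:max_columns])

    pruned_header = [header[i] for i in keep]
    pruned_rows = [[row[i] for i in keep] for row in rows]
    return pruned_header, pruned_rows


header = ["Player", "Team", "Goals", "Stadium"]
rows = [
    ["Ana", "Lions", "12", "North Park"],
    ["Ben", "Tigers", "7", "South Field"],
]
statement = "Ana scored more goals than Ben"
print(prune_columns(header, rows, statement, max_columns=2))
```

Shrinking the table this way shortens the input sequence passed to a BERT-style model, which is where the efficiency gain described in the abstract would come from; the cost is that a discarded column may occasionally hold the evidence needed to decide entailment.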

Results

Task	Dataset	Metric	Value	Model
Table-based Fact Verification	TabFact	Test accuracy (%)	81	TAPAS-Large classifier with Counterfactual + Synthetic pre-training
Table-based Fact Verification	TabFact	Validation accuracy (%)	81	TAPAS-Large classifier with Counterfactual + Synthetic pre-training
