
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Annotating Columns with Pre-trained Language Models

Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Çağatay Demiralp, Chen Chen, Wang-Chiew Tan

Published: 2021-04-05
Tasks: Column Type Annotation, Columns Property Annotation, Type prediction, Multi-Task Learning, Management, Relation Prediction, Table annotation

Abstract

Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information. In this paper, we study the problem of annotating table columns (i.e., predicting column types and the relationships between columns) using only information from the table itself. We develop a multi-task learning framework (called Doduo) based on pre-trained language models, which takes the entire table as input and predicts column types/relations using a single model. Experimental results show that Doduo establishes new state-of-the-art performance on two benchmarks for the column type prediction and column relation prediction tasks with up to 4.0% and 11.9% improvements, respectively. We report that Doduo can already outperform the previous state-of-the-art performance with a minimal number of tokens, only 8 tokens per column. We release a toolbox (https://github.com/megagonlabs/doduo) and confirm the effectiveness of Doduo on a real-world data science problem through a case study.
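
The abstract states that Doduo takes the entire table as input and works with as few as 8 tokens per column. The sketch below shows one way such a table-wise input could be assembled. It is an illustration under stated assumptions, not the authors' implementation: the `[CLS]` and `[SEP]` markers, the whitespace tokenizer (a stand-in for a real subword tokenizer), and the names `serialize_table` and `column_positions` are all hypothetical.

```python
from typing import List, Sequence

CLS = "[CLS]"  # assumed marker placed before each column
SEP = "[SEP]"  # assumed marker closing the whole sequence


def serialize_table(columns: Sequence[Sequence[str]],
                    max_tokens_per_column: int = 8) -> List[str]:
    """Flatten a table, given as a list of columns, into one token sequence.

    Each column contributes one marker followed by at most
    `max_tokens_per_column` tokens taken from its cell values.
    Whitespace splitting stands in for a real subword tokenizer.
    """
    tokens: List[str] = []
    for column in columns:
        column_tokens = " ".join(str(value) for value in column).split()
        tokens.append(CLS)
        tokens.extend(column_tokens[:max_tokens_per_column])
    tokens.append(SEP)
    return tokens


def column_positions(tokens: Sequence[str]) -> List[int]:
    """Return the index of each column marker in the serialized sequence."""
    return [i for i, token in enumerate(tokens) if token == CLS]


table = [
    ["Tokyo", "Paris", "New York", "Berlin"],
    ["Japan", "France", "USA", "Germany"],
]
tokens = serialize_table(table, max_tokens_per_column=3)
print(tokens)
print(column_positions(tokens))
```

In a setup like this, a model could read one representation per column at the marker positions and use it for column type prediction, or combine a pair of them for column relation prediction. The per-column budget caps sequence length independently of how many rows the table has.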

Results

Task              Dataset                   Metric    Value  Model
Data Integration  VizNet-Sato-Full          Macro-F1  84.6   DODUO
Data Integration  WikiTables-TURL-CTA       F1 (%)    92.45  DODUO
Data Integration  VizNet-Sato-MultiColumn   Macro-F1  83.8   DODUO
Data Integration  WikiTables-TURL-CPA       F1 (%)    91.72  DODUO
Table annotation  VizNet-Sato-Full          Macro-F1  84.6   DODUO
Table annotation  WikiTables-TURL-CTA       F1 (%)    92.45  DODUO
Table annotation  VizNet-Sato-MultiColumn   Macro-F1  83.8   DODUO
Table annotation  WikiTables-TURL-CPA       F1 (%)    91.72  DODUO
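
For readers unfamiliar with the Macro-F1 metric reported above, the following is a minimal sketch of how it is commonly computed: F1 is calculated per class and then averaged with equal weight per class. The function name `macro_f1`, the toy labels, and the choice to average over every label seen in either the gold or predicted set are assumptions for illustration; the benchmarks' own evaluation scripts may differ in such details.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores.

    Averages over every label that appears in y_true or y_pred
    (an assumption; some evaluations use a fixed label set instead).
    """
    labels = sorted(set(y_true) | set(y_pred))
    per_class: List[float] = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


# Toy column-type labels, not taken from any benchmark
gold = ["city", "city", "country", "name"]
pred = ["city", "country", "country", "name"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Because every class counts equally, Macro-F1 rewards models that do well on rare column types as well as frequent ones.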

Related Papers

SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Autonomous Resource Management in Microservice Systems via Reinforcement Learning (2025-07-17)
Robust-Multi-Task Gradient Boosting (2025-07-15)
SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation (2025-07-10)
Unpatchable Vulnerabilities in Windows 10/11: Security Report 2025 (2025-07-10)
DT4PCP: A Digital Twin Framework for Personalized Care Planning Applied to Type 2 Diabetes Management (2025-07-10)
RAPS-3D: Efficient interactive segmentation for 3D radiological imaging (2025-07-10)