Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Column Type Annotation using ChatGPT

Keti Korini, Christian Bizer

2023-06-01 | TaDA@VLDB 2023 8 | Column Type Annotation | Data Integration | Table annotation

Abstract

Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column. Column type annotation is an important pre-processing step for data search and data integration in the context of data lakes. State-of-the-art column type annotation methods either rely on matching table columns to properties of a knowledge graph or fine-tune pre-trained language models such as BERT for column type annotation. In this work, we take a different approach and explore using ChatGPT for column type annotation. We evaluate different prompt designs in zero- and few-shot settings and experiment with providing task definitions and detailed instructions to the model. We further implement a two-step table annotation pipeline which first determines the class of the entities described in the table and, depending on this class, asks ChatGPT to annotate columns using only the relevant subset of the overall vocabulary. Using instructions as well as the two-step pipeline, ChatGPT reaches F1 scores of over 85% in zero- and one-shot setups. To reach a similar F1 score, a RoBERTa model needs to be fine-tuned with 356 examples. This comparison shows that ChatGPT is able to deliver competitive results for the column type annotation task given no or only a minimal amount of task-specific demonstrations.
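
The two-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the vocabulary, the prompt wording, the function names, and the fallback to the full vocabulary are all hypothetical, and the language model is abstracted as any callable that maps a prompt string to an answer string.

```python
from typing import Callable, Dict, List

# Hypothetical vocabulary: table class -> column labels relevant to that class.
# These labels are illustrative only, not the label set used in the paper.
VOCAB_BY_CLASS: Dict[str, List[str]] = {
    "Restaurant": ["name", "telephone", "address", "priceRange"],
    "Event": ["name", "startDate", "location", "organizer"],
}


def serialize_column(values: List[str], limit: int = 5) -> str:
    """Join the first few cell values of a column into one prompt line."""
    return ", ".join(values[:limit])


def build_class_prompt(columns: List[List[str]]) -> str:
    """Step 1 prompt: ask which class of entities the table describes."""
    rows = "\n".join(
        f"Column {i + 1}: {serialize_column(col)}" for i, col in enumerate(columns)
    )
    classes = ", ".join(VOCAB_BY_CLASS)
    return f"Classify the table into one of these classes: {classes}.\n{rows}\nClass:"


def build_column_prompt(values: List[str], labels: List[str]) -> str:
    """Step 2 prompt: ask for one label from the class-specific subset."""
    return (
        f"Annotate the column with one of these labels: {', '.join(labels)}.\n"
        f"Column: {serialize_column(values)}\nLabel:"
    )


def annotate_table(columns: List[List[str]], llm: Callable[[str], str]) -> List[str]:
    """Predict the table class, then annotate each column with the reduced label set."""
    table_class = llm(build_class_prompt(columns)).strip()
    labels = VOCAB_BY_CLASS.get(table_class)
    if labels is None:
        # Assumption: fall back to the full vocabulary if the class is not recognized.
        labels = sorted({label for subset in VOCAB_BY_CLASS.values() for label in subset})
    annotations = []
    for col in columns:
        answer = llm(build_column_prompt(col, labels)).strip()
        # Answers outside the allowed label set are mapped to "unknown".
        annotations.append(answer if answer in labels else "unknown")
    return annotations
```

Passing the model as a callable keeps the sketch independent of any particular API client; in practice `llm` would wrap a chat-completion request, and the prompts would carry the task definitions and instructions the abstract mentions.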

Results

Task              Dataset       Metric    Value  Model
Data Integration  WDC SOTAB V2  Micro F1  89.47  gpt-3.5-turbo-0301-two-step
Table annotation  WDC SOTAB V2  Micro F1  89.47  gpt-3.5-turbo-0301-two-step
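
For readers unfamiliar with the metric, micro-averaged F1 pools true positives, false positives, and false negatives over all columns before computing precision and recall. The sketch below is one plausible way to compute it for single-label column annotation; treating `None` as an abstention (a missed gold label without a false positive) is an assumption, and the benchmark's own evaluation script may handle out-of-vocabulary answers differently.

```python
from typing import List, Optional


def micro_f1(gold: List[str], pred: List[Optional[str]]) -> float:
    """Micro-averaged F1 over columns; a prediction of None counts as an abstention."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p == g:
            tp += 1
        else:
            fn += 1  # the gold label was missed
            if p is not None:
                fp += 1  # a wrong label was predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

When every column receives exactly one prediction, each error counts as both a false positive and a false negative, so micro F1 coincides with accuracy; the two differ only when the model abstains.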

Related Papers

From Classical Machine Learning to Emerging Foundation Models: Review on Multimodal Data Integration for Cancer Research (2025-07-11)
Empowering Digital Agriculture: A Privacy-Preserving Framework for Data Sharing and Collaborative Research (2025-06-25)
Intelligent Operation and Maintenance and Prediction Model Optimization for Improving Wind Power Generation Efficiency (2025-06-19)
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs (2025-06-17)
Brain Imaging Foundation Models, Are We There Yet? A Systematic Review of Foundation Models for Brain Imaging and Biomedical Research (2025-06-16)
Leveraging MIMIC Datasets for Better Digital Health: A Review on Open Problems, Progress Highlights, and Future Promises (2025-06-15)
Enhancing Bagging Ensemble Regression with Data Integration for Time Series-Based Diabetes Prediction (2025-06-11)
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data (2025-06-10)