Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Sato: Contextual Semantic Type Detection in Tables

Dan Zhang, Yoshihiko Suhara, Jinfeng Li, Madelon Hulsebos, Çağatay Demiralp, Wang-Chiew Tan

2019-11-14 | Structured Prediction | Column Type Annotation | Hybrid Machine Learning | Information Retrieval | Vocal Bursts Type Prediction | Retrieval

Abstract

Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search. However, existing detection approaches either perform poorly with dirty data, support only a limited number of semantic types, fail to incorporate the table context of columns or rely on large sample sizes for training data. We introduce Sato, a hybrid machine learning model to automatically detect the semantic types of columns in tables, exploiting the signals from the context as well as the column values. Sato combines a deep learning model trained on a large-scale table corpus with topic modeling and structured prediction to achieve support-weighted and macro average F1 scores of 0.925 and 0.735, respectively, exceeding the state-of-the-art performance by a significant margin. We extensively analyze the overall and per-type performance of Sato, discussing how individual modeling components, as well as feature categories, contribute to its performance.

Results

| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Data Integration | VizNet-Sato-Full | Macro-F1 | 75.6 | Sato |
| Data Integration | VizNet-Sato-Full | Weighted-F1 | 90.2 | Sato |
| Data Integration | VizNet-Sato-MultiColumn | Macro-F1 | 73.5 | Sato |
| Data Integration | VizNet-Sato-MultiColumn | Weighted-F1 | 92.5 | Sato |
| Table annotation | VizNet-Sato-Full | Macro-F1 | 75.6 | Sato |
| Table annotation | VizNet-Sato-Full | Weighted-F1 | 90.2 | Sato |
| Table annotation | VizNet-Sato-MultiColumn | Macro-F1 | 73.5 | Sato |
| Table annotation | VizNet-Sato-MultiColumn | Weighted-F1 | 92.5 | Sato |
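
The results report both macro and support-weighted F1, and the two can differ noticeably when some semantic types are rare. The following is a minimal sketch of how the two averages are conventionally computed for single-label column predictions. It is not taken from the Sato codebase: the function name `f1_scores` and the toy labels `city` and `name` are hypothetical, and the sketch assumes only types present in the ground truth are scored.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, weighted_f1) for single-label predictions.

    Assumption: only labels that occur in y_true are scored.
    """
    labels = sorted(set(y_true))
    support = Counter(y_true)  # number of true columns per type
    per_type = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        per_type[label] = 2 * tp / denom if denom else 0.0

    # Macro: every type counts equally, regardless of frequency.
    macro = sum(per_type.values()) / len(labels)
    # Support-weighted: each type counts in proportion to its frequency.
    weighted = sum(per_type[l] * support[l] for l in labels) / len(y_true)
    return macro, weighted


# Toy data (hypothetical): a frequent type and a rare type.
y_true = ["city"] * 8 + ["name"] * 2
y_pred = ["city"] * 8 + ["city", "name"]
macro_f1, weighted_f1 = f1_scores(y_true, y_pred)
print(f"macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```

In this toy case the rare type is predicted poorly, so the macro average falls below the support-weighted average, which is the same direction as the gap between the Macro-F1 and Weighted-F1 values reported above.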
