Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Data Integration

61 benchmarks, 431 papers

Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into either a single uniform data set (materialized integration) or a uniform view on the data (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains include data warehousing, data lakes, and knowledge base consolidation.

Surveys on data integration:

  • Dong, Srivastava: Big data integration, 2013.

  • Doan, Halevy, Ives: Principles of Data Integration, 2012.
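
As an illustration of how the subtasks named in the task description fit together, the following is a minimal sketch of a materialized integration over two toy sources. Everything in it is an assumption made for the example: the records, the hand-written schema mappings, the `difflib` similarity threshold, and the "prefer the first source" fusion rule. Real systems learn or infer most of these steps, and none of this code comes from a benchmarked method.

```python
from difflib import SequenceMatcher

# Two hypothetical sources describing products under different schemas.
source_a = [
    {"title": "Sony Cyber-shot DSC-W800", "price_usd": "89.99"},
    {"title": "Canon EOS 2000D", "price_usd": None},
]
source_b = [
    {"name": "SONY cybershot dsc w800 ", "cost": "$92.00", "brand": "Sony"},
    {"name": "Nikon Coolpix B500", "cost": "$259.00", "brand": "Nikon"},
]

# Schema matching: here a hand-written mapping onto one target schema.
MAPPING_A = {"title": "name", "price_usd": "price"}
MAPPING_B = {"name": "name", "cost": "price", "brand": "brand"}


def to_target(record, mapping):
    """Rename source attributes to the target schema."""
    return {target: record.get(src) for src, target in mapping.items()}


def normalize(record):
    """Value normalization: add a canonical name key, parse the price."""
    out = dict(record)
    name = record.get("name") or ""
    out["key"] = "".join(ch for ch in name.lower() if ch.isalnum())
    price = record.get("price")
    out["price"] = float(price.lstrip("$")) if price else None
    return out


def same_entity(a, b, threshold=0.9):
    """Entity resolution via string similarity (threshold is an assumption)."""
    return SequenceMatcher(None, a["key"], b["key"]).ratio() >= threshold


def fuse(a, b):
    """Data fusion: prefer the first record, fall back to the second."""
    keys = (set(a) | set(b)) - {"key"}
    return {k: a.get(k) if a.get(k) is not None else b.get(k) for k in sorted(keys)}


def integrate(src_a, src_b):
    left = [normalize(to_target(r, MAPPING_A)) for r in src_a]
    right = [normalize(to_target(r, MAPPING_B)) for r in src_b]
    merged, used = [], set()
    for a in left:
        # Greedy one-to-one matching against not-yet-used records.
        match = next(
            (i for i, b in enumerate(right) if i not in used and same_entity(a, b)),
            None,
        )
        if match is None:
            merged.append(fuse(a, {}))
        else:
            used.add(match)
            merged.append(fuse(a, right[match]))
    # Records that only appear in the second source are kept as they are.
    merged.extend(fuse(b, {}) for i, b in enumerate(right) if i not in used)
    return merged


integrated = integrate(source_a, source_b)
for row in integrated:
    print(row)
```

In this sketch the two Sony records collapse into one row that takes its name and price from the first source and its brand from the second, while the Canon and Nikon records pass through unmatched. The all-pairs comparison is quadratic in the number of records; a blocking step that first narrows the candidate pairs is the usual way to avoid that cost.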

Benchmarks

Data Integration on DBP15k zh-en

Hits@1

Data Integration on Amazon-Google

F1 (%), Candidate Set Size, Recall

Data Integration on Abt-Buy

F1 (%), Candidate Set Size, Recall

Data Integration on DBP15k fr-en

Hits@1

Data Integration on DBP15k ja-en

Hits@1

Data Integration on WDC Products-80%cc-seen-medium

F1 (%)

Data Integration on BiodivTab

F1 (%)

Data Integration on ToughTables-DBP

F1 (%)

Data Integration on ToughTables-WD

F1 (%)

Data Integration on UMVM-dbp-fr-en

Hits@1

Data Integration on UMVM-dbp-ja-en

Hits@1

Data Integration on UMVM-dbp-zh-en

Hits@1

Data Integration on WDC SOTAB V2

Micro F1

Data Integration on UMVM-oea-d-w-v1

Hits@1

Data Integration on UMVM-oea-en-de

Hits@1

Data Integration on UMVM-oea-en-fr

Hits@1

Data Integration on UMVM-oea-d-w-v2

Hits@1

Data Integration on FBDB15k

Hits@1

Data Integration on FBYG15k

Hits@1

Data Integration on WDC Computers-small

F1 (%)

Data Integration on WDC Computers-xlarge

F1 (%)

Data Integration on DBP1M DE-EN

Hit@1

Data Integration on DICEWS-1K

Hit@1

Data Integration on YAGO-WIKI50K

Hit@1

Data Integration on T2Dv2

F1 (%), Accuracy (%)

Data Integration on WDC Products-50%cc-unseen-medium

F1 (%)

Data Integration on WDC SOTAB

Micro F1, Weighted F1

Data Integration on WDC Watches-small

F1 (%)

Data Integration on GitTables-SemTab-DBP

F1 (%)

Data Integration on MusicBrainz20K

F1

Data Integration on VizNet-Sato-Full

Macro-F1, Weighted-F1

Data Integration on WDC Watches-xlarge

F1 (%)

Data Integration on DBP1M FR-EN

Hit@1

Data Integration on DBP2.0 zh-en

dangling entity detection F1, Entity Alignment (Consolidated) F1

Data Integration on GitTables-SemTab-SCH

F1 (%)

Data Integration on VizNet-Sato-MultiColumn

Macro-F1, Weighted-F1

Data Integration on WDC Block - large

Candidate Set Size, Recall

Data Integration on WDC Block - medium

Candidate Set Size, Recall

Data Integration on WDC Block - small

Candidate Set Size, Recall

Data Integration on WDC Products-80%cc-seen-medium-multi

F1 Micro

Data Integration on WikiTables-TURL-CPA

F1 (%), Macro-F1

Data Integration on WikiTables-TURL-CTA

F1 (%), Macro-F1

Data Integration on WikipediaGS-CTA

Accuracy (%)

Data Integration on MMKG

H@1

Data Integration on WDC Products

F1 (%)

Data Integration on WikiTables-TURL-CEA

F1 (%)

Data Integration on WikipediaGS

F1 (%)
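
The metrics that recur in the list above can be sketched as follows. This is a minimal illustration of commonly used definitions, not the evaluation code of any listed benchmark: Hits@1 as the fraction of source entities whose top-ranked candidate is the gold target, F1 over predicted match pairs, and blocking quality as candidate set size plus the share of gold pairs the candidate set retains. The function names and toy data are hypothetical, and individual benchmarks may define their metrics differently (for example micro, macro, or weighted F1 over class labels for table annotation).

```python
def hits_at_k(ranked_candidates, gold, k=1):
    """Fraction of source entities whose gold target is among the top-k candidates."""
    hits = sum(
        1 for src, tgt in gold.items() if tgt in ranked_candidates.get(src, [])[:k]
    )
    return hits / len(gold)


def pair_f1(predicted_pairs, gold_pairs):
    """Precision, recall, and F1 over sets of predicted and gold match pairs."""
    tp = len(predicted_pairs & gold_pairs)
    precision = tp / len(predicted_pairs) if predicted_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def blocking_quality(candidate_pairs, gold_pairs):
    """Candidate set size and the share of gold pairs kept by the blocker."""
    kept = len(candidate_pairs & gold_pairs)
    return len(candidate_pairs), kept / len(gold_pairs)


# Toy gold alignment and system outputs (hypothetical identifiers).
gold = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
ranked = {"a1": ["b1", "b9"], "a2": ["b7", "b2"], "a3": ["b3"], "a4": ["b4", "b1"]}
predicted = {("a1", "b1"), ("a2", "b7"), ("a3", "b3")}
candidates = {("a1", "b1"), ("a1", "b9"), ("a2", "b2"), ("a2", "b7"), ("a3", "b3")}

gold_pairs = set(gold.items())
h1 = hits_at_k(ranked, gold, k=1)
h2 = hits_at_k(ranked, gold, k=2)
precision, recall, f1 = pair_f1(predicted, gold_pairs)
size, block_recall = blocking_quality(candidates, gold_pairs)
print(h1, h2, round(f1, 4), size, block_recall)
```

Reporting candidate set size next to recall reflects the trade-off a blocker makes: a larger candidate set tends to keep more true matches but leaves more pairs for the downstream matcher to compare.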