SILD

Survey Item Linking Dataset

Texts

Introduced 2024-06-14

This dataset contains texts from publications across a broad range of social science domains (e.g., economics, politics, and psychology). The texts are annotated with labels for Survey Item Linking (SIL), an Entity Linking (EL) task. SIL is divided into two sub-tasks: Mention Detection (MD), a binary text classification task, and Entity Disambiguation (ED), a sentence similarity task. Sentences that mention survey items are labeled with the IDs of entities from a knowledge base (GSIM). SILD contains 20,454 English and German sentences from 100 publications.
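
The two sub-tasks can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Sentence` record layout, the `item-001`/`item-002` IDs, the toy knowledge base, and the token-overlap (Jaccard) similarity are hypothetical and do not reflect the actual SILD file format, the GSIM entries, or any model used by the dataset's authors.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Hypothetical record layout; the real SILD files may use different fields.
    text: str
    # Knowledge-base entity IDs; empty when no survey item is mentioned.
    entity_ids: list[str] = field(default_factory=list)

    @property
    def mentions_survey_item(self) -> bool:
        # Mention Detection label: binary, derived from the annotations.
        return bool(self.entity_ids)


def tokens(text: str) -> set[str]:
    # Lowercased whitespace tokens with surrounding punctuation stripped.
    return {t.strip(".,;:?!()\"'").lower() for t in text.split()} - {""}


def jaccard(a: str, b: str) -> float:
    # Token-overlap similarity, a simple stand-in for a sentence similarity model.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def disambiguate(sentence: str, knowledge_base: dict[str, str]) -> str:
    # Entity Disambiguation as ranking: return the ID of the most similar entry.
    return max(knowledge_base, key=lambda eid: jaccard(sentence, knowledge_base[eid]))


# Toy knowledge base with made-up IDs and item wordings (not taken from GSIM).
toy_kb = {
    "item-001": "How satisfied are you with your life overall?",
    "item-002": "How much do you trust the national parliament?",
}

example = Sentence(
    text="Respondents reported how much they trust the national parliament.",
    entity_ids=["item-002"],
)

# Step 1 (MD): keep only sentences that mention a survey item.
# Step 2 (ED): link the sentence to the closest knowledge-base entry.
if example.mentions_survey_item:
    print(disambiguate(example.text, toy_kb))
```

In practice the similarity function would likely be replaced by a learned sentence encoder; the sketch only shows how the MD label and the ED ranking relate to the annotated entity IDs.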