SART

Texts

SART is a collection of three datasets for Similarity, Analogies and Relatedness for the Tatar language. The three subsets are:

  • Similarity dataset - 202 pairs of words along with averaged human scores of similarity degree between the words (in 0-to-10 scale). For example, "йорт, бина, 7.69".
  • Relatedness dataset - 252 pairs of words along with averaged human scores of relatedness degree between the words. For example, "урам, балалар, 5.38".
  • Analogies dataset - set of analytical questions of the form A:B::C:D, meaning A to B as C to D, and D is to be predicted. For example, "Әнкара Төркия Париж Франция". Contains 34 categories, and in total 30 144 questions.

Source: https://github.com/tat-nlp/SART