
SART is a collection of three datasets for Similarity, Analogies and Relatedness for the Tatar language. The three subsets are:

Similarity dataset - 202 pairs of words, each with an averaged human score of the degree of similarity between the two words (on a 0-to-10 scale). For example, "йорт, бина, 7.69".
Relatedness dataset - 252 pairs of words, each with an averaged human score of the degree of relatedness between the two words. For example, "урам, балалар, 5.38".
Analogies dataset - a set of analogy questions of the form A:B::C:D, meaning A is to B as C is to D, where D is to be predicted. For example, "Әнкара Төркия Париж Франция" (Ankara is to Turkey as Paris is to France). Contains 34 categories and 30,144 questions in total.
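
As a rough illustration of how these subsets might be used, the sketch below parses one scored pair and answers one analogy question with the common vector-offset heuristic (often called 3CosAdd). It is not the dataset authors' evaluation code. The line format "word1, word2, score" is assumed from the examples above, the function names `parse_pair`, `cosine` and `predict_d` are hypothetical, and the 2-D vectors are made-up toy values, not real Tatar embeddings.

```python
import math


def parse_pair(line):
    """Parse 'word1, word2, score' into (word1, word2, score).

    Assumes the comma-separated layout seen in the examples above.
    """
    w1, w2, score = (part.strip() for part in line.split(","))
    return w1, w2, float(score)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict_d(a, b, c, vectors):
    """Answer A:B::C:? with the vector-offset heuristic.

    Picks the word closest (by cosine) to vec(B) - vec(A) + vec(C),
    excluding the three question words themselves.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy, hand-made 2-D vectors (an assumption for illustration only).
toy_vectors = {
    "Әнкара": [1.0, 0.0],
    "Төркия": [1.0, 1.0],
    "Париж": [2.0, 0.0],
    "Франция": [2.0, 1.0],
    "урам": [-1.0, 0.5],
}

pair = parse_pair("йорт, бина, 7.69")
answer = predict_d("Әнкара", "Төркия", "Париж", toy_vectors)
print(pair, answer)
```

With real embeddings, `toy_vectors` would be replaced by a lookup into a trained model, and an analogy question would count as correct when the predicted word equals the fourth word of the question.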