ScaLA is a linguistic acceptability dataset for the Scandinavian languages, covering Danish, Norwegian Bokmål, Norwegian Nynorsk, Swedish, Icelandic, and Faroese. It was developed as part of the ScandEval benchmarking platform and consists of sentences in these languages, each either grammatically correct or incorrect. The dataset is designed to evaluate how well language models can tell the two apart. It is one of the contributions of the ScandEval project, which aims to advance the state of natural language processing in the Scandinavian languages.
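
To make the evaluation setup concrete, here is a minimal sketch of how predictions on a binary acceptability task could be scored. It assumes labels are encoded as 1 (grammatically correct) and 0 (incorrect), and it uses the Matthews correlation coefficient, a common choice for acceptability tasks. Both the label encoding and the choice of metric are assumptions made for illustration, not details stated in the description above, and the labels in the example are made-up toy values rather than data from ScaLA.

```python
import math


def matthews_corrcoef(gold, pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding (hypothetical): 1 = grammatically correct,
    0 = grammatically incorrect.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the score is undefined, for example
    # when the model predicts the same label for every sentence.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels for illustration only; not taken from the dataset.
gold_labels = [1, 1, 0, 0]
predicted_labels = [1, 0, 0, 0]
score = matthews_corrcoef(gold_labels, predicted_labels)
print(f"MCC: {score:.3f}")
```

A score of 1 means every sentence was classified correctly, 0 means the predictions are no better than chance, and -1 means every prediction was inverted. Unlike plain accuracy, this score does not reward a model that labels every sentence as correct.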