Discriminating Between Similar Nordic Languages

René Haas, Leon Derczynski

2020-12-11 | EACL (VarDial) 2021 4 | Language Identification | BIG-bench Machine Learning

Abstract

Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which existing state-of-the-art tools often miscategorise. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
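
To make the task concrete, the following is a minimal sketch of text-based language identification, assuming a character n-gram naive Bayes classifier. It is not the authors' method and does not use the FastText model listed in the results; the class `NgramNaiveBayes`, the helpers `char_ngrams` and `accuracy`, and the two toy sentences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return character n-grams (lengths n_min..n_max) of the lowercased,
    space-padded text. Padding lets word boundaries act as features."""
    padded = f" {text.lower()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                   # additive (Laplace) smoothing
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.totals = Counter()              # label -> total n-gram count
        self.docs = Counter()                # label -> number of training texts
        self.vocab = set()                   # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)

        def log_score(label):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for g in grams:
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            return score

        return max(self.docs, key=log_score)


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label equals the gold label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy training data (hypothetical, not taken from the paper's dataset).
TOY_TEXTS = [
    "jeg hedder Anna og jeg bor i København",   # Danish
    "jag heter Anna och jag bor i Stockholm",   # Swedish
]
TOY_LABELS = ["da", "sv"]

if __name__ == "__main__":
    model = NgramNaiveBayes().fit(TOY_TEXTS, TOY_LABELS)
    print(model.predict("jeg bor i Danmark"))
```

A real system would need far more training text per language, and the closeness of the language pairs named above (for example the two Norwegian written standards) is exactly where such a simple model would be expected to struggle.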

Results

Task | Dataset | Metric | Value | Model
Language Identification | Nordic Language Identification | Accuracy | 0.9711 | FastText
