PolyNews
Texts · CC-BY-NC-4.0 · Introduced 2024-06-18
PolyNews is a multilingual dataset containing news titles in 77 languages and 19 scripts.
PolyNews aims to provide an easily accessible, unified, and de-duplicated dataset that combines five disparate data sources. It can be used for domain adaptation of language models, language modeling, or text generation in both high-resource and low-resource languages.
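To make the idea of a unified, de-duplicated collection concrete, the following is a minimal sketch of merging titles from several sources and dropping repeats within each language. It is not the PolyNews pipeline, which is not described here. The function names, the record fields (`lang`, `text`, `source`), the source names, and the normalization rule (NFKC, case folding, whitespace collapsing) are all assumptions made for illustration.

```python
import unicodedata


def normalize(title: str) -> str:
    """Build a comparison key: NFKC-normalize, casefold, collapse whitespace.

    This normalization rule is an assumption, not the dataset's documented one.
    """
    text = unicodedata.normalize("NFKC", title)
    return " ".join(text.casefold().split())


def merge_and_deduplicate(sources):
    """Merge {source_name: [(lang, title), ...]} into one de-duplicated list.

    A title counts as a duplicate only if the same language already has an
    entry with the same normalized key; the first occurrence is kept.
    """
    seen = set()
    merged = []
    for source_name, records in sources.items():
        for lang, title in records:
            key = (lang, normalize(title))
            if not key[1] or key in seen:
                continue  # skip empty titles and repeats
            seen.add(key)
            merged.append(
                {"lang": lang, "text": title.strip(), "source": source_name}
            )
    return merged


# Hypothetical toy inputs standing in for the real data sources.
sources = {
    "source_a": [("en", "Floods hit the coast"), ("fr", "Inondations sur la côte")],
    "source_b": [("en", "floods hit  the coast "), ("en", "Markets rally")],
}
merged = merge_and_deduplicate(sources)
for record in merged:
    print(record)
```

Keying on the pair of language and normalized title keeps identical strings that appear in two different languages, which matters for short titles such as proper names. Near-duplicate detection (for example, titles that differ by one word) would need a fuzzier method than this exact-match sketch.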