HPLT v2

Texts · Creative Commons CC0 license ("no rights reserved") · Introduced 2025-01-01

Multilingual text collection extracted from the Internet Archive and Common Crawl archives, intended for training large language models.