HPLT v2
Texts | Creative Commons CC0 license ("no rights reserved") | Introduced 2025-01-01
Multilingual text collection extracted from the Internet Archive and Common Crawl archives, intended for training large language models.