Curlie

Introduced 2022-01-10

The Curlie dataset contains more than 1M websites in 92 languages, together with their corresponding labels, collected from Curlie, the largest multilingual crowdsourced Web directory. The websites are labeled with 14 website categories that are aligned across languages. The dataset is used for language-agnostic website embedding and classification.