DAGW
Danish Gigaword
Texts
Introduced 2021-05-01
It’s hard to develop good tools for processing Danish with computers when no large, wide-coverage dataset of Danish text is readily available. To address this, the Danish Gigaword Project (DAGW) maintains a corpus of Danish text containing over a billion words. The general goals are to create a dataset that is:
- representative;
- accessible;
- a suitable common starting point for Danish NLP models.