Liu et al. Corpus
TextsIntroduced 2019-05-08
The Liu et al. Corpus is a pretraining dataset for large language models. It consists of 160Gb of news, books, stories, and web text.
The Liu et al. Corpus is a pretraining dataset for large language models. It consists of 160Gb of news, books, stories, and web text.