Liu et al. Corpus

Texts · Introduced 2019-05-08

The Liu et al. Corpus is a pretraining dataset for large language models. It consists of 160 GB of news, books, stories, and web text.

Statistics

Papers: 1
Benchmarks: 0

Links

Papers With Code 2


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.