The Pile
Texts
Introduced 2020-12-31
The Pile is an 825 GiB diverse, open-source language modelling dataset that combines 22 smaller, high-quality datasets.
Datasheet: Datasheet for the Pile