SimpleStories

Texts · MIT · Introduced 2025-04-12

SimpleStories is a dataset of more than 2 million model-generated short stories, created for training small, interpretable language models. The generation process is open source: to see how the dataset was generated, or to generate stories yourself, see https://github.com/lennart-finke/simple_stories_generate.