ParagraphOrdering
Texts · MIT License · Introduced 2019-03-25
We have prepared a dataset, ParagraphOrdering, which consists of around 300,000 paragraph pairs collected from Project Gutenberg. We wrote an API for gathering and pre-processing the data into the format required by the task. Each example contains two paragraphs and a label indicating whether the second paragraph really follows the first (true order, label 1) or the order has been reversed.
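The example format described above can be illustrated with a minimal sketch. This is not the authors' API: the function name `make_pairs`, the 50/50 reversal rate, the fixed seed, and the use of label 0 for reversed pairs are all assumptions made for illustration. Only label 1 for the true order is stated in the description.

```python
import random
from typing import List, Tuple

# (first paragraph, second paragraph, label)
Example = Tuple[str, str, int]


def make_pairs(paragraphs: List[str], seed: int = 0) -> List[Example]:
    """Build ordering examples from consecutive paragraphs of one text.

    Hypothetical helper: label 1 marks the true order (as in the dataset
    description); label 0 for a reversed pair is an assumption.
    """
    rng = random.Random(seed)
    examples: List[Example] = []
    # Walk over adjacent paragraphs: (p0, p1), (p1, p2), ...
    for first, second in zip(paragraphs, paragraphs[1:]):
        if rng.random() < 0.5:
            # Keep the original order.
            examples.append((first, second, 1))
        else:
            # Swap the two paragraphs to create a reversed example.
            examples.append((second, first, 0))
    return examples


if __name__ == "__main__":
    book = ["Paragraph one.", "Paragraph two.", "Paragraph three."]
    for first, second, label in make_pairs(book):
        print(label, "|", first, "->", second)
```

A text with n paragraphs yields n - 1 examples under this scheme, and a paragraph can appear in two examples, which would be consistent with the number of unique paragraphs being smaller than the number of pairs.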
Data Statistics:
- #Train Samples: 294,265
- #Test Samples: 32,697
- Unique Paragraphs: 239,803
- Average Number of Tokens: 160.39
- Average Number of Sentences: 9.31