EDGAR-CORPUS

Texts · Attribution 4.0 International · Introduced 2021-09-29

EDGAR-CORPUS is a novel corpus comprising annual reports from all publicly traded companies in the US, spanning a period of more than 25 years. All reports are downloaded, split into their corresponding items (sections), and provided in a clean, easy-to-use JSON format.
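
As a rough illustration of working with a per-item JSON layout like the one described above, here is a minimal Python sketch. The field names (`cik`, `year`, `item_1`, `item_1A`, `item_7`) and the helper `extract_items` are assumptions made for this example, not the corpus's documented schema; check the actual files before relying on them.

```python
import json
from typing import Dict

# Hypothetical record for one annual report. The key names here are
# assumptions for illustration only, not the corpus's confirmed schema.
sample_json = json.dumps({
    "cik": "0000000000",
    "year": "2020",
    "item_1": "Business overview text",
    "item_1A": "Risk factors text",
    "item_7": "",
})


def extract_items(raw: str) -> Dict[str, str]:
    """Return the non-empty item sections of one report, keyed by item name."""
    record = json.loads(raw)
    return {
        key: value
        for key, value in record.items()
        if key.startswith("item_") and value.strip()
    }


items = extract_items(sample_json)
for name, text in items.items():
    # Print each section name with its whitespace-delimited word count.
    print(name, len(text.split()))
```

Filtering on a key prefix keeps the sketch independent of how many items a given report contains; empty sections are skipped so downstream code only sees items with text.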

Image source: https://arxiv.org/pdf/2109.14394v1.pdf