DEplain-APA-doc
Texts | upon request | Introduced 2023-05-30
DEplain-APA-doc: A German Parallel Corpus for Document Simplification on News Texts
DEplain is a new dataset of parallel, professionally written and manually aligned simplifications in plain German (“plain DE”; in German, “Einfache Sprache”). DEplain consists of four main subcorpora: DEplain-APA-doc, DEplain-APA-sent, DEplain-web-doc, and DEplain-web-sent.
DEplain-APA-doc consists of approximately 500 news document pairs. The data is available upon request; see https://doi.org/10.5281/zenodo.7674560 for more information. The corpus can be used for German text simplification, and more specifically for document-level simplification.