
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Klexikon: A German Dataset for Joint Summarization and Simplification

Dennis Aumiller, Michael Gertz

2022-01-18 · LREC 2022 (June) · Tasks: Text Summarization, Text Simplification

Abstract

Traditionally, Text Simplification is treated as a monolingual translation task where sentences between source texts and their simplified counterparts are aligned for training. However, especially for longer input documents, summarizing the text (or dropping less relevant content altogether) plays an important role in the simplification process, which is currently not reflected in existing datasets. Simultaneously, resources for non-English languages are scarce in general, which is prohibitive for training new solutions. To tackle this problem, we pose core requirements for a system that can jointly summarize and simplify long source documents. We further describe the creation of a new dataset for joint Text Simplification and Summarization based on German Wikipedia and the German children's lexicon "Klexikon", consisting of almost 2900 documents. We release a document-aligned version that particularly highlights the summarization aspect, and provide statistical evidence that this resource is well suited to simplification as well. Code and data are available on GitHub: https://github.com/dennlinger/klexikon
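
Because the dataset is document-aligned, one simple way to see its summarization aspect is the length ratio between a Klexikon article and its Wikipedia counterpart. The sketch below is a hypothetical illustration, not the authors' analysis code: the regex word tokenizer, the `tokenize` and `compression_ratio` helpers, and the two toy German snippets are all assumptions made for the example.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; in Python 3, \w is Unicode-aware, so umlauts and ß are kept.
    return re.findall(r"\w+", text.lower())

def compression_ratio(source: str, target: str) -> float:
    """Target (e.g. Klexikon) length divided by source (e.g. Wikipedia) length, in word tokens."""
    src_tokens = tokenize(source)
    if not src_tokens:
        raise ValueError("source text is empty")
    return len(tokenize(target)) / len(src_tokens)

# Hypothetical toy pair, not taken from the dataset.
wiki = ("Die Katze ist ein Raubtier. Sie gehört zur Familie der Katzen "
        "und lebt weltweit als Haustier.")
klex = "Die Katze ist ein Tier. Viele Menschen haben eine Katze."
print(compression_ratio(wiki, klex))  # 0.625 (10 of 16 tokens)
```

A ratio well below 1 across many pairs would indicate that the simplified side also condenses its source; averaging the per-pair values over the corpus is one straightforward aggregate.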

Results

Task: Text Summarization · Dataset: Klexikon

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Luhn's algorithm (25 sentences) | 32 | 5.63 | 11.68 |
| Lead-k | 25 | 5.16 | 12.1 |
| Lead-3 | 17.5 | 3.94 | 9.99 |
| Full article | 16.98 | 4.3 | 7.09 |
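
The lead baselines and the ROUGE metrics in the table can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the evaluation code behind the reported numbers: it computes ROUGE-N and a sentence-level ROUGE-L as F1 scores over pre-tokenized word lists, without stemming, and the hypothetical `lead_k` helper simply keeps the first k sentences (Lead-3 is `k=3`; for Lead-k we assume k is chosen per document, for example as the reference's sentence count). The "Full article" baseline corresponds to scoring the entire source article as the candidate.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of n-grams; empty if the text is shorter than n tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def f1(overlap: int, cand_len: int, ref_len: int) -> float:
    # Harmonic mean of precision and recall; a nonzero overlap implies both lengths are > 0.
    if overlap == 0:
        return 0.0
    precision, recall = overlap / cand_len, overlap / ref_len
    return 2 * precision * recall / (precision + recall)

def rouge_n(candidate: list[str], reference: list[str], n: int) -> float:
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(cand.values()), sum(ref.values()))

def lcs_length(a, b) -> int:
    # Longest common subsequence via dynamic programming, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l(candidate: list[str], reference: list[str]) -> float:
    # Sentence-level ROUGE-L; a summary-level variant would split into sentences first.
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))

def lead_k(sentences: list[str], k: int) -> list[str]:
    # Lead baseline: the first k sentences of the source article.
    return sentences[:k]

cand = "die katze schläft".split()
ref = "die katze schläft gern".split()
print(round(rouge_n(cand, ref, 1), 4), round(rouge_n(cand, ref, 2), 4), round(rouge_l(cand, ref), 4))
# 0.8571 0.8 0.8571
```

Published ROUGE scores usually come from an established package with its own tokenization and optional stemming, so numbers from this sketch will not match the table exactly.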

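Luhn's algorithm is a classic frequency-based extractive method: words that are frequent in the document (excluding stopwords) count as significant, each sentence is scored by its densest cluster of significant words as (significant words in the cluster)² divided by the cluster's span, and the top-scoring sentences are returned in document order. The sketch below is an illustrative reimplementation, not the tool used for the reported results; the `luhn_summarize` name, the tiny German stopword list, `min_freq=2`, and the gap limit `max_gap=4` are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative German stopword list (assumption); a real run would use a full list.
STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "in", "zu", "von", "mit", "sie", "es"}

def tokens(sentence: str) -> list[str]:
    return re.findall(r"\w+", sentence.lower())

def luhn_summarize(sentences: list[str], n_sentences: int,
                   min_freq: int = 2, max_gap: int = 4) -> list[str]:
    # Significant words: non-stopwords occurring at least min_freq times in the document.
    freq = Counter(w for s in sentences for w in tokens(s) if w not in STOPWORDS)
    significant = {w for w, c in freq.items() if c >= min_freq}

    def score(sentence: str) -> float:
        positions = [i for i, w in enumerate(tokens(sentence)) if w in significant]
        if not positions:
            return 0.0
        best, start, prev, count = 0.0, positions[0], positions[0], 1
        for pos in positions[1:]:
            if pos - prev - 1 > max_gap:
                # Too many insignificant words in between: close the current cluster.
                best = max(best, count ** 2 / (prev - start + 1))
                start, count = pos, 0
            count += 1
            prev = pos
        return max(best, count ** 2 / (prev - start + 1))

    # Rank by score (ties keep document order), keep the top n, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]

doc = ["Die Katze jagt die Maus.", "Das Wetter ist schön.", "Die Katze frisst die Maus nicht."]
print(luhn_summarize(doc, 2))
# ['Die Katze jagt die Maus.', 'Die Katze frisst die Maus nicht.']
```

Calling it with `n_sentences=25` mirrors the "25 sentences" configuration named in the table, although this code is not claimed to reproduce those scores.
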
Related Papers

- LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
- On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention (2025-06-11)
- Improving large language models with concept-aware fine-tuning (2025-06-09)
- MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection (2025-05-29)
- Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport (2025-05-29)
- Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data (2025-05-28)
- APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization (2025-05-26)
- FiLLM -- A Filipino-optimized Large Language Model based on Southeast Asia Large Language Model (SEALLM) (2025-05-25)