Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts

Sajad Sotudeh, Hanieh Deilamsalehy, Franck Dernoncourt, Nazli Goharian

2021-10-04 | EMNLP (newsum) 2021 11 | Extreme Summarization

Abstract

Recent summarization models have millions of parameters, and their performance depends heavily on the abundance of training data. While most existing summarization corpora contain on the order of thousands to one million instances, building large-scale summarization datasets on the order of a couple of million instances has yet to be explored. In practice, more data helps a model generalize the patterns it learns in training to unseen data. In this paper, we introduce TLDR9+, a large-scale summarization dataset containing over 9 million training instances extracted from the Reddit discussion forum (https://github.com/sajastu/reddit_collector). The dataset is gathered specifically for extreme summarization (i.e., generating a one-sentence summary with high compression and abstraction) and is more than twice as large as the previously proposed dataset. Going one step further, with the help of human annotations, we distill a more fine-grained dataset by sampling high-quality instances from TLDR9+ and call it the TLDRHQ dataset. We further benchmark different state-of-the-art summarization models on our proposed datasets.

Results

Task                   Dataset  Metric    Value  Model
Extreme Summarization  TLDR9+   RG-1 (%)  30.26  ORACLE-EXT
Extreme Summarization  TLDR9+   RG-2 (%)  9.74   ORACLE-EXT
Extreme Summarization  TLDR9+   RG-L (%)  20.6   ORACLE-EXT
Extreme Summarization  TLDR9+   RG-1 (%)  23.59  BART
Extreme Summarization  TLDR9+   RG-2 (%)  9.69   BART
Extreme Summarization  TLDR9+   RG-L (%)  18.62  BART
Extreme Summarization  TLDR9+   RG-1 (%)  23.05  BERTSUMABS
Extreme Summarization  TLDR9+   RG-2 (%)  9.48   BERTSUMABS
Extreme Summarization  TLDR9+   RG-L (%)  18.07  BERTSUMABS
Extreme Summarization  TLDR9+   RG-1 (%)  20.94  BERTSUMEXT
Extreme Summarization  TLDR9+   RG-2 (%)  4.98   BERTSUMEXT
Extreme Summarization  TLDR9+   RG-L (%)  14.48  BERTSUMEXT
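
For readers unfamiliar with the RG-1, RG-2, and RG-L metrics in the results, the sketch below shows roughly what they measure, assuming RG stands for ROUGE and that F1 variants are reported (this page does not say which variant or scoring package was used). It is a simplified illustration with whitespace tokenization and no stemming, not the scoring setup behind the numbers above. The function names `rouge_n` and `rouge_l` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision (overlap/cand) and recall (overlap/ref)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """Simplified ROUGE-N F1: clipped n-gram overlap between two strings."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Simplified ROUGE-L F1 based on the longest common subsequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    summary = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    # Scores are fractions in [0, 1]; multiply by 100 for a percentage.
    print(rouge_n(summary, gold, 1), rouge_n(summary, gold, 2), rouge_l(summary, gold))
```

Published ROUGE numbers usually come from an established scoring package with its own tokenization and stemming choices, so values from this sketch should not be compared directly with the table.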

Related Papers

DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation (2025-02-19)
APEX$^2$: Adaptive and Extreme Summarization for Personalized Knowledge Graphs (2024-12-23)
Explainable News Summarization -- Analysis and mitigation of Disagreement Problem (2024-10-24)
ROUGE-K: Do Your Summaries Have Keywords? (2024-03-08)
Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI (2023-07-24)
Curriculum-guided Abstractive Summarization for Mental Health Online Posts (2023-02-02)
Curriculum-Guided Abstractive Summarization (2023-02-02)
WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs (2022-09-27)