Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

Wen Xiao, Iz Beltagy, Giuseppe Carenini, Arman Cohan

Published 2021-10-16 · ACL 2022
Tasks: Multi-Document Summarization, Abstractive Text Summarization, Text Summarization, Document Summarization

Abstract

We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data. PRIMERA uses our newly proposed pre-training objective designed to teach the model to connect and aggregate information across documents. It also uses efficient encoder-decoder transformers to simplify the processing of concatenated input documents. With extensive experiments on 6 multi-document summarization datasets from 3 different domains in zero-shot, few-shot and fully supervised settings, PRIMERA outperforms current state-of-the-art dataset-specific and pre-trained models in most of these settings by large margins. The code and pre-trained models can be found at \url{https://github.com/allenai/PRIMER}.
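The pre-training objective masks whole sentences that are salient across the document cluster and trains the model to regenerate them. As an illustrative sketch only (the paper selects sentences via an entity-pyramid salience score computed with a real NER system; here a naive capitalized-word heuristic and period-based sentence splitting stand in for both), the selection-and-masking step might look like:

```python
from collections import Counter
import re

def select_salient_sentences(documents, mask_ratio=0.3):
    """Toy sketch of gap-sentence selection for pre-training.

    Sentences mentioning entities that recur across the document
    cluster are treated as salient and replaced with a sentinel;
    the model's target is to regenerate them from the remaining
    context. Entity extraction is a naive heuristic, not the NER
    used in the paper.
    """
    # Split each document into sentences (naive period split).
    sents = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]

    # "Entities": capitalized tokens, counted over the whole cluster.
    entity_counts = Counter(
        tok for s in sents for tok in re.findall(r"\b[A-Z][a-z]+\b", s)
    )

    # Score each sentence by the cluster-level frequency of its entities.
    def score(sent):
        return sum(entity_counts[t] for t in re.findall(r"\b[A-Z][a-z]+\b", sent))

    ranked = sorted(range(len(sents)), key=lambda i: score(sents[i]), reverse=True)
    n_mask = max(1, int(len(sents) * mask_ratio))
    to_mask = set(ranked[:n_mask])

    corrupted = ["<sent-mask>" if i in to_mask else s for i, s in enumerate(sents)]
    targets = [sents[i] for i in sorted(to_mask)]
    return corrupted, targets
```

Sentences repeated or paraphrased across documents naturally score higher under this scheme, which is what makes the objective teach cross-document aggregation.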

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Generation | Multi-News | ROUGE-1 | 49.9 | PRIMER |
| Text Generation | Multi-News | ROUGE-2 | 21.1 | PRIMER |
| Text Generation | Multi-News | ROUGE-L | 25.9 | PRIMER |
| Text Generation | WCEP | ROUGE-1 | 46.1 | PRIMER |
| Text Generation | WCEP | ROUGE-2 | 25.2 | PRIMER |
| Text Generation | WCEP | ROUGE-L | 37.9 | PRIMER |
| Text Summarization | arXiv Summarization Dataset | ROUGE-1 | 47.6 | PRIMER |
| Text Summarization | arXiv Summarization Dataset | ROUGE-2 | 20.8 | PRIMER |
| Text Summarization | arXiv Summarization Dataset | ROUGE-L | 42.6 | PRIMER |
| Text Summarization | Multi-News | ROUGE-1 | 49.9 | PRIMER |
| Text Summarization | Multi-News | ROUGE-2 | 21.1 | PRIMER |
| Text Summarization | Multi-News | ROUGE-L | 25.9 | PRIMER |
| Text Summarization | WCEP | ROUGE-1 | 46.1 | PRIMER |
| Text Summarization | WCEP | ROUGE-2 | 25.2 | PRIMER |
| Text Summarization | WCEP | ROUGE-L | 37.9 | PRIMER |
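The values above are ROUGE F1 scores. As a minimal sketch of what ROUGE-1 measures (unigram overlap between a generated and a reference summary; published scores use the standard ROUGE toolkit with stemming and bootstrap resampling, so this simplified version will not reproduce them exactly):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision
    and recall between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

ROUGE-2 is the same computation over bigrams, and ROUGE-L scores the longest common subsequence instead of n-gram overlap.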

Related Papers

LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
GenerationPrograms: Fine-grained Attribution with Executable Programs (2025-06-17)
Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences (2025-06-16)
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention (2025-06-11)
Improving Fairness of Large Language Models in Multi-document Summarization (2025-06-09)
Improving large language models with concept-aware fine-tuning (2025-06-09)
Advancing Decoding Strategies: Enhancements in Locally Typical Sampling for LLMs (2025-06-03)
ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs (2025-05-29)