Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Universal Evasion Attacks on Summarization Scoring

Wenchuan Mu, Kwan Hui Lim

2022-10-25 · Abstractive Text Summarization · Natural Language Inference · Document Summarization

Paper · PDF · Code (official)

Abstract

The automatic scoring of summaries is important as it guides the development of summarizers. Scoring is also complex, as it involves multiple aspects such as fluency, grammar, and even textual entailment with the source text. However, summary scoring has not been treated as a machine learning task whose accuracy and robustness can be studied. In this study, we place automatic scoring in the context of regression machine learning tasks and perform evasion attacks to explore its robustness. Attack systems predict a non-summary string from each input, and these non-summary strings achieve scores competitive with good summarizers on the most popular metrics: ROUGE, METEOR, and BERTScore. Attack systems also "outperform" state-of-the-art summarization methods on ROUGE-1 and ROUGE-L, and score second-highest on METEOR. Furthermore, a BERTScore backdoor is observed: a simple trigger can score higher than any automatic summarization method. The evasion attacks in this work indicate the low robustness of current scoring systems at the system level. We hope that highlighting these attacks will facilitate the development of more robust summary scoring.
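The vulnerability the abstract describes stems from n-gram overlap metrics like ROUGE rewarding token overlap rather than fluency or meaning. A minimal sketch (pure Python, not the authors' attack code) of ROUGE-1 F1, showing that a scrambled, meaningless reordering of the reference's words still receives a perfect score:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: F-measure over unigram (whitespace-token) overlap."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as in the reference.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

reference = "the cat sat on the mat"
fluent = "a cat was sitting on a mat"     # plausible summary, partial overlap
scrambled = "mat the on sat cat the"      # same word multiset, no meaning

print(rouge1_f1(reference, fluent))       # partial credit
print(rouge1_f1(reference, scrambled))    # 1.0: gibberish scores perfectly
```

The scrambled string is in the spirit of the "Scrambled code + broken" attack entries in the results table below: because the metric never checks word order or grammaticality at the sentence level beyond unigrams, a degenerate output built from high-overlap tokens can match or beat genuine summarizers.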

Results

Task                           | Dataset          | Metric  | Value | Model
Text Summarization             | CNN / Daily Mail | ROUGE-1 | 48.18 | Scrambled code + broken (alter)
Text Summarization             | CNN / Daily Mail | ROUGE-2 | 19.84 | Scrambled code + broken (alter)
Text Summarization             | CNN / Daily Mail | ROUGE-L | 45.35 | Scrambled code + broken (alter)
Text Summarization             | CNN / Daily Mail | ROUGE-1 | 46.71 | Scrambled code + broken
Text Summarization             | CNN / Daily Mail | ROUGE-2 | 20.39 | Scrambled code + broken
Text Summarization             | CNN / Daily Mail | ROUGE-L | 43.56 | Scrambled code + broken
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 48.18 | Scrambled code + broken (alter)
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 19.84 | Scrambled code + broken (alter)
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 45.35 | Scrambled code + broken (alter)
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1 | 46.71 | Scrambled code + broken
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2 | 20.39 | Scrambled code + broken
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L | 43.56 | Scrambled code + broken
Document Summarization         | CNN / Daily Mail | ROUGE-1 | 48.18 | Scrambled code + broken (alter)
Document Summarization         | CNN / Daily Mail | ROUGE-2 | 19.84 | Scrambled code + broken (alter)
Document Summarization         | CNN / Daily Mail | ROUGE-L | 45.35 | Scrambled code + broken (alter)

Related Papers

- LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
- DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification (2025-07-08)
- ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation (2025-06-27)
- Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding (2025-06-17)
- When Does Meaning Backfire? Investigating the Role of AMRs in NLI (2025-06-17)
- GenerationPrograms: Fine-grained Attribution with Executable Programs (2025-06-17)
- Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences (2025-06-16)
- Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure (2025-06-10)