Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Improving Paraphrase Detection with the Adversarial Paraphrasing Task

Animesh Nighojkar, John Licato

2021-06-14 | ACL 2021 5 | Paraphrase Identification

Abstract

If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax. Can we teach them instead to identify paraphrases in a way that draws on the inferential properties of the sentences, and is not over-reliant on lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to this question, and introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks participants to generate semantically equivalent (in the sense of mutually implicative) but lexically and syntactically disparate paraphrases. These sentence pairs can then be used both to test paraphrase identification models (which get barely random accuracy) and then improve their performance. To accelerate dataset generation, we explore automation of APT using T5, and show that the resulting dataset also improves accuracy. We discuss implications for paraphrase detection and release our dataset in the hope of making paraphrase detection models better able to detect sentence-level meaning equivalence.
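The APT criterion described in the abstract (mutually implicative, yet lexically and syntactically disparate) can be sketched as a simple acceptance check. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which entailment model or dissimilarity measure is used, so `entails` is a hypothetical pluggable function, and token-level Jaccard overlap with a hypothetical `max_overlap` threshold stands in for the dissimilarity measure.

```python
from typing import Callable


def jaccard_overlap(a: str, b: str) -> float:
    """Token-set Jaccard overlap; an assumed stand-in for lexical similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_adversarial_paraphrase(
    s1: str,
    s2: str,
    entails: Callable[[str, str], bool],  # hypothetical: entails(premise, hypothesis)
    max_overlap: float = 0.5,             # hypothetical threshold, not from the paper
) -> bool:
    """Accept a pair only if each sentence entails the other AND the two
    sentences share few surface tokens."""
    mutually_implicative = entails(s1, s2) and entails(s2, s1)
    return mutually_implicative and jaccard_overlap(s1, s2) <= max_overlap
```

In practice `entails` would wrap a textual entailment (NLI) model; any such choice is an assumption on top of what the abstract states.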

Results

Task | Dataset | Metric | Value | Model
Semantic Textual Similarity | AP | MCC | 0.525 | RoBERTa base
Paraphrase Identification | AP | MCC | 0.525 | RoBERTa base
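For readers unfamiliar with the metric, MCC (Matthews correlation coefficient) ranges from -1 to 1, with 0 corresponding to chance-level prediction. The sketch below computes it from binary confusion-matrix counts. The function name and the counts in the usage note are illustrative assumptions and are not taken from the paper's results.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention for the
    degenerate case where a whole row or column of the matrix is empty).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0
```

For example, `matthews_corrcoef(50, 50, 0, 0)` describes a perfect classifier, while `matthews_corrcoef(25, 25, 25, 25)` describes one that is right exactly half the time on a balanced set.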

Related Papers

Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies (2025-06-05)
Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data (2025-05-28)
Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing (2025-01-09)
Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks (2025-01-05)
Application Specific Compression of Deep Learning Models (2024-09-09)
Cross-lingual paraphrase identification (2024-06-21)
Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation (2024-03-27)
Memory-efficient Stochastic methods for Memory-based Transformers (2023-11-14)