Neural Paraphrase Identification of Questions with Noisy Pretraining
Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, Dipanjan Das
Abstract
We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (Parikh et al., 2016) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.
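The decomposable attention model referenced above follows an attend, compare, aggregate pattern over the two questions' token embeddings. Below is a minimal NumPy sketch of that pattern; the learned feed-forward networks F, G, and H from Parikh et al. (2016) are replaced here by identity maps for illustration, and all dimensions are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decomposable_attention(a, b):
    """Attend-compare-aggregate over two sentences.

    a: (la, d) token embeddings of question 1
    b: (lb, d) token embeddings of question 2
    Returns a fixed-size pair representation. In the actual model,
    learned feed-forward nets F, G, H transform the inputs at each
    step; identity maps stand in for them here.
    """
    # Attend: soft-align each token in a with the tokens of b, and vice versa
    e = a @ b.T                       # (la, lb) unnormalized alignment scores
    beta = softmax(e, axis=1) @ b     # (la, d) b-phrase aligned to each a_i
    alpha = softmax(e.T, axis=1) @ a  # (lb, d) a-phrase aligned to each b_j

    # Compare: pair every token with its soft-aligned phrase
    v1 = np.concatenate([a, beta], axis=1)   # (la, 2d)
    v2 = np.concatenate([b, alpha], axis=1)  # (lb, 2d)

    # Aggregate: sum over tokens, then concatenate the two summaries;
    # a classifier over this vector predicts the paraphrase label
    return np.concatenate([v1.sum(axis=0), v2.sum(axis=0)])  # (4d,)
```

Because the comparison decomposes over aligned token pairs, the model avoids sentence-level recurrence entirely, which is the source of its simplicity relative to competing architectures.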
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Textual Similarity | Quora Question Pairs | Accuracy | 88.4 | pt-DecAtt |
| Paraphrase Identification | Quora Question Pairs | Accuracy | 88.4 | pt-DecAtt |
Related Papers
- Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies (2025-06-05)
- Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data (2025-05-28)
- Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing (2025-01-09)
- Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks (2025-01-05)
- Application Specific Compression of Deep Learning Models (2024-09-09)
- Cross-lingual paraphrase identification (2024-06-21)
- Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation (2024-03-27)
- Memory-efficient Stochastic methods for Memory-based Transformers (2023-11-14)