Neural Paraphrase Identification of Questions with Noisy Pretraining
Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, Dipanjan Das
Abstract
We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (Parikh et al., 2016) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.
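The decomposable attention model referenced above follows an attend, compare, aggregate pattern over the two questions' token embeddings. Below is a minimal NumPy sketch of that pattern; the learned feed-forward networks F, G, and H from Parikh et al. (2016) are replaced here by identity maps for illustration, and all dimensions are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decomposable_attention(a, b):
    """Attend-compare-aggregate over two sentences.

    a: (la, d) token embeddings of question 1
    b: (lb, d) token embeddings of question 2
    Returns a fixed-size pair representation. In the actual model,
    learned feed-forward nets F, G, H transform the inputs at each
    step; identity maps stand in for them here.
    """
    # Attend: soft-align each token in a with the tokens of b, and vice versa
    e = a @ b.T                       # (la, lb) unnormalized alignment scores
    beta = softmax(e, axis=1) @ b     # (la, d) b-phrase aligned to each a_i
    alpha = softmax(e.T, axis=1) @ a  # (lb, d) a-phrase aligned to each b_j

    # Compare: pair every token with its soft-aligned phrase
    v1 = np.concatenate([a, beta], axis=1)   # (la, 2d)
    v2 = np.concatenate([b, alpha], axis=1)  # (lb, 2d)

    # Aggregate: sum over tokens, then concatenate the two summaries;
    # a classifier over this vector predicts the paraphrase label
    return np.concatenate([v1.sum(axis=0), v2.sum(axis=0)])  # (4d,)
```

Because the comparison decomposes over aligned token pairs, the model avoids sentence-level recurrence entirely, which is the source of its simplicity relative to competing architectures.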
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Textual Similarity | Quora Question Pairs | Accuracy | 88.4 | pt-DecAtt |
| Paraphrase Identification | Quora Question Pairs | Accuracy | 88.4 | pt-DecAtt |
Related Papers
- Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies (2025-06-05)
- Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data (2025-05-28)
- Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing (2025-01-09)
- Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks (2025-01-05)
- Application Specific Compression of Deep Learning Models (2024-09-09)
- Cross-lingual paraphrase identification (2024-06-21)
- Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation (2024-03-27)
- Memory-efficient Stochastic methods for Memory-based Transformers (2023-11-14)