Factorising Meaning and Form for Intent-Preserving Paraphrasing

Tom Hosking, Mirella Lapata

2021-05-31 | ACL 2021 | Tasks: Paraphrase Identification, Form, Paraphrase Generation

Abstract

We propose a method for generating paraphrases of English questions that retain the original intent but use a different surface form. Our model combines a careful choice of training objective with a principled information bottleneck, to induce a latent encoding space that disentangles meaning and form. We train an encoder-decoder model to reconstruct a question from a paraphrase with the same meaning and an exemplar with the same surface form, leading to separated encoding spaces. We use a Vector-Quantized Variational Autoencoder to represent the surface form as a set of discrete latent variables, allowing us to use a classifier to select a different surface form at test time. Crucially, our method does not require access to an external source of target exemplars. Extensive experiments and a human evaluation show that we are able to generate paraphrases with a better tradeoff between semantic preservation and syntactic novelty compared to previous methods.
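
The abstract says the surface form is represented as a set of discrete latent variables via a Vector-Quantized Variational Autoencoder. As a rough illustration of the quantization step alone, the sketch below maps a continuous vector to the index of its nearest codebook entry. It is not the authors' implementation: the function names (`quantize`, `quantize_heads`), the use of one codebook per latent variable, the toy codebook, and the squared Euclidean distance are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index and value of the codebook entry nearest to `vector`."""
    best_index = min(range(len(codebook)),
                     key=lambda i: squared_distance(vector, codebook[i]))
    return best_index, list(codebook[best_index])


def quantize_heads(head_vectors: Sequence[Sequence[float]],
                   codebooks: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Quantize several vectors, each against its own codebook (an assumption),
    giving one discrete code per latent variable."""
    return [quantize(v, cb)[0] for v, cb in zip(head_vectors, codebooks)]


# Toy codebook with three 2-dimensional entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
index, code = quantize([0.9, 1.2], codebook)
```

Because each form encoding reduces to a short tuple of integer codes, choosing a different surface form at test time can be framed as predicting a different tuple, which is what makes a classifier applicable.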

Results

Task	Dataset	Metric	Value	Model
Text Generation	Quora Question Pairs	iBLEU	5.84	Separator
Text Generation	Paralex	iBLEU	14.84	Separator
Paraphrase Generation	Quora Question Pairs	iBLEU	5.84	Separator
Paraphrase Generation	Paralex	iBLEU	14.84	Separator
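
For readers unfamiliar with the metric in the table, iBLEU is commonly defined as a weighted difference between BLEU against the reference paraphrases and BLEU against the input, so that copying the input is penalised. The notation below is an assumption made for illustration: $\hat{y}$ is the generated paraphrase, $Y$ the reference paraphrases, $x$ the input question, and $\alpha \in [0, 1]$ a weighting constant whose value is not stated in this listing.

```latex
\mathrm{iBLEU}(\hat{y}) = \alpha \, \mathrm{BLEU}(\hat{y}, Y) - (1 - \alpha) \, \mathrm{BLEU}(\hat{y}, x)
```

A higher score therefore needs both overlap with the references (semantic preservation) and low overlap with the input (novelty of surface form), which matches the tradeoff the abstract describes.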
