Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Semantic Noise Matters for Neural Natural Language Generation

Ondřej Dušek, David M. Howcroft, Verena Rieser

Published: 2019-11-10 · WS 2019
Tasks: Data-to-Text Generation · Text Generation · Hallucination
Links: Paper · PDF · Code (official)

Abstract

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. We also find that the most common error is omitting information, rather than hallucination.
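The abstract distinguishes two error types: omitting input information versus hallucinating content absent from the input. A minimal sketch of how such a slot-level check could work, assuming a dictionary-style meaning representation (MR) and a list of known slot values in the style of the E2E NLG Challenge — all names here are hypothetical, and this is not the authors' evaluation code:

```python
def check_output(mr: dict, text: str, known_values: list):
    """Classify semantic errors in a generated sentence.

    mr: input meaning representation, e.g. {"name": "The Eagle", "food": "French"}
    known_values: every value any slot can take in the dataset (assumed given)
    Returns (omitted_slots, hallucinated_values).
    """
    text_l = text.lower()
    # Omission: an input slot whose value never appears in the output.
    omitted = [slot for slot, value in mr.items()
               if value.lower() not in text_l]
    # Hallucination: a known slot value mentioned despite not being in the MR.
    hallucinated = [v for v in known_values
                    if v.lower() in text_l and v not in mr.values()]
    return omitted, hallucinated


mr = {"name": "The Eagle", "food": "French"}
out = check_output(mr, "The Eagle serves Italian food.",
                   ["The Eagle", "French", "Italian"])
# → (["food"], ["Italian"]): the food slot is omitted, "Italian" is hallucinated
```

Real evaluations use pattern lists and fuzzier matching (slot values are often realized with paraphrases), but the omission/hallucination distinction reduces to this kind of MR-versus-text comparison.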

Results

Task                    | Dataset                   | Metric          | Value | Model
------------------------|---------------------------|-----------------|-------|------
Text Generation         | Cleaned E2E NLG Challenge | BLEU (Test set) | 40.73 | TGen
Data-to-Text Generation | Cleaned E2E NLG Challenge | BLEU (Test set) | 40.73 | TGen
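The table reports BLEU, a modified n-gram precision score between a candidate sentence and one or more references. As a refresher, a minimal pure-Python sentence-level BLEU (uniform 4-gram weights, no smoothing) — illustrative only; the reported numbers come from the paper's own evaluation scripts, not this sketch:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate: str, references: list, max_n: int = 4) -> float:
    """Sentence BLEU with uniform weights and brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        if not cand_counts:
            return 0.0
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, c in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # a zero precision zeroes the geometric mean
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty against the reference length closest to the candidate.
    ref_len = min((len(r) for r in refs),
                  key=lambda rl: (abs(rl - len(cand)), rl))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)


bleu("the cat sat on the mat", ["the cat sat on the mat"])  # → 1.0
```

A BLEU of 40.73 on the cleaned E2E test set means roughly 41% geometric-mean clipped n-gram precision after the brevity penalty; standard toolkits add smoothing and corpus-level aggregation omitted here.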

Related Papers

- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- Mitigating Object Hallucinations via Sentence-Level Early Intervention (2025-07-16)
- The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs (2025-07-15)
- Seq vs Seq: An Open Suite of Paired Encoders and Decoders (2025-07-15)
- Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
- Exploiting Leaderboards for Large-Scale Distribution of Malicious Models (2025-07-11)
- ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way (2025-07-11)
- CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs (2025-07-09)