Neural data-to-text generation: A comparison between pipeline and end-to-end architectures

Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, Emiel Krahmer

2019-08-23 · IJCNLP 2019 11 · Data-to-Text Generation, Text Generation

Abstract

Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into natural language through several intermediate transformations. In contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with far fewer explicit intermediate representations in between. This study introduces a systematic comparison between neural pipeline and end-to-end data-to-text approaches for generating text from RDF triples. Both architectures were implemented using state-of-the-art deep learning methods, namely encoder-decoder models based on Gated Recurrent Units (GRU) and the Transformer. Automatic and human evaluations, together with a qualitative analysis, suggest that having explicit intermediate steps in the generation process results in better texts than those generated by end-to-end approaches. Moreover, the pipeline models generalize better to unseen inputs. Data and code are publicly available.
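To make the contrast in the abstract concrete, here is a minimal sketch, under stated assumptions, of how the two architectures treat the same RDF input. The end-to-end route flattens all triples into one sequence for a single encoder-decoder; the pipeline route passes the triples through explicit intermediate steps whose outputs can be inspected. The abstract does not name the pipeline's steps, so the `order`, `structure` and `lexicalise` stages, the `<s>/<p>/<o>` tags and the `LEXICON` dictionary are all hypothetical, and each stage is a rule-based stub where the paper uses neural models.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), as in an RDF triple

# Hypothetical toy lexicon standing in for a learned lexicalisation model.
LEXICON: Dict[str, str] = {
    "birthPlace": "was born in",
    "occupation": "works as",
    "country": "is located in",
}

def linearize(triples: List[Triple]) -> str:
    """End-to-end style: flatten every triple into one token sequence that a
    single encoder-decoder would map straight to text (tags are assumptions)."""
    return " ".join(f"<s> {s} <p> {p} <o> {o}" for s, p, o in triples)

def order(triples: List[Triple]) -> List[Triple]:
    """Ordering stub: a stable sort that puts facts about one subject together."""
    return sorted(triples, key=lambda t: t[0])

def structure(triples: List[Triple]) -> List[List[Triple]]:
    """Structuring stub: one sentence per run of triples sharing a subject."""
    return [list(group) for _, group in groupby(triples, key=lambda t: t[0])]

def lexicalise(sentence: List[Triple]) -> str:
    """Lexicalisation stub: verbalise predicates via the toy lexicon."""
    subject = sentence[0][0].replace("_", " ")
    clauses = [f"{LEXICON.get(p, p)} {o.replace('_', ' ')}" for _, p, o in sentence]
    return f"{subject} {' and '.join(clauses)}."

def pipeline(triples: List[Triple]) -> str:
    """Chain the explicit intermediate steps into a final text."""
    return " ".join(lexicalise(s) for s in structure(order(triples)))

triples = [
    ("London", "country", "United_Kingdom"),
    ("John_Doe", "birthPlace", "London"),
    ("John_Doe", "occupation", "Engineer"),
]
print(linearize(triples))
print(pipeline(triples))
```

The design point the sketch illustrates is inspectability: `structure(order(triples))` is an intermediate representation that can be checked or evaluated on its own, whereas the end-to-end model sees only the output of `linearize` and must learn ordering, grouping and wording implicitly.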

Results

Task	Dataset	Metric	Value	Model
Text Generation	WebNLG	BLEU	57.2	E2E GRU
Text Generation	WebNLG Full	BLEU	51.68	Transformer (Pipeline)
Data-to-Text Generation	WebNLG	BLEU	57.2	E2E GRU
Data-to-Text Generation	WebNLG Full	BLEU	51.68	Transformer (Pipeline)
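The table reports BLEU on a 0 to 100 scale. A minimal sketch of corpus-level BLEU (clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty using the closest reference length) follows. It assumes pre-tokenized input and applies no smoothing. The scoring script and tokenization behind the reported numbers are not stated here, so this sketch is not guaranteed to reproduce 57.2 or 51.68.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps: List[List[str]], refs: List[List[List[str]]], max_n: int = 4) -> float:
    """Corpus BLEU on a 0-100 scale; refs[i] holds all references for hyps[i]."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, hyp_refs in zip(hyps, refs):
        hyp_len += len(hyp)
        # Closest reference length; ties are broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in hyp_refs)[1]
        for n in range(1, max_n + 1):
            max_ref: Counter = Counter()
            for r in hyp_refs:
                max_ref |= ngrams(r, n)  # element-wise max count over references
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its max count in any reference.
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [[ref]]))
print(corpus_bleu(["the cat sat on the".split()], [[ref]]))
```

In the second call every hypothesis n-gram appears in the reference, so the score drops below 100 only through the brevity penalty, exp(1 - 6/5).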
