Challenges in Data-to-Document Generation

Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

Date: 2017-07-25
Venue: EMNLP 2017 (2017-09)
Tasks: Data-to-Text Generation, Text Generation, Descriptive

Abstract

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.
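To make the idea of an extractive evaluation concrete, here is a minimal sketch of a relation-generation style check. It assumes that some separate information-extraction step has already pulled (entity, value, type) relations out of the generated document; that step is not shown. The names `Record` and `relation_generation` and the toy data are hypothetical and are not taken from the paper's released code.

```python
from typing import Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical record type: one (entity, value, type) relation."""
    entity: str
    value: str
    rtype: str


def relation_generation(
    extracted: Iterable[Record], source_records: Iterable[Record]
) -> Tuple[int, float]:
    """Count unique extracted relations that also appear in the source
    records, and report the fraction of unique extracted relations
    that are supported (precision)."""
    unique = set(extracted)                    # duplicates count once
    supported = unique & set(source_records)   # relations backed by the data
    precision = len(supported) / len(unique) if unique else 0.0
    return len(supported), precision


# Toy data, invented purely for illustration.
source = [
    Record("Player A", "25", "PTS"),
    Record("Player A", "8", "REB"),
    Record("Player B", "12", "AST"),
]
extracted = [
    Record("Player A", "25", "PTS"),
    Record("Player A", "25", "PTS"),   # repeated mention
    Record("Player B", "12", "AST"),
    Record("Player B", "30", "PTS"),   # not supported by the source
]
count, precision = relation_generation(extracted, source)
```

In this sketch a higher count means the text states more facts that the records support, and a higher precision means fewer unsupported statements. The quality of such a measure depends on the extraction step, which is assumed here to be given.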

Results

Task | Dataset | Metric | Value | Model
Text Generation | RotoWire (Relation Generation) | count | 23.72 | Encoder-decoder + conditional copy
Text Generation | RotoWire (Content Ordering) | BLEU | 14.49 | Encoder-decoder + conditional copy
Text Generation | RotoWire | BLEU | 14.19 | Encoder-decoder + conditional copy
Data-to-Text Generation | RotoWire (Relation Generation) | count | 23.72 | Encoder-decoder + conditional copy
Data-to-Text Generation | RotoWire (Content Ordering) | BLEU | 14.49 | Encoder-decoder + conditional copy
Data-to-Text Generation | RotoWire | BLEU | 14.19 | Encoder-decoder + conditional copy

Related Papers

Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization (2025-07-17)
Mitigating Object Hallucinations via Sentence-Level Early Intervention (2025-07-16)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs (2025-07-15)
Seq vs Seq: An Open Suite of Paired Encoders and Decoders (2025-07-15)
Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)