Ratish Puduppully, Yao Fu, Mirella Lapata
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input. We focus on generating long-form text, i.e., documents with multiple paragraphs, and propose a neural model enhanced with a planning component responsible for organizing high-level information in a coherent and meaningful way. We infer latent plans sequentially with a structured variational model, while interleaving the steps of planning and generation. Text is generated by conditioning on previous variational decisions and previously generated text. Experiments on two data-to-text benchmarks (RotoWire and MLB) show that our model outperforms strong baselines and is sample efficient in the face of limited training data (e.g., a few hundred instances).
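The interleaving of planning and generation described above can be illustrated with a minimal control-flow sketch. It is not the authors' implementation: in the paper the plan is a latent variable inferred with a structured variational model, whereas here `choose_plan` and `realize` are hypothetical, deterministic stand-ins, and `DocumentState`, `generate_document`, and the record names are invented for illustration. The sketch only shows that each step conditions on both the earlier plan decisions and the earlier text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class DocumentState:
    """Everything decided so far: the chosen plans and the text written for them."""
    plans: List[str] = field(default_factory=list)
    paragraphs: List[str] = field(default_factory=list)


def generate_document(
    candidates: Sequence[str],
    choose_plan: Callable[[Sequence[str], DocumentState], Optional[str]],
    realize: Callable[[str, DocumentState], str],
    max_paragraphs: int = 8,
) -> DocumentState:
    """Alternate between picking a paragraph plan and writing that paragraph."""
    state = DocumentState()
    for _ in range(max_paragraphs):
        # Planning step: sees previous plans and previous text via `state`.
        plan = choose_plan(candidates, state)
        if plan is None:  # assumption: None signals the end of the document
            break
        # Generation step: conditions on the new plan plus the same history.
        paragraph = realize(plan, state)
        state.plans.append(plan)
        state.paragraphs.append(paragraph)
    return state


def toy_choose_plan(candidates: Sequence[str], state: DocumentState) -> Optional[str]:
    """Hypothetical planner: take the first candidate not yet used."""
    for candidate in candidates:
        if candidate not in state.plans:
            return candidate
    return None


def toy_realize(plan: str, state: DocumentState) -> str:
    """Hypothetical generator: a template that depends on the history length."""
    return f"Paragraph {len(state.paragraphs) + 1} covers {plan}."


doc = generate_document(["team:home", "player:starter"], toy_choose_plan, toy_realize)
print("\n".join(doc.paragraphs))
```

Passing the planner and generator as callables keeps the loop itself independent of how plans are chosen, so a sampled or learned planner could be swapped in without changing the control flow.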
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Generation | RotoWire (Relation Generation) | Precision | 97.6 | SeqPlan |
| Text Generation | RotoWire (Relation Generation) | count | 46.7 | SeqPlan |
| Text Generation | MLB Dataset (Content Selection) | Precision | 43.3 | SeqPlan |
| Text Generation | MLB Dataset (Content Selection) | Recall | 53.5 | SeqPlan |
| Text Generation | MLB Dataset (Content Ordering) | DLD | 22.7 | SeqPlan |
| Text Generation | MLB Dataset | BLEU | 14.29 | SeqPlan |
| Text Generation | MLB Dataset (Relation Generation) | Precision | 95.9 | SeqPlan |
| Text Generation | MLB Dataset (Relation Generation) | count | 28.9 | SeqPlan |
| Data-to-Text Generation | RotoWire (Relation Generation) | Precision | 97.6 | SeqPlan |
| Data-to-Text Generation | RotoWire (Relation Generation) | count | 46.7 | SeqPlan |
| Data-to-Text Generation | MLB Dataset (Content Selection) | Precision | 43.3 | SeqPlan |
| Data-to-Text Generation | MLB Dataset (Content Selection) | Recall | 53.5 | SeqPlan |
| Data-to-Text Generation | MLB Dataset (Content Ordering) | DLD | 22.7 | SeqPlan |
| Data-to-Text Generation | MLB Dataset | BLEU | 14.29 | SeqPlan |
| Data-to-Text Generation | MLB Dataset (Relation Generation) | Precision | 95.9 | SeqPlan |
| Data-to-Text Generation | MLB Dataset (Relation Generation) | count | 28.9 | SeqPlan |
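As a rough guide to reading the Content Selection and Content Ordering rows, the sketch below computes precision and recall over extracted records and a normalized Damerau-Levenshtein similarity over record sequences. It rests on several assumptions: records are compared as sets of plain strings, the distance is the restricted (optimal string alignment) variant, and the normalization is `1 - distance / max(len)`. The evaluation scripts behind the reported numbers may differ on each of these points, and the function names are invented for illustration.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall(predicted: Sequence[Hashable], gold: Sequence[Hashable]) -> Tuple[float, float]:
    """Set-based precision and recall of predicted records against gold records."""
    pred_set, gold_set = set(predicted), set(gold)
    overlap = len(pred_set & gold_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    return precision, recall


def osa_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def normalized_dld_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


# Made-up record identifiers, used only to exercise the functions.
predicted = ["r1", "r2", "r3"]
gold = ["r2", "r3", "r4", "r5"]
print(precision_recall(predicted, gold))
print(normalized_dld_similarity(["a", "b", "c"], ["a", "c", "b"]))
```

Under these assumptions higher is better for all three quantities, which matches how the table reports precision, recall, and DLD as percentages.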