Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech

Mihir Kale, Scott Roy

2020-04-05

Machine Translation, Data-to-Text Generation, Text Generation, Translation, Transliteration

Abstract

While there is a large body of research studying deep learning methods for text generation from structured data, almost all of it focuses purely on English. In this paper, we study the effectiveness of machine translation based pre-training for data-to-text generation in non-English languages. Since the structured data is generally expressed in English, text generation into other languages involves elements of translation, transliteration and copying, all of which are already encoded in neural machine translation systems. Moreover, since data-to-text corpora are typically small, this task can benefit greatly from pre-training. Based on our experiments on Czech, a morphologically complex language, we find that pre-training lets us train end-to-end models with significantly improved performance, as judged by automatic metrics and human evaluation. We also show that this approach enjoys several desirable properties, including improved performance in low-data scenarios and robustness to unseen slot values.

Results

Task	Dataset	Metric	Value	Model
Text Generation	Czech Restaurant NLG	BLEU score	26.35	binmt
Text Generation	Czech Restaurant NLG	CIDER	2.6	binmt
Text Generation	Czech Restaurant NLG	METEOR	25.81	binmt
Text Generation	Czech Restaurant NLG	NIST	5.24	binmt
Text Generation	Czech Restaurant NLG	BLEU score	17.72	mass
Text Generation	Czech Restaurant NLG	CIDER	1.75	mass
Text Generation	Czech Restaurant NLG	METEOR	21.16	mass
Text Generation	Czech Restaurant NLG	NIST	4.22	mass
Data-to-Text Generation	Czech Restaurant NLG	BLEU score	26.35	binmt
Data-to-Text Generation	Czech Restaurant NLG	CIDER	2.6	binmt
Data-to-Text Generation	Czech Restaurant NLG	METEOR	25.81	binmt
Data-to-Text Generation	Czech Restaurant NLG	NIST	5.24	binmt
Data-to-Text Generation	Czech Restaurant NLG	BLEU score	17.72	mass
Data-to-Text Generation	Czech Restaurant NLG	CIDER	1.75	mass
Data-to-Text Generation	Czech Restaurant NLG	METEOR	21.16	mass
Data-to-Text Generation	Czech Restaurant NLG	NIST	4.22	mass
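
To make the gap between the two listed models easier to read, the short sketch below computes the absolute and relative differences per metric. The scores are copied from the table above; the dictionary layout and the `score_gaps` helper are illustrative assumptions, not part of the paper.

```python
# Scores copied from the results table above (Czech Restaurant NLG).
# The data layout and helper name are illustrative assumptions.
scores = {
    "binmt": {"BLEU": 26.35, "CIDER": 2.6, "METEOR": 25.81, "NIST": 5.24},
    "mass": {"BLEU": 17.72, "CIDER": 1.75, "METEOR": 21.16, "NIST": 4.22},
}


def score_gaps(a, b):
    """Return (absolute gap, relative gap in percent) of a over b per metric."""
    gaps = {}
    for metric in a:
        diff = a[metric] - b[metric]
        gaps[metric] = (round(diff, 2), round(100 * diff / b[metric], 1))
    return gaps


gaps = score_gaps(scores["binmt"], scores["mass"])
for metric, (absolute, relative) in gaps.items():
    print(f"{metric}: {absolute:+.2f} ({relative:+.1f}% relative to mass)")
```

Because the four metrics use different scales, the relative column is usually the more comparable of the two.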
