Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech

Mihir Kale, Scott Roy

2020-04-05
Tasks: Machine Translation, Data-to-Text Generation, Text Generation, Translation, Transliteration

Abstract

While there is a large body of research studying deep learning methods for text generation from structured data, almost all of it focuses purely on English. In this paper, we study the effectiveness of machine translation based pre-training for data-to-text generation in non-English languages. Since the structured data is generally expressed in English, text generation into other languages involves elements of translation, transliteration and copying - elements already encoded in neural machine translation systems. Moreover, since data-to-text corpora are typically small, this task can benefit greatly from pre-training. Based on our experiments on Czech, a morphologically complex language, we find that pre-training lets us train end-to-end models with significantly improved performance, as judged by automatic metrics and human evaluation. We also show that this approach enjoys several desirable properties, including improved performance in low data scenarios and robustness to unseen slot values.

Results

Task                     Dataset               Metric      Value  Model
Text Generation          Czech Restaurant NLG  BLEU score  26.35  binmt
Text Generation          Czech Restaurant NLG  CIDER       2.6    binmt
Text Generation          Czech Restaurant NLG  METEOR      25.81  binmt
Text Generation          Czech Restaurant NLG  NIST        5.24   binmt
Text Generation          Czech Restaurant NLG  BLEU score  17.72  mass
Text Generation          Czech Restaurant NLG  CIDER       1.75   mass
Text Generation          Czech Restaurant NLG  METEOR      21.16  mass
Text Generation          Czech Restaurant NLG  NIST        4.22   mass
Data-to-Text Generation  Czech Restaurant NLG  BLEU score  26.35  binmt
Data-to-Text Generation  Czech Restaurant NLG  CIDER       2.6    binmt
Data-to-Text Generation  Czech Restaurant NLG  METEOR      25.81  binmt
Data-to-Text Generation  Czech Restaurant NLG  NIST        5.24   binmt
Data-to-Text Generation  Czech Restaurant NLG  BLEU score  17.72  mass
Data-to-Text Generation  Czech Restaurant NLG  CIDER       1.75   mass
Data-to-Text Generation  Czech Restaurant NLG  METEOR      21.16  mass
Data-to-Text Generation  Czech Restaurant NLG  NIST        4.22   mass
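
The gap between the two listed models can be read off the reported scores directly. The snippet below is a minimal sketch of that arithmetic: the scores are copied from the results above, while the dictionary layout and the `compare` function name are illustrative choices that do not come from the page or the paper.

```python
# Scores copied from the results listed above (Czech Restaurant NLG).
# The structure and names below are illustrative assumptions.
SCORES = {
    "binmt": {"BLEU": 26.35, "CIDER": 2.6, "METEOR": 25.81, "NIST": 5.24},
    "mass": {"BLEU": 17.72, "CIDER": 1.75, "METEOR": 21.16, "NIST": 4.22},
}


def compare(scores, model_a, model_b):
    """Return {metric: (absolute gain, relative gain in %)} of model_a over model_b."""
    result = {}
    for metric, a in scores[model_a].items():
        b = scores[model_b][metric]
        # Round to keep the output readable; the inputs have two decimals at most.
        result[metric] = (round(a - b, 2), round(100 * (a - b) / b, 1))
    return result


if __name__ == "__main__":
    for metric, (abs_gain, rel_gain) in compare(SCORES, "binmt", "mass").items():
        print(f"{metric}: +{abs_gain} ({rel_gain}% relative)")
```

For example, the BLEU gap between `binmt` and `mass` is 26.35 - 17.72 = 8.63 points. The metrics use different scales, so the absolute gaps are not comparable across metrics; the relative column is only a rough way to put them side by side.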
