Data Diversification: A Simple Strategy For Neural Machine Translation

Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, Ai Ti Aw

2019-11-05NeurIPS 2020 12Machine Translation NMT Translation Knowledge Distillation

Abstract

We introduce Data Diversification: a simple but effective strategy to boost neural machine translation (NMT) performance. It diversifies the training data by using the predictions of multiple forward and backward models and then merging them with the original dataset on which the final NMT model is trained. Our method is applicable to all NMT models. It does not require extra monolingual data like back-translation, nor does it add more computations and parameters like ensembles of models. Our method achieves state-of-the-art BLEU scores of 30.7 and 43.7 in the WMT'14 English-German and English-French translation tasks, respectively. It also substantially improves on 8 other translation tasks: 4 IWSLT tasks (English-German and English-French) and 4 low-resource translation tasks (English-Nepali and English-Sinhala). We demonstrate that our method is more effective than knowledge distillation and dual learning, it exhibits strong correlation with ensembles of models, and it trades perplexity off for better BLEU score. We have released our source code at https://github.com/nxphi47/data_diversification

Results

Task	Dataset	Metric	Value	Model
Machine Translation	IWSLT2014 German-English	BLEU score	37.2	Data Diversification
Machine Translation	WMT2014 English-German	BLEU score	30.7	Data Diversification - Transformer

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21 A Translation of Probabilistic Event Calculus into Markov Decision Processes2025-07-17 Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces2025-07-17 DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition2025-07-16 Function-to-Style Guidance of LLMs for Code Translation2025-07-15 HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training2025-07-15 Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning2025-07-14 KAT-V1: Kwai-AutoThink Technical Report2025-07-11