Rethinking Perturbations in Encoder-Decoders for Fast Training

Sho Takase, Shun Kiyono

2021-04-05 | NAACL 2021 | Machine Translation, Text Summarization

Abstract

We often use perturbations to regularize neural models. For neural encoder-decoders, previous studies applied scheduled sampling (Bengio et al., 2015) and adversarial perturbations (Sato et al., 2019), but these methods require considerable computational time. This study therefore asks whether these approaches are efficient enough with respect to training time. We compare several perturbations on sequence-to-sequence problems in terms of computational time. Experimental results show that simple techniques such as word dropout (Gal and Ghahramani, 2016) and random replacement of input tokens achieve scores comparable to (or better than) those of the recently proposed perturbations, while also being faster. Our code is publicly available at https://github.com/takase/rethink_perturbations.
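
The two simple perturbations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); the function names, the parameter p, and the use of plain Python lists instead of tensors are assumptions made here for illustration. It assumes that word dropout zeroes a token's entire embedding vector with probability p, and that uniform random replacement swaps each input token for one drawn uniformly from the vocabulary with probability p.

```python
import random


def replace_uniform(tokens, vocab, p, rng):
    """Hypothetical helper: with probability p, swap each token for one
    sampled uniformly from vocab; otherwise keep the original token."""
    return [rng.choice(vocab) if rng.random() < p else tok for tok in tokens]


def word_dropout(embeddings, p, rng):
    """Hypothetical helper: with probability p, replace a token's whole
    embedding vector with zeros; otherwise copy it unchanged."""
    return [
        [0.0] * len(vec) if rng.random() < p else list(vec)
        for vec in embeddings
    ]


# Illustrative inputs only; a real model would operate on batched tensors.
rng = random.Random(0)
noisy_tokens = replace_uniform(["wir", "sehen", "uns"], ["a", "b", "c"], 0.1, rng)
noisy_embeds = word_dropout([[0.5, -0.2], [0.1, 0.3]], 0.1, rng)
```

Both operations cost one random draw per token and need no extra forward or backward pass, which is consistent with the abstract's point that they are cheaper than scheduled sampling or adversarial perturbations.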

Results

Task	Dataset	Metric	Value	Model
Machine Translation	IWSLT2014 German-English	BLEU score	36.22	Transformer+Rep(Sim)+WDrop
Machine Translation	WMT2014 English-German	BLEU score	33.89	Transformer+Rep(Uni)
Machine Translation	WMT2014 English-German	SacreBLEU	32.35	Transformer+Rep(Uni)
Text Summarization	DUC 2004 Task 1	ROUGE-1	33.06	Transformer+WDrop
Text Summarization	DUC 2004 Task 1	ROUGE-2	11.45	Transformer+WDrop
Text Summarization	DUC 2004 Task 1	ROUGE-L	28.51	Transformer+WDrop
Text Summarization	GigaWord	ROUGE-1	39.81	Transformer+Rep(Uni)
Text Summarization	GigaWord	ROUGE-2	20.4	Transformer+Rep(Uni)
Text Summarization	GigaWord	ROUGE-L	36.93	Transformer+Rep(Uni)
Text Summarization	GigaWord	ROUGE-1	39.66	Transformer+WDrop
Text Summarization	GigaWord	ROUGE-2	20.45	Transformer+WDrop
Text Summarization	GigaWord	ROUGE-L	36.59	Transformer+WDrop
