Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

2020-03-09

Machine Translation, de-en


Abstract

Several papers argue that wide minima generalize better than narrow minima. In this paper, through detailed experiments, we not only corroborate the generalization properties of wide minima but also provide empirical evidence for a new hypothesis: that the density of wide minima is likely lower than the density of narrow minima. Further, motivated by this hypothesis, we design a novel explore-exploit learning rate schedule. On a variety of image and natural language datasets, compared to their original hand-tuned learning rate baselines, we show that our explore-exploit schedule can result in either up to 0.84% higher absolute accuracy using the original training budget or up to 57% reduced training time while achieving the original reported accuracy. For example, we achieve state-of-the-art (SOTA) accuracy on the IWSLT'14 (DE-EN) dataset by just modifying the learning rate schedule of a high-performing model.
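The abstract names an explore-exploit learning rate schedule but does not give its shape. As an illustration only, the sketch below assumes a two-phase form: an explore phase that holds a constant high learning rate, followed by an exploit phase that decays linearly to zero. The function name `explore_exploit_lr` and the parameters `explore_steps` and `peak_lr` are hypothetical and are not taken from this page.

```python
def explore_exploit_lr(step, total_steps, explore_steps, peak_lr):
    """Hypothetical explore-exploit schedule (assumed shape, see lead-in).

    Explore phase: hold peak_lr for the first explore_steps steps.
    Exploit phase: decay linearly from peak_lr to 0 by total_steps.
    """
    if not 0 <= explore_steps <= total_steps:
        raise ValueError("explore_steps must lie in [0, total_steps]")
    if step < explore_steps:
        # Explore: keep the learning rate high and constant.
        return peak_lr
    exploit_steps = total_steps - explore_steps
    if exploit_steps == 0:
        # No exploit phase configured; stay at the peak.
        return peak_lr
    # Exploit: linear decay, clamped at zero past the end of training.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / exploit_steps


if __name__ == "__main__":
    # Print the learning rate at a few points of a 100-step run.
    for s in (0, 39, 40, 70, 100):
        print(s, explore_exploit_lr(s, total_steps=100, explore_steps=40, peak_lr=0.1))
```

Under this assumed shape, the length of the explore phase is the main tuning knob: a longer explore phase spends more of the budget at the high learning rate before the decay begins.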

Results

Task	Dataset	Metric	Value	Model
Machine Translation	IWSLT2014 German-English	BLEU score	37.78	Cutoff+Knee
Machine Translation	WMT2014 German-English	BLEU score	31.9	MAT+Knee
