Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

2020-03-09 · Machine Translation · de-en

Abstract

Several papers argue that wide minima generalize better than narrow minima. In this paper, through detailed experiments, we not only corroborate the generalization properties of wide minima but also provide empirical evidence for a new hypothesis: the density of wide minima is likely lower than the density of narrow minima. Further, motivated by this hypothesis, we design a novel explore-exploit learning rate schedule. On a variety of image and natural language datasets, we show that, compared to the original hand-tuned learning rate baselines, our explore-exploit schedule can yield either up to 0.84% higher absolute accuracy within the original training budget or up to 57% less training time while matching the originally reported accuracy. For example, we achieve state-of-the-art (SOTA) accuracy on the IWSLT'14 (DE-EN) dataset just by modifying the learning rate schedule of a high-performing model.
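The abstract does not spell out the schedule's shape. The sketch below assumes it is the "Knee" shape that the Results table's model names (Cutoff+Knee, MAT+Knee) refer to, which we understand as an explore phase at a constant high learning rate followed by an exploit phase that decays the learning rate linearly to zero, with an optional linear warmup in front. The function name `explore_exploit_lr`, its parameters, and all numbers are hypothetical illustrations, not the authors' code.

```python
def explore_exploit_lr(
    epoch: float,
    peak_lr: float,
    explore_epochs: float,
    total_epochs: float,
    warmup_epochs: float = 0.0,
) -> float:
    """Learning rate at `epoch` under an ASSUMED explore-exploit ("Knee") shape.

    Phases (hypothetical parameterization, not the authors' code):
      1. warmup  (optional): ramp linearly from 0 to peak_lr over warmup_epochs
      2. explore: hold peak_lr for explore_epochs
      3. exploit: decay linearly from peak_lr to 0 at total_epochs
    """
    explore_end = warmup_epochs + explore_epochs
    if epoch < 0 or warmup_epochs < 0 or explore_epochs < 0:
        raise ValueError("epochs must be non-negative")
    if explore_end >= total_epochs:
        raise ValueError("warmup + explore must leave room for the exploit phase")

    if epoch < warmup_epochs:
        # Warmup: linear ramp (only reachable when warmup_epochs > 0).
        return peak_lr * epoch / warmup_epochs
    if epoch < explore_end:
        # Explore: constant high learning rate.
        return peak_lr
    if epoch >= total_epochs:
        # Training budget exhausted.
        return 0.0
    # Exploit: fraction of the exploit phase still remaining, from 1 down to 0.
    remaining = (total_epochs - epoch) / (total_epochs - explore_end)
    return peak_lr * remaining


if __name__ == "__main__":
    # Illustrative numbers only: 90-epoch budget, first 30 epochs spent exploring.
    for e in (0, 15, 30, 60, 89, 90):
        lr = explore_exploit_lr(e, peak_lr=0.1, explore_epochs=30, total_epochs=90)
        print(f"epoch {e:>2}: lr = {lr:.4f}")
```

As we read the abstract, the long constant phase is the part the hypothesis motivates: if wide minima are less dense than narrow ones, the optimizer plausibly needs an extended high-learning-rate phase to land near one before the decay settles into it. In PyTorch, a function like this could be plugged in through `torch.optim.lr_scheduler.LambdaLR` by passing `peak_lr=1.0`, since that scheduler multiplies each parameter group's base learning rate by the factor `lr_lambda` returns.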

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | IWSLT2014 German-English | BLEU score | 37.78 | Cutoff+Knee |
| Machine Translation | WMT2014 German-English | BLEU score | 31.9 | MAT+Knee |

Related Papers

- Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
- Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
- GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)
- TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation (2025-07-01)
- Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval (2025-06-26)
- Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn't Help with MT Evaluation (2025-06-25)
- CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation (2025-06-24)
- Has Machine Translation Evaluation Achieved Human Parity? The Human Reference and the Limits of Progress (2025-06-24)