
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Alleviating the Inequality of Attention Heads for Neural Machine Translation

Zewei Sun, Shu-Jian Huang, Xin-yu Dai, Jia-Jun Chen

2020-09-21 · COLING 2022 10 · Machine Translation, Translation

Abstract

Recent studies show that the attention heads in the Transformer are not equally important. We relate this phenomenon to the imbalanced training of multi-head attention and to the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, implemented in two specific ways. Experiments show that it improves translation quality on multiple language pairs. Subsequent empirical analyses also support our hypothesis and confirm the effectiveness of the method.
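A minimal sketch of the masking idea, assuming HeadMask works by zeroing the outputs of a chosen set of attention heads during training. The "Random" variant is modeled here as picking heads uniformly at random, and the "Impt" variant as masking the heads with the highest importance scores. The function names, the treatment of importance scores as a precomputed input, and the choice not to rescale the surviving heads are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Sequence

# Hypothetical helpers illustrating head masking; not the paper's code.

def random_head_mask(num_heads: int, num_masked: int, rng: random.Random) -> List[int]:
    """Return a 0/1 mask that zeroes `num_masked` heads chosen uniformly at random."""
    masked = set(rng.sample(range(num_heads), num_masked))
    return [0 if h in masked else 1 for h in range(num_heads)]


def importance_head_mask(importance: Sequence[float], num_masked: int) -> List[int]:
    """Return a 0/1 mask that zeroes the `num_masked` heads with the highest importance.

    The importance scores are assumed to be computed elsewhere; how the
    paper estimates them is not described on this page.
    """
    order = sorted(range(len(importance)), key=lambda h: importance[h], reverse=True)
    masked = set(order[:num_masked])
    return [0 if h in masked else 1 for h in range(len(importance))]


def apply_head_mask(head_outputs: Sequence[Sequence[float]],
                    mask: Sequence[int]) -> List[List[float]]:
    """Zero the output vector of every masked head before the heads are concatenated.

    Surviving heads are left unscaled (an assumption of this sketch).
    """
    return [[x * m for x in out] for out, m in zip(head_outputs, mask)]


if __name__ == "__main__":
    heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    # Heads 1 and 3 have the highest (made-up) importance, so they are masked.
    mask = importance_head_mask([0.1, 0.9, 0.4, 0.7], num_masked=2)
    print(mask)                          # [1, 0, 1, 0]
    print(apply_head_mask(heads, mask))  # [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0], [0.0, 0.0]]
    print(random_head_mask(4, 2, random.Random(0)))
```

Masking the most important heads forces the remaining heads to carry more of the work, which is one plausible reading of how the method counters the model's dependence on specific heads; presumably all heads are kept at inference time.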

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | IWSLT2015 Vietnamese-English | BLEU | 26.85 | HeadMask (Random-18) |
| Machine Translation | IWSLT2015 Vietnamese-English | BLEU | 26.36 | HeadMask (Impt-18) |
| Machine Translation | WMT2016 Romanian-English | BLEU | 32.95 | HeadMask (Impt-18) |
| Machine Translation | WMT2016 Romanian-English | BLEU | 32.85 | HeadMask (Random-18) |
| Machine Translation | WMT2017 Turkish-English | BLEU | 17.56 | HeadMask (Random-18) |
| Machine Translation | WMT2017 Turkish-English | BLEU | 17.48 | HeadMask (Impt-18) |
