Multi-branch Attentive Transformer

Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

2020-06-18 · Machine Translation · Natural Language Understanding · Translation · Code Generation

Abstract

While multi-branch architectures are one of the key ingredients in the success of computer vision models, they have not been well investigated in natural language processing, especially for sequence learning tasks. In this work, we propose a simple yet effective variant of the Transformer called the multi-branch attentive Transformer (MAT), in which the attention layer is the average of multiple branches, each an independent multi-head attention layer. We leverage two techniques to regularize training: drop-branch, which randomly drops individual branches during training, and proximal initialization, which uses a pre-trained Transformer model to initialize the branches. Experiments on machine translation, code generation, and natural language understanding demonstrate that this simple variant of the Transformer brings significant improvements. Our code is available at https://github.com/HA-Transformer.
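
The abstract describes two components concretely enough to sketch: averaging over independent attention branches with drop-branch, and proximal initialization from a pre-trained model. The PyTorch sketch below is one possible reading of that description, not the authors' released code; the names MultiBranchAttention, drop_branch_p, and proximal_init are hypothetical, and details such as rescaling of surviving branches may differ from the paper.

```python
import torch
import torch.nn as nn

class MultiBranchAttention(nn.Module):
    """Sketch of a MAT-style attention layer: the output is the average of
    several independent multi-head attention branches, and drop-branch
    randomly disables individual branches during training."""

    def __init__(self, embed_dim, num_heads, num_branches=3, drop_branch_p=0.3):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.MultiheadAttention(embed_dim, num_heads) for _ in range(num_branches)
        )
        self.drop_branch_p = drop_branch_p  # hypothetical hyper-parameter name

    def forward(self, query, key, value):
        outputs = []
        for branch in self.branches:
            # Drop-branch: at training time, skip each branch independently
            # with probability drop_branch_p; at inference, keep all branches.
            if self.training and torch.rand(1).item() < self.drop_branch_p:
                continue
            out, _ = branch(query, key, value)
            outputs.append(out)
        if not outputs:  # guard so at least one branch always contributes
            out, _ = self.branches[0](query, key, value)
            outputs.append(out)
        # The attention layer is the average over the (kept) branches.
        return torch.stack(outputs, dim=0).mean(dim=0)

def proximal_init(mat_attn, pretrained_attn):
    """Proximal initialization (sketch): start every branch from the weights
    of an attention layer taken from a pre-trained Transformer."""
    state = pretrained_attn.state_dict()
    for branch in mat_attn.branches:
        branch.load_state_dict(state)
```

A quick usage check of the sketch: nn.MultiheadAttention expects inputs of shape (seq_len, batch, embed_dim) by default, so self-attention looks like

```python
attn = MultiBranchAttention(embed_dim=512, num_heads=8)
x = torch.randn(10, 2, 512)  # (seq_len, batch, embed_dim)
y = attn(x, x, x)            # self-attention over x
```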

Results

Task                | Dataset                  | Metric     | Value | Model
Machine Translation | IWSLT2014 German-English | BLEU score | 36.22 | MAT
Machine Translation | WMT2014 English-German   | SacreBLEU  | 29.9  | MAT

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs (2025-07-15)
Vision Language Action Models in Robotic Manipulation: A Systematic Review (2025-07-14)