Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Lessons on Parameter Sharing across Layers in Transformers

Sho Takase, Shun Kiyono

2021-04-13 · Machine Translation

Abstract

We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of one layer across all layers, as in Universal Transformers (Dehghani et al., 2019), in order to improve computational efficiency. We propose three strategies for assigning parameters to each layer: Sequence, Cycle, and Cycle (rev). Experimental results show that the proposed strategies are efficient in terms of parameter size and computational time. Moreover, we show that the proposed strategies are also effective in settings with large amounts of training data, such as the recent WMT competition.
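
The abstract names the three assignment strategies without defining them. The sketch below shows one plausible reading, based on the paper's description rather than on anything stated on this page: M distinct parameter sets are mapped onto N layers, with Sequence sharing across consecutive layers, Cycle repeating the M sets in order, and Cycle (rev) cycling first and then reversing the order for the last M layers. The function name `assign_parameters`, its arguments, and the requirement that N be a multiple of M are assumptions made for illustration and are not taken from the authors' code.

```python
def assign_parameters(num_layers: int, num_param_sets: int, strategy: str) -> list[int]:
    """Return, for each layer, the index of the parameter set it would use.

    Hypothetical helper: indices are 0-based, and num_layers is assumed to be
    a multiple of num_param_sets to keep the illustration simple.
    """
    n, m = num_layers, num_param_sets
    if m < 1 or n < m or n % m != 0:
        raise ValueError("expected num_layers to be a positive multiple of num_param_sets")

    if strategy == "sequence":
        # Consecutive layers share one parameter set: 0,0,1,1,2,2
        group_size = n // m
        return [i // group_size for i in range(n)]
    if strategy == "cycle":
        # The m parameter sets are repeated in order: 0,1,2,0,1,2
        return [i % m for i in range(n)]
    if strategy == "cycle_rev":
        # Cycle for the first n - m layers, then reverse order for the last m
        head = [i % m for i in range(n - m)]
        tail = list(range(m - 1, -1, -1))
        return head + tail
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("sequence", "cycle", "cycle_rev"):
        print(name, assign_parameters(6, 3, name))
```

With N = 6 layers and M = 3 parameter sets, this gives `[0, 0, 1, 1, 2, 2]` for Sequence, `[0, 1, 2, 0, 1, 2]` for Cycle, and `[0, 1, 2, 2, 1, 0]` for Cycle (rev). Setting M = 1 reduces all three to the single shared layer of Universal Transformers, which is the case the abstract says the method relaxes.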

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | WMT2014 English-German | BLEU score | 35.14 | Transformer Cycle (Rev) |
| Machine Translation | WMT2014 English-German | SacreBLEU | 33.54 | Transformer Cycle (Rev) |

Related Papers

Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation (2025-07-01)
Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval (2025-06-26)
Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn't Help with MT Evaluation (2025-06-25)
CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation (2025-06-24)
Has Machine Translation Evaluation Achieved Human Parity? The Human Reference and the Limits of Progress (2025-06-24)