Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation

Xiang Kong, Qizhe Xie, Zihang Dai, Eduard Hovy

Date: 2018-09-25
Tasks: Machine Translation, Text Generation, Translation, Image Captioning

Abstract

Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite the known advantage, MoS is practically sealed by its large consumption of memory and computational time due to the need of computing multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which could effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show both BPE and our proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve the time and memory consumption of MoS without performance losses. With MoS, we achieve an improvement of 1.5 BLEU scores on IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr score on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields 29.5 BLEU score for English-to-German and 42.1 BLEU score for English-to-French, outperforming the single-Softmax Transformer by 0.8 and 0.4 BLEU scores respectively and achieving the state-of-the-art result on WMT 2014 English-to-German task.
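To make the cost argument in the abstract concrete, here is a minimal NumPy sketch of a Mixture of Softmaxes output layer in its commonly used formulation. It is not the authors' implementation. The function name `mixture_of_softmaxes`, the parameter names, the `tanh` projection, and the toy sizes are all assumptions made for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mixture_of_softmaxes(h, w_prior, w_proj, w_out):
    """Return a next-token distribution as a weighted sum of K softmaxes.

    h       : (d,)      context vector from the decoder (hypothetical input)
    w_prior : (K, d)    produces the K mixture-weight logits
    w_proj  : (K, d, d) one projection per mixture component
    w_out   : (V, d)    output embedding shared by all components
    """
    pi = softmax(w_prior @ h)          # (K,)   mixture weights, sum to 1
    h_k = np.tanh(w_proj @ h)          # (K, d) one context vector per component
    logits = h_k @ w_out.T             # (K, V) K full-vocabulary logit rows
    components = softmax(logits)       # (K, V) K separate softmaxes
    return pi @ components             # (V,)   convex combination


# Toy sizes, chosen arbitrarily for the example.
K, d, V = 3, 8, 50
rng = np.random.default_rng(0)
probs = mixture_of_softmaxes(
    rng.normal(size=d),
    rng.normal(size=(K, d)),
    rng.normal(size=(K, d, d)),
    rng.normal(size=(V, d)),
)
print(probs.shape, float(probs.sum()))
```

The intermediate `(K, V)` arrays are where the extra memory and time go: they grow linearly with both the number of components and the vocabulary size. Under this sketch, reducing `V` through a more compact word coding scheme shrinks those arrays directly, which is the lever the abstract describes.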

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | WMT2014 English-German | BLEU score | 29.6 | Transformer Big + MoS |
| Machine Translation | WMT2014 English-French | BLEU score | 42.1 | Transformer Big + MoS |
