Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Pay Less Attention with Lightweight and Dynamic Convolutions

Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Published 2019-01-29 · ICLR 2019
Tasks: Machine Translation, Abstractive Text Summarization, Translation, Language Modelling

Abstract

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively with the best reported self-attention results. Next, we introduce dynamic convolutions, which are simpler and more efficient than self-attention. We predict separate convolution kernels based solely on the current time step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic. Experiments on large-scale machine translation, language modeling, and abstractive summarization show that dynamic convolutions improve over strong self-attention models. On the WMT'14 English-German test set, dynamic convolutions achieve a new state of the art of 29.7 BLEU.
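The dynamic-convolution idea in the abstract can be sketched in a few lines: a width-k kernel is predicted from the current time step alone (rather than by comparing against all context elements, as in self-attention), softmax-normalized, and applied causally over the k most recent inputs, giving cost linear in sequence length. This is a minimal single-head NumPy sketch with a hypothetical parameter name (`W_kernel`); it omits the multi-head weight sharing and GLU layers of the paper's actual models.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dynamic_conv(x, W_kernel, k):
    """Causal dynamic convolution over a sequence (illustrative sketch).

    x: (T, d) input sequence.
    W_kernel: (d, k) projection that predicts a width-k kernel from the
        current time step only (hypothetical name, not the paper's code).
    """
    T, d = x.shape
    y = np.zeros_like(x)
    for t in range(T):
        w = softmax(x[t] @ W_kernel)   # (k,) kernel predicted from x_t alone
        for j in range(k):             # weight the k most recent time steps
            if t - j >= 0:
                y[t] += w[j] * x[t - j]
    return y
```

A lightweight convolution is the special case where the softmax-normalized kernel is a fixed learned parameter instead of being predicted from `x[t]`. Either way, each output position touches only k inputs, versus all T positions under self-attention.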

Results

Task                            Dataset                    Metric      Value  Model
Machine Translation             IWSLT2014 German-English   BLEU score  35.2   DynamicConv
Machine Translation             IWSLT2014 German-English   BLEU score  34.8   LightConv
Machine Translation             WMT2014 English-German     BLEU score  29.7   DynamicConv
Machine Translation             WMT2014 English-German     BLEU score  28.9   LightConv
Machine Translation             WMT2017 English-Chinese    BLEU score  24.4   DynamicConv
Machine Translation             WMT2017 English-Chinese    BLEU score  24.3   LightConv
Machine Translation             WMT2014 English-French     BLEU score  43.2   DynamicConv
Machine Translation             WMT2014 English-French     BLEU score  43.1   LightConv
Language Modelling              One Billion Word           PPL         26.67  DynamicConv
Abstractive Text Summarization  CNN / Daily Mail           ROUGE-1     39.84  DynamicConv
Abstractive Text Summarization  CNN / Daily Mail           ROUGE-2     16.25  DynamicConv
Abstractive Text Summarization  CNN / Daily Mail           ROUGE-L     36.73  DynamicConv
Abstractive Text Summarization  CNN / Daily Mail           ROUGE-1     39.52  LightConv
Abstractive Text Summarization  CNN / Daily Mail           ROUGE-2     15.97  LightConv
Abstractive Text Summarization  CNN / Daily Mail           ROUGE-L     36.51  LightConv
