Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Time-aware Large Kernel Convolutions

Vasileios Lioutas, Yuhong Guo

2020-02-08 · ICML 2020 · Tasks: Machine Translation, Document Summarization, Translation, Language Modelling

Paper · PDF · Code (official)

Abstract

To date, most state-of-the-art sequence modeling architectures use attention to build generative models for language based tasks. Some of these models use all the available sequence tokens to generate an attention distribution which results in time complexity of $O(n^2)$. Alternatively, they utilize depthwise convolutions with softmax normalized kernels of size $k$ acting as a limited-window self-attention, resulting in time complexity of $O(k{\cdot}n)$. In this paper, we introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel instead of using a fixed-sized kernel matrix. This method yields a time complexity of $O(n)$, effectively making the sequence encoding process linear to the number of tokens. We evaluate the proposed method on large-scale standard machine translation, abstractive summarization and language modeling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention/convolution based approaches.
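The core idea in the abstract can be sketched with a prefix sum: each position predicts how far its summation window extends to the left and right, and the windowed sum is then computed in constant time per position, giving O(n) overall. The sketch below is a minimal NumPy illustration of that mechanism only; the function name `talk_conv_1d` and the hand-picked offsets are hypothetical, not the authors' API (which also learns the offsets and applies relative importance normalization).

```python
# Minimal sketch of the TaLK summation idea, assuming per-position integer
# window offsets are already available. In the paper these offsets are
# predicted by the network; here they are supplied directly for illustration.
import numpy as np

def talk_conv_1d(x, left, right):
    """x: (n, d) sequence; left/right: (n,) integer window offsets."""
    n, d = x.shape
    # Prefix sums with a leading zero row: S[i] = sum of x[:i].
    S = np.concatenate([np.zeros((1, d)), np.cumsum(x, axis=0)], axis=0)
    idx = np.arange(n)
    lo = np.clip(idx - left, 0, n)           # window start (inclusive)
    hi = np.clip(idx + right + 1, 0, n)      # window end (exclusive)
    window_sum = S[hi] - S[lo]               # O(1) sum per position via prefix sums
    width = (hi - lo)[:, None]
    return window_sum / np.maximum(width, 1)  # normalize by window size

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
left = np.array([0, 1, 2, 1, 0, 2])   # illustrative offsets, not learned here
right = np.array([1, 0, 1, 2, 1, 0])
y = talk_conv_1d(x, left, right)
```

Because the prefix sum is computed once and each output is two lookups and a subtraction, the cost is linear in sequence length, independent of any maximum window size.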

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | IWSLT2014 German-English | BLEU score | 35.5 | TaLK Convolutions |
| Machine Translation | WMT2014 English-German | BLEU score | 29.6 | TaLK Convolutions |
| Machine Translation | WMT2014 English-French | BLEU score | 43.2 | TaLK Convolutions |
| Language Modelling | WikiText-103 | Test perplexity | 23.3 | TaLK Convolutions |
| Text Summarization | CNN / Daily Mail | ROUGE-1 | 40.59 | TaLK Convolutions (Deep) |
| Text Summarization | CNN / Daily Mail | ROUGE-2 | 18.97 | TaLK Convolutions (Deep) |
| Text Summarization | CNN / Daily Mail | ROUGE-L | 36.81 | TaLK Convolutions (Deep) |
| Text Summarization | CNN / Daily Mail | ROUGE-1 | 40.03 | TaLK Convolutions (Standard) |
| Text Summarization | CNN / Daily Mail | ROUGE-2 | 18.45 | TaLK Convolutions (Standard) |
| Text Summarization | CNN / Daily Mail | ROUGE-L | 36.13 | TaLK Convolutions (Standard) |
| Document Summarization | CNN / Daily Mail | ROUGE-1 | 40.59 | TaLK Convolutions (Deep) |
| Document Summarization | CNN / Daily Mail | ROUGE-2 | 18.97 | TaLK Convolutions (Deep) |
| Document Summarization | CNN / Daily Mail | ROUGE-L | 36.81 | TaLK Convolutions (Deep) |
| Document Summarization | CNN / Daily Mail | ROUGE-1 | 40.03 | TaLK Convolutions (Standard) |
| Document Summarization | CNN / Daily Mail | ROUGE-2 | 18.45 | TaLK Convolutions (Standard) |
| Document Summarization | CNN / Daily Mail | ROUGE-L | 36.13 | TaLK Convolutions (Standard) |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
- Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
- Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)