Description
ConvBERT is a modification of the BERT architecture that replaces some self-attention heads with span-based dynamic convolution to directly model local dependencies. Specifically, a mixed attention module replaces BERT's self-attention modules, combining self-attention with convolution to better capture local dependency. The span-based dynamic convolution uses multiple input tokens, rather than a single token, to dynamically generate the convolution kernel. ConvBERT also incorporates further model designs, including bottleneck attention and a grouped linear operator for the feed-forward module, which reduce the number of parameters.
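To make the span-based dynamic convolution idea concrete, here is a minimal NumPy sketch of one head. It is an illustration under simplifying assumptions, not the paper's implementation: the function name `span_dynamic_conv` and the weight matrices `W_q`, `W_span`, and `W_kernel` are hypothetical stand-ins for the learned projections, and batching, multiple heads, and the surrounding mixed-attention module are omitted. The key step is that the kernel at each position is generated from a span of `k` neighboring tokens (via a depthwise convolution) rather than from a single token, then applied as a lightweight (softmax-normalized) convolution.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def span_dynamic_conv(x, W_q, W_span, W_kernel, k=3):
    """Sketch of span-based dynamic convolution (hypothetical parameter names).

    x:        (seq_len, d) token representations
    W_q:      (d, d)       query projection
    W_span:   (k, d)       depthwise conv weights gathering a local span
    W_kernel: (d, k)       maps mixed features to a size-k kernel per position
    """
    seq_len, d = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))          # zero-pad the sequence ends

    q = x @ W_q                                    # per-token queries

    # Span-aware "key": depthwise convolution over a window of k tokens,
    # so the generated kernel depends on a span, not a single token.
    span = np.stack([(xp[i:i + k] * W_span).sum(axis=0) for i in range(seq_len)])

    # Dynamic kernel from the pointwise product of query and span key,
    # normalized with softmax as in lightweight convolution.
    kernels = softmax((q * span) @ W_kernel, axis=-1)   # (seq_len, k)

    # Apply each position's own kernel to its local window of tokens.
    out = np.stack([kernels[i] @ xp[i:i + k] for i in range(seq_len)])
    return out
```

Because the kernel is shared across the hidden dimension and generated per position, this operation scales linearly with sequence length, which is what lets it replace a subset of the quadratic self-attention heads.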
Papers Using This Method
- Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction (2025-05-26)
- Navigating Nuance: In Quest for Political Truth (2025-01-01)
- ChatGPT v.s. Media Bias: A Comparative Study of GPT-3.5 and Fine-tuned Language Models (2024-03-29)
- Transformer Based Punctuation Restoration for Turkish (2023-09-15)
- ConvBERT: Improving BERT with Span-based Dynamic Convolution (2020-08-06)