Description
The Mixed Attention Block is an attention module used in the ConvBERT architecture. It is a mixture of self-attention and span-based dynamic convolution. The two branches share the same Query but use different Keys to generate the attention map and the convolution kernels, respectively. The number of attention heads is reduced by directly projecting the input to a smaller embedding space, forming a bottleneck structure for both self-attention and span-based dynamic convolution. In the original paper's figure, the input and output dimensions of some blocks are labeled at the top-left corner to illustrate the overall framework, where $d$ is the embedding size of the input and $\gamma$ is the reduction ratio.
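Below is a minimal PyTorch sketch of this structure, written from the description above rather than from an official implementation: both branches are fed from a shared Query, the Key of the convolution branch is made span-aware with a depthwise convolution, per-position kernels are generated from the Query and span-aware Key, and the bottleneck width and head count are reduced by the ratio $\gamma$. Names such as `MixedAttentionBlock`, `kernel_size`, and the layer layout are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedAttentionBlock(nn.Module):
    """Sketch of a mixed attention block: self-attention plus span-based
    dynamic convolution, sharing the same Query, with a bottleneck of
    reduction ratio gamma applied to both branches."""

    def __init__(self, d_model=768, num_heads=12, gamma=2, kernel_size=9):
        super().__init__()
        self.d_inner = d_model // gamma          # bottleneck width d / gamma
        self.num_heads = num_heads // gamma      # reduced number of heads
        self.head_dim = self.d_inner // self.num_heads
        self.kernel_size = kernel_size

        # shared Query, separate Keys/Values for the two branches
        self.query = nn.Linear(d_model, self.d_inner)
        self.key_attn = nn.Linear(d_model, self.d_inner)
        self.value_attn = nn.Linear(d_model, self.d_inner)
        self.key_conv = nn.Linear(d_model, self.d_inner)
        self.value_conv = nn.Linear(d_model, self.d_inner)

        # depthwise conv turns the point-wise Key into a span-aware Key
        self.span_conv = nn.Conv1d(self.d_inner, self.d_inner, kernel_size,
                                   padding=kernel_size // 2, groups=self.d_inner)
        # generates one dynamic kernel per position and head
        self.kernel_gen = nn.Linear(self.d_inner, self.num_heads * kernel_size)

        self.out = nn.Linear(2 * self.d_inner, d_model)

    def forward(self, x):                         # x: (B, T, d_model)
        B, T, _ = x.shape
        q = self.query(x)                         # shared Query

        # --- self-attention branch with reduced heads ---
        def split(t):                             # (B, T, d_inner) -> (B, H, T, head_dim)
            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k, v = self.key_attn(x), self.value_attn(x)
        attn = torch.softmax(split(q) @ split(k).transpose(-2, -1)
                             / self.head_dim ** 0.5, dim=-1)
        attn_out = (attn @ split(v)).transpose(1, 2).reshape(B, T, self.d_inner)

        # --- span-based dynamic convolution branch ---
        k_span = self.span_conv(self.key_conv(x).transpose(1, 2)).transpose(1, 2)
        kernels = self.kernel_gen(q * k_span)     # (B, T, H * kernel_size)
        kernels = torch.softmax(
            kernels.view(B, T, self.num_heads, self.kernel_size), dim=-1)

        # unfold values into local windows and apply per-position kernels
        v_conv = self.value_conv(x).transpose(1, 2)             # (B, d_inner, T)
        windows = F.unfold(v_conv.unsqueeze(-1),
                           (self.kernel_size, 1),
                           padding=(self.kernel_size // 2, 0))  # (B, d_inner*K, T)
        windows = windows.view(B, self.num_heads, self.head_dim,
                               self.kernel_size, T)
        conv_out = torch.einsum('bhdkt,bthk->bthd', windows, kernels)
        conv_out = conv_out.reshape(B, T, self.d_inner)

        # concatenate both branches and project back to the model width
        return self.out(torch.cat([attn_out, conv_out], dim=-1))

if __name__ == "__main__":
    block = MixedAttentionBlock()
    y = block(torch.randn(2, 16, 768))
    print(y.shape)                                # torch.Size([2, 16, 768])
```

With $\gamma = 2$, each branch works in a $d/2$-dimensional space, so concatenating the two branch outputs restores the original width before the final projection; this is the sense in which the bottleneck halves the cost of each branch without shrinking the block's output.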
Papers Using This Method
Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction (2025-05-26)
Navigating Nuance: In Quest for Political Truth (2025-01-01)
ChatGPT v.s. Media Bias: A Comparative Study of GPT-3.5 and Fine-tuned Language Models (2024-03-29)
M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection (2023-09-15)
Transformer Based Punctuation Restoration for Turkish (2023-09-15)
ConvBERT: Improving BERT with Span-based Dynamic Convolution (2020-08-06)