Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Attention-augmented Convolution

General · Introduced 2019 · 3 papers
Source Paper

Description

Attention-augmented Convolution is a type of convolution with a two-dimensional relative self-attention mechanism that can replace convolutions as a stand-alone computational primitive for image classification. It employs scaled dot-product attention and multi-head attention, as in the Transformer.
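The scaled dot-product attention it builds on can be sketched in a few lines of NumPy. This is a minimal illustration of the general mechanism, not the paper's full relative-position variant; the shapes and names are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)       # (n, n) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # (n, d_v)

# Toy example: 4 spatial positions attending over each other, depth 8
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Multi-head attention simply runs several such attention operations in parallel on lower-dimensional projections and concatenates their outputs.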

It works by concatenating convolutional and attentional feature maps. To see this, consider an original convolution operator with kernel size $k$, $F_{in}$ input filters and $F_{out}$ output filters. The corresponding attention-augmented convolution can be written as:

$$\text{AAConv}\left(X\right) = \text{Concat}\left[\text{Conv}(X), \text{MHA}(X)\right]$$

$X$ originates from an input tensor of shape $(H, W, F_{in})$. This is flattened to $X \in \mathbb{R}^{HW \times F_{in}}$, which is passed into a multi-head attention module as well as the convolution (see above).

Like the convolution, the attention-augmented convolution 1) is equivariant to translation and 2) can readily operate on inputs of different spatial dimensions.
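The concatenation above can be sketched end to end in NumPy. As simplifying assumptions, the "convolution" here is pointwise (1×1) rather than $k \times k$, and the relative position encodings from the source paper are omitted; all weight names and dimensions are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, num_heads):
    """Plain multi-head self-attention over flattened spatial positions."""
    n = X.shape[0]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # (HW, d_model) each
    d_h = Q.shape[-1] // num_heads
    split = lambda A: A.reshape(n, num_heads, d_h).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)        # (heads, HW, d_h)
    att = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_h))
    out = att @ Vh                                   # (heads, HW, d_h)
    return out.transpose(1, 0, 2).reshape(n, -1)     # (HW, d_model)

def aa_conv(X_img, W_conv, Wq, Wk, Wv, num_heads):
    """AAConv(X) = Concat[Conv(X), MHA(X)] along the channel axis.
    Assumption: the conv branch is a 1x1 (pointwise) convolution,
    i.e. a per-position linear map, to keep the sketch self-contained."""
    H, W, F_in = X_img.shape
    X = X_img.reshape(H * W, F_in)                   # flatten spatial dims
    conv_out = X @ W_conv                            # (HW, F_conv)
    attn_out = multi_head_attention(X, Wq, Wk, Wv, num_heads)
    out = np.concatenate([conv_out, attn_out], axis=-1)
    return out.reshape(H, W, -1)                     # (H, W, F_conv + d_model)

rng = np.random.default_rng(0)
H, W, F_in, F_conv, d_model, heads = 4, 4, 8, 12, 8, 2
X_img = rng.standard_normal((H, W, F_in))
params = [rng.standard_normal((F_in, d)) * 0.1
          for d in (F_conv, d_model, d_model, d_model)]
Y = aa_conv(X_img, *params, num_heads=heads)
print(Y.shape)  # (4, 4, 20)
```

Because the attention branch operates over all $HW$ positions regardless of their number, the combined operator inherits the convolution's ability to handle varying spatial sizes, as noted above.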

Papers Using This Method

- RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation (2024-08-12)
- MuSLCAT: Multi-Scale Multi-Level Convolutional Attention Transformer for Discriminative Music Modeling on Raw Waveforms (2021-04-06)
- Attention Augmented Convolutional Networks (2019-04-22)