Attention Augmented Convolutional Networks

Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le

2019-04-22 | ICCV 2019 | Tasks: Image Classification, General Classification, Object Detection

Abstract

Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation, however, has a significant weakness in that it only operates on a local neighborhood, thus missing global information. Self-attention, on the other hand, has emerged as a recent advance to capture long-range interactions, but has mostly been applied to sequence modeling and generative modeling tasks. In this paper, we consider the use of self-attention for discriminative visual tasks as an alternative to convolutions. We introduce a novel two-dimensional relative self-attention mechanism that proves competitive in replacing convolutions as a stand-alone computational primitive for image classification. We find in control experiments that the best results are obtained when combining both convolutions and self-attention. We therefore propose to augment convolutional operators with this self-attention mechanism by concatenating convolutional feature maps with a set of feature maps produced via self-attention. Extensive experiments show that Attention Augmentation leads to consistent improvements in image classification on ImageNet and object detection on COCO across many different models and scales, including ResNets and a state-of-the-art mobile-constrained network, while keeping the number of parameters similar. In particular, our method achieves a $1.3\%$ top-1 accuracy improvement on ImageNet classification over a ResNet50 baseline and outperforms other attention mechanisms for images such as Squeeze-and-Excitation. It also achieves an improvement of 1.4 mAP in COCO Object Detection on top of a RetinaNet baseline.
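The concatenation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single attention head, leaves out the two-dimensional relative position terms, and uses a naive loop-based convolution. The function names (`conv2d_same`, `self_attention_2d`, `augmented_conv`) and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive zero-padded convolution (odd kernel size assumed).

    x: (H, W, C_in) feature map, w: (k, k, C_in, C_out) kernel.
    Each output position only sees its local k x k neighborhood.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[-1]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, w, axes=3)
    return out


def self_attention_2d(x, wq, wk, wv):
    """Single-head self-attention over all H*W positions.

    Simplification: no relative position encodings, one head only.
    x: (H, W, C_in); wq, wk: (C_in, d_k); wv: (C_in, d_v).
    """
    H, W, C = x.shape
    flat = x.reshape(H * W, C)                 # treat pixels as a sequence
    q, k, v = flat @ wq, flat @ wk, flat @ wv
    logits = q @ k.T / np.sqrt(wq.shape[1])    # scaled dot-product scores
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v).reshape(H, W, -1)     # every position sees all others


def augmented_conv(x, w_conv, wq, wk, wv):
    """Concatenate convolutional and attention feature maps along channels."""
    conv_out = conv2d_same(x, w_conv)
    attn_out = self_attention_2d(x, wq, wk, wv)
    return np.concatenate([conv_out, attn_out], axis=-1)


# Usage with arbitrary, hypothetical sizes: 5 conv channels + 3 attention channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6, 4))
w_conv = rng.normal(size=(3, 3, 4, 5))
wq = 0.1 * rng.normal(size=(4, 3))
wk = 0.1 * rng.normal(size=(4, 3))
wv = rng.normal(size=(4, 3))
out = augmented_conv(x, w_conv, wq, wk, wv)
print(out.shape)  # (6, 6, 8)
```

In this sketch the first 5 output channels depend only on a 3 x 3 neighborhood, while the last 3 depend on every spatial position, which is the local-plus-global combination the abstract motivates.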

Results

Task	Dataset	Metric	Value	Model
Object Detection	COCO test-dev	box mAP	39.2	AA-ResNet-10 + RetinaNet
Image Classification	CIFAR-100	Percentage correct	81.6	AA-Wide-ResNet
