Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DANet

Dual Attention Network

General · Introduced 2018 · 10 papers
Source Paper

Description

In scene segmentation, encoder-decoder structures cannot exploit the global relationships between objects, while RNN-based structures rely heavily on the output of long-term memorization. To address these problems, Fu et al. proposed the dual attention network (DANet), a novel framework for natural scene image segmentation. Unlike CBAM and BAM, it computes the spatial attention map with a self-attention mechanism rather than by simply stacking convolutions, which lets the network capture global information directly.

DANet uses a position attention module and a channel attention module in parallel to capture feature dependencies in the spatial and channel domains. Given the input feature map X, the position attention module first applies convolution layers to obtain new feature maps. It then selectively aggregates the features at each position by a weighted sum of the features at all positions, where the weights are determined by the feature similarity between the corresponding pairs of positions. The channel attention module has a similar form, except for the dimensional reduction used to model cross-channel relations. Finally, the outputs of the two branches are fused to obtain the final feature representation. For simplicity, reshape the feature map to X ∈ R^{C×(H×W)}; the overall process can then be written as

Q, K, V = W_q X, W_k X, W_v X

Y^pos = X + V Softmax(Q^T K)

Y^chn = X + Softmax(X X^T) X

Y = Y^pos + Y^chn

where W_q, W_k, W_v ∈ R^{C×C} are used to generate the new feature maps.
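As a shape-level sanity check, the four equations above can be sketched in NumPy. This is a minimal sketch, not the published implementation: W_q, W_k, W_v are random stand-ins for the learned 1×1 convolutions, the softmax normalisation axes are one reasonable reading of the formulas, and the original network additionally applies convolutions before/after each module and learns scale factors on the attention branches.

```python
import numpy as np

def softmax(z, axis):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(X, W_q, W_k, W_v):
    """Dual attention on a flattened feature map X of shape (C, N), N = H*W.

    W_q, W_k, W_v are (C, C) projections; DANet's 1x1 convolutions become
    plain matrix multiplies once the spatial dimensions are flattened.
    """
    Q, K, V = W_q @ X, W_k @ X, W_v @ X          # each (C, N)
    # Position attention: an N x N affinity between spatial locations.
    # Normalising over axis 0 makes each output position a convex
    # combination of the (projected) features at all positions.
    Y_pos = X + V @ softmax(Q.T @ K, axis=0)
    # Channel attention: a C x C affinity between channels, computed
    # directly from X (no projections, matching the equations above).
    Y_chn = X + softmax(X @ X.T, axis=1) @ X
    return Y_pos + Y_chn
```

Working on the flattened (C, N) map makes the cost explicit: the position branch builds an N×N affinity (quadratic in H×W), which is the source of the computational cost noted below for large inputs.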

The position attention module enables DANet to capture long-range contextual information and adaptively integrate similar features at any scale from a global viewpoint, while the channel attention module is responsible for enhancing useful channels as well as suppressing noise. Taking spatial and channel relationships into consideration explicitly improves the feature representation for scene segmentation. However, it is computationally costly, especially for large input feature maps.

Papers Using This Method

ECG Arrhythmia Detection Using Disease-specific Attention-based Deep Learning Model (2024-07-25)
Rethinking Residual Connection in Training Large-Scale Spiking Neural Networks (2023-11-09)
Distractor-aware Event-based Tracking (2023-10-22)
Improving Deep Attractor Network by BGRU and GMM for Speech Separation (2023-08-07)
Monocular Depth Distribution Alignment with Low Computation (2022-03-09)
DANets: Deep Abstract Networks for Tabular Data Classification and Regression (2021-12-06)
Invertible Denoising Network: A Light Solution for Real Noise Removal (2021-04-21)
Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images (2020-09-03)
Attention Scaling for Crowd Counting (2020-06-01)
Dual Attention Network for Scene Segmentation (2018-09-09)