Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Spatial Attention Module (ThunderNet)

Computer Vision · Introduced 2019 · 4 papers
Source Paper

Description

Spatial Attention Module (SAM) is a feature extraction module for object detection used in ThunderNet.

The ThunderNet SAM explicitly re-weights the feature map over the spatial dimensions before RoI warping. The key idea of SAM is to use the knowledge from RPN to refine the feature distribution of the feature map. RPN is trained to recognize foreground regions under the supervision of ground truths, so the intermediate features in RPN can be used to distinguish foreground features from background features. SAM accepts two inputs: the intermediate feature map from RPN, $\mathcal{F}^{RPN}$, and the thin feature map from the Context Enhancement Module, $\mathcal{F}^{CEM}$. The output of SAM, $\mathcal{F}^{SAM}$, is defined as:

$$\mathcal{F}^{SAM} = \mathcal{F}^{CEM} * \text{sigmoid}\left(\theta\left(\mathcal{F}^{RPN}\right)\right)$$

Here $\theta(\cdot)$ is a dimension transformation that matches the number of channels in the two feature maps. The sigmoid function constrains the values within $[0, 1]$. Finally, $\mathcal{F}^{CEM}$ is re-weighted by the generated attention map for a better feature distribution. For computational efficiency, a 1×1 convolution is simply applied as $\theta(\cdot)$, so the computational cost of SAM is negligible. The figure to the right shows the structure of SAM.
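The equation above can be sketched in a few lines of NumPy, treating the 1×1 convolution $\theta(\cdot)$ as a per-pixel channel mixing. This is a minimal illustration of the formula, not ThunderNet's actual implementation; the function and parameter names (`sam`, `w_theta`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    # Squashes values into [0, 1], as required by SAM
    return 1.0 / (1.0 + np.exp(-x))

def sam(f_cem, f_rpn, w_theta):
    """Sketch of the Spatial Attention Module forward pass.

    f_cem:   (C, H, W) thin feature map from the Context Enhancement Module
    f_rpn:   (C_rpn, H, W) intermediate feature map from RPN
    w_theta: (C, C_rpn) weights of the 1x1 convolution theta,
             which matches the channel counts of the two maps
    """
    # theta(F_RPN): a 1x1 convolution is just a channel transform at each pixel
    attn = np.einsum('ck,khw->chw', w_theta, f_rpn)
    # Constrain the attention values to [0, 1]
    attn = sigmoid(attn)
    # Re-weight F_CEM element-wise with the generated attention map
    return f_cem * attn
```

Because the attention map lies in $[0, 1]$, the module can only attenuate features: locations the RPN considers background are suppressed, while foreground locations pass through nearly unchanged.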

SAM has two functions. The first is to refine the feature distribution by strengthening foreground features and suppressing background features. The second is to stabilize the training of RPN, since SAM enables extra gradient flow from the R-CNN subnet to RPN. As a result, RPN receives additional supervision from the R-CNN subnet, which helps its training.

Papers Using This Method

Real Time Egocentric Segmentation for Video-self Avatar in Mixed Reality (2022-07-04)
Egocentric Human Segmentation for Mixed Reality (2020-05-25)
ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices (2019-10-01)
ThunderNet: Towards Real-time Generic Object Detection (2019-03-28)