Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


GCT

Gated Channel Transformation

General · Introduced 2019 · 8 papers
Source Paper

Description

Unlike previous methods, GCT first collects global information by computing the $\ell_{2}$-norm of each channel. Next, a learnable vector $\alpha$ is applied to scale the feature. A competition mechanism is then adopted via channel normalization, allowing channels to interact. Like other common normalization methods, a learnable scale parameter $\gamma$ and bias $\beta$ rescale the normalization. However, unlike previous methods, GCT adopts a tanh activation to control the attention vector. Finally, it not only multiplies the input by the attention vector but also adds an identity connection. GCT can be written as:

$$s = F_\text{gct}(X, \theta) = \tanh\left(\gamma \, \text{CN}(\alpha \, \text{Norm}(X)) + \beta\right)$$

$$Y = s X + X$$

where $\alpha$, $\beta$ and $\gamma$ are trainable parameters, $\text{Norm}(\cdot)$ denotes the $\ell_{2}$-norm of each channel, and $\text{CN}$ is channel normalization.

A GCT block has fewer parameters than an SE block and, being lightweight, can be added after each convolutional layer of a CNN.
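The pipeline above (channel embedding, channel normalization, tanh gating, identity connection) can be sketched as a NumPy forward pass. This is a minimal illustration, not the authors' implementation: the parameter shapes, the `eps` stabilizer, and the RMS form of channel normalization are assumptions.

```python
import numpy as np

def gct(x, alpha, beta, gamma, eps=1e-5):
    """Gated Channel Transformation forward pass for an NCHW tensor.

    alpha, beta, gamma are assumed to have shape (1, C, 1, 1);
    eps is an assumed numerical stabilizer.
    """
    # Norm(X): l2-norm of each channel over the spatial dims, scaled by alpha
    embedding = alpha * np.sqrt(np.sum(x ** 2, axis=(2, 3), keepdims=True) + eps)
    # CN: channel normalization -- divide by the root-mean-square across channels
    norm = embedding / np.sqrt(np.mean(embedding ** 2, axis=1, keepdims=True) + eps)
    # tanh gate with learnable scale gamma and bias beta
    gate = np.tanh(gamma * norm + beta)
    # gated output plus identity connection: Y = s*X + X
    return x * gate + x

# At initialization with gamma = 0 and beta = 0 the gate is tanh(0) = 0,
# so the block reduces to the identity mapping.
x = np.random.randn(2, 4, 8, 8)
alpha = np.ones((1, 4, 1, 1))
beta = np.zeros((1, 4, 1, 1))
gamma = np.zeros((1, 4, 1, 1))
y = gct(x, alpha, beta, gamma)
assert y.shape == x.shape and np.allclose(y, x)
```

Because the gate lies in $(-1, 1)$, the residual term $sX$ can never overwhelm the identity path, which is one reason the block is safe to insert after every convolution.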

Papers Using This Method

- STEAM: Squeeze and Transform Enhanced Attention Module (2024-12-12)
- Generative causal testing to bridge data-driven models and scientific theories in language neuroscience (2024-10-01)
- TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling (2024-01-06)
- Granger Causality for Predictability in Dynamic Mode Decomposition (2022-10-23)
- Generalised Co-Salient Object Detection (2022-08-20)
- GCT: Graph Co-Training for Semi-Supervised Few-Shot Learning (2022-03-15)
- Integrating Fréchet distance and AI reveals the evolutionary trajectory and origin of SARS-CoV-2 (2021-10-14)
- Gated Channel Transformation for Visual Recognition (2019-09-25)