Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Spatial Gating Unit

General · Introduced 2021 · 9 papers
Source paper: Pay Attention to MLPs (Liu et al., 2021)

Description

Spatial Gating Unit, or SGU, is a gating unit used in the gMLP architecture to capture spatial interactions. To enable cross-token interactions, the layer $s(\cdot)$ must contain a contraction operation over the spatial dimension. The layer $s(\cdot)$ is formulated as the output of linear gating:

$$s(Z) = Z \odot f_{W,b}(Z)$$

where $\odot$ denotes element-wise multiplication and $f_{W,b}(Z) = WZ + b$ is a linear projection along the spatial (token) dimension, with $W \in \mathbb{R}^{n \times n}$ for a sequence of length $n$ and $b$ a vector of token-specific biases. For training stability, the authors find it critical to initialize $W$ with near-zero values and $b$ with ones, so that $f_{W,b}(Z) \approx 1$ and therefore $s(Z) \approx Z$ at the beginning of training. This initialization makes each gMLP block behave like a regular FFN early in training, where each token is processed independently, and only gradually injects spatial information across tokens as learning proceeds.
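A minimal sketch of the basic gate, assuming $f_{W,b}(Z) = WZ + b$ with an $n \times n$ matrix $W$ and one bias per token broadcast across channels. It uses plain Python lists so every operation is visible. The helper names `spatial_proj`, `sgu` and `init_params`, and the uniform range `scale=1e-4` used for "near-zero" weights, are hypothetical choices for illustration, not the authors' code. It shows that the near-identity initialization keeps $s(Z)$ close to $Z$, and that a nonzero off-diagonal entry of $W$ lets one token read from another.

```python
import random

def spatial_proj(Z, W, b):
    """f_{W,b}(Z) = W Z + b, mixing information across tokens.

    Z is n x d (tokens x channels), W is n x n, and b holds one bias per
    token that is broadcast across all d channels.
    """
    n, d = len(Z), len(Z[0])
    return [[sum(W[i][k] * Z[k][j] for k in range(n)) + b[i] for j in range(d)]
            for i in range(n)]

def sgu(Z, W, b):
    """s(Z) = Z ⊙ f_{W,b}(Z): gate Z element-wise with its spatial projection."""
    F = spatial_proj(Z, W, b)
    return [[z * f for z, f in zip(z_row, f_row)] for z_row, f_row in zip(Z, F)]

def init_params(n, scale=1e-4, seed=0):
    """Near-zero W and b = 1, so f_{W,b}(Z) ≈ 1 and s(Z) ≈ Z at the start.

    The uniform range `scale` is an arbitrary assumption for this sketch.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(n)]
    b = [1.0] * n
    return W, b

# At initialization the SGU is close to the identity map.
Z = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W, b = init_params(len(Z))
out = sgu(Z, W, b)
rel_dev = max(abs(o - z) / abs(z) for o_row, z_row in zip(out, Z)
              for o, z in zip(o_row, z_row))
print(f"max relative deviation from Z: {rel_dev:.2e}")

# A nonzero off-diagonal weight lets token 0 read from token 1.
print(sgu([[1.0], [2.0]], [[0.0, 1.0], [0.0, 0.0]], [1.0, 1.0]))  # [[3.0], [2.0]]
```

The relative deviation equals $|\sum_k W_{ik} Z_{kj}|$, which stays small while $W$ is near zero. This is why the block starts out acting like a token-wise FFN.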

The authors find it further effective to split $Z$ into two independent parts $(Z_1, Z_2)$ along the channel dimension for the gating function and for the multiplicative bypass:

$$s(Z) = Z_1 \odot f_{W,b}(Z_2)$$
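A minimal sketch of the split variant, again assuming $f_{W,b}(X) = WX + b$ with an $n \times n$ $W$ and per-token biases. Here the first half of the channels forms $Z_1$ and the second half forms $Z_2$, an ordering chosen for illustration. The names `split_channels`, `spatial_proj` and `sgu_split` are hypothetical. Only $Z_2$ passes through the spatial projection, and $Z_1$ is the bypass that gets gated.

```python
def split_channels(Z):
    """Split Z (n x d, d even) into (Z1, Z2) along the channel dimension."""
    d = len(Z[0])
    if d % 2:
        raise ValueError("channel dimension must be even to split in half")
    half = d // 2
    return [row[:half] for row in Z], [row[half:] for row in Z]

def spatial_proj(Z, W, b):
    """f_{W,b}(Z) = W Z + b with an n x n W and one bias per token."""
    n, d = len(Z), len(Z[0])
    return [[sum(W[i][k] * Z[k][j] for k in range(n)) + b[i] for j in range(d)]
            for i in range(n)]

def sgu_split(Z, W, b):
    """s(Z) = Z1 ⊙ f_{W,b}(Z2): Z2 drives the gate, Z1 is the bypass."""
    Z1, Z2 = split_channels(Z)
    F = spatial_proj(Z2, W, b)
    return [[u * f for u, f in zip(u_row, f_row)] for u_row, f_row in zip(Z1, F)]

Z = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]            # n = 2 tokens, d = 4 channels
W = [[0.0, 1.0], [0.0, 0.0]]          # token 0 reads Z2 of token 1
b = [1.0, 1.0]
print(sgu_split(Z, W, b))             # [[8.0, 18.0], [5.0, 6.0]]
```

Because half of the channels are consumed by the gate, the output has $d/2$ channels rather than $d$.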

They also normalize the input to $f_{W,b}$, which empirically improved the stability of large NLP models.
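A minimal sketch of the full unit with the normalization step, assuming a per-token layer normalization over channels without learned scale or shift (a simplification for this sketch), and $f_{W,b}(X) = WX + b$ as before. The names `layer_norm` and `sgu_normed` and the `eps` value are hypothetical. The example shows one effect of normalizing: rescaling $Z_2$ leaves the gate, and therefore the output, almost unchanged.

```python
import math

def layer_norm(Z, eps=1e-5):
    """Normalize each token (row) over its channels to zero mean, unit variance.

    Assumption: no learned scale/shift, to keep the sketch minimal.
    """
    out = []
    for row in Z:
        mu = sum(row) / len(row)
        var = sum((x - mu) ** 2 for x in row) / len(row)
        out.append([(x - mu) / math.sqrt(var + eps) for x in row])
    return out

def sgu_normed(Z, W, b):
    """s(Z) = Z1 ⊙ f_{W,b}(norm(Z2)), with f_{W,b}(X) = W X + b."""
    half = len(Z[0]) // 2
    Z1 = [row[:half] for row in Z]
    Z2 = layer_norm([row[half:] for row in Z])  # normalize the gate input only
    n = len(Z2)
    F = [[sum(W[i][k] * Z2[k][j] for k in range(n)) + b[i]
          for j in range(len(Z2[0]))] for i in range(n)]
    return [[u * f for u, f in zip(u_row, f_row)] for u_row, f_row in zip(Z1, F)]

W = [[0.5, 0.5], [0.0, 1.0]]
b = [1.0, 1.0]
Z = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
Z_big = [[1.0, 2.0, 300.0, 400.0], [5.0, 6.0, 700.0, 800.0]]  # Z2 scaled 100x
print(sgu_normed(Z, W, b))      # approximately [[0.0, 4.0], [0.0, 12.0]]
print(sgu_normed(Z_big, W, b))  # nearly identical: the norm removes Z2's scale
```

Only the gate half is normalized here. The bypass $Z_1$ keeps its original magnitude, so the gate controls how much of it passes through.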

Papers Using This Method

Image Super-resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features (2023-12-30)
SigVIC: Spatial Importance Guided Variable-Rate Image Compression (2023-03-16)
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window (2022-08-24)
A Proposal of Multi-Layer Perceptron with Graph Gating Unit for Graph Representation Learning and its Application to Surrogate Model for FEM (2022-07-11)
Are We Really Making Much Progress in Text Classification? A Comparative Review (2022-04-08)
Efficient Language Modeling with Sparse all-MLP (2022-03-14)
Convolutional Gated MLP: Combining Convolutions & gMLP (2021-11-06)
CycleMLP: A MLP-like Architecture for Dense Prediction (2021-07-21)
Pay Attention to MLPs (2021-05-17)