Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


gMLP

Computer Vision · Introduced 2021 · 7 papers
Source Paper

Description

gMLP is an MLP-based alternative to Transformers without self-attention, which simply consists of channel projections and spatial projections with static parameterization. It is built out of basic MLP layers with gating. The model consists of a stack of $L$ blocks with identical size and structure. Let $X \in \mathbb{R}^{n \times d}$ be the token representations with sequence length $n$ and dimension $d$. Each block is defined as:

Z=σ(XU),Z~=s(Z),Y=Z~VZ=\sigma(X U), \quad \tilde{Z}=s(Z), \quad Y=\tilde{Z} VZ=σ(XU),Z~=s(Z),Y=Z~V

where σ\sigmaσ is an activation function such as GeLU. UUU and VVV define linear projections along the channel dimension - the same as those in the FFNs of Transformers (e.g., their shapes are 768×3072768 \times 3072768×3072 and 3072×7683072 \times 7683072×768 for BERTbase \text{BERT}_{\text {base }}BERTbase ​).

A key ingredient is $s(\cdot)$, a layer which captures spatial interactions. When $s$ is an identity mapping, the above transformation degenerates to a regular FFN, where individual tokens are processed independently without any cross-token communication. One of the major focuses is therefore to design a good $s$ capable of capturing complex spatial interactions across tokens. This leads to the use of a Spatial Gating Unit, which involves a modified linear gating.
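In the source paper ("Pay Attention to MLPs"), the Spatial Gating Unit splits $Z$ in half along the channel dimension and gates one half with a spatial ($n \times n$, token-mixing) linear projection of the other half. A minimal NumPy sketch under those assumptions, with an unlearned LayerNorm and toy dimensions:

```python
import numpy as np

def spatial_gating_unit(Z, W, b):
    """Sketch of s(Z): split channels, then gate one half with a
    spatial (token-mixing) linear projection of the other half."""
    Z1, Z2 = np.split(Z, 2, axis=-1)       # (n, d_ffn/2) each
    # LayerNorm over channels of Z2 (simplified: no learned scale/shift)
    Z2 = (Z2 - Z2.mean(-1, keepdims=True)) / (Z2.std(-1, keepdims=True) + 1e-6)
    gate = W @ Z2 + b                      # W is n x n: it mixes tokens, not channels
    return Z1 * gate                       # elementwise gating

n, d_ffn = 4, 8
rng = np.random.default_rng(0)
Z = rng.standard_normal((n, d_ffn))
W = np.zeros((n, n))                       # the paper initializes W near zero...
b = np.ones((n, 1))                        # ...and the bias to one, so s(Z) ≈ Z1 at init
out = spatial_gating_unit(Z, W, b)
print(out.shape)                           # → (4, 4)
```

With this near-identity initialization the block starts out behaving like a regular FFN and learns spatial mixing during training; since the output has $d_{\text{ffn}}/2$ channels, $V$ would map from $d_{\text{ffn}}/2$ back to $d$ in a full block.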

The overall block layout is inspired by inverted bottlenecks, which define $s(\cdot)$ as a spatial depthwise convolution. Note that, unlike Transformers, gMLP does not require position embeddings, because such information is captured in $s(\cdot)$.

Papers Using This Method

- gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window (2022-08-24)
- A Proposal of Multi-Layer Perceptron with Graph Gating Unit for Graph Representation Learning and its Application to Surrogate Model for FEM (2022-07-11)
- Are We Really Making Much Progress in Text Classification? A Comparative Review (2022-04-08)
- Efficient Language Modeling with Sparse all-MLP (2022-03-14)
- Convolutional Gated MLP: Combining Convolutions & gMLP (2021-11-06)
- CycleMLP: A MLP-like Architecture for Dense Prediction (2021-07-21)
- Pay Attention to MLPs (2021-05-17)