
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Shuffle-T

Shuffle Transformer

Computer Vision · Introduced 2021 · 1 paper

Description

The Shuffle Transformer block consists of the Shuffle Multi-Head Self-Attention module (ShuffleMHSA), the Neighbor-Window Connection module (NWC), and the MLP module. To introduce cross-window connections while keeping the efficient computation of non-overlapping windows, consecutive Shuffle Transformer blocks alternate between WMSA and Shuffle-WMSA. The first window-based transformer block uses a regular window partition, and the second uses window-based self-attention with spatial shuffle. In addition, the Neighbor-Window Connection (NWC) module is added to each block to enhance connections among neighboring windows. The Shuffle Transformer block can thus build rich cross-window connections and augment the representation. Two consecutive Shuffle Transformer blocks are computed as:

$$x^{l} = \mathrm{WMSA}\left(\mathrm{BN}\left(z^{l-1}\right)\right) + z^{l-1}$$

$$y^{l} = \mathrm{NWC}\left(x^{l}\right) + x^{l}$$

$$z^{l} = \mathrm{MLP}\left(\mathrm{BN}\left(y^{l}\right)\right) + y^{l}$$

$$x^{l+1} = \mathrm{Shuffle\text{-}WMSA}\left(\mathrm{BN}\left(z^{l}\right)\right) + z^{l}$$

$$y^{l+1} = \mathrm{NWC}\left(x^{l+1}\right) + x^{l+1}$$

$$z^{l+1} = \mathrm{MLP}\left(\mathrm{BN}\left(y^{l+1}\right)\right) + y^{l+1}$$

where $x^{l}$, $y^{l}$ and $z^{l}$ denote the output features of the (Shuffle-)WMSA module, the Neighbor-Window Connection module and the MLP module for block $l$, respectively; WMSA and Shuffle-WMSA denote window-based multi-head self-attention without and with spatial shuffle, respectively.
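To make the residual structure and the two window partitions concrete, here is a minimal pure-Python sketch. It is not the paper's implementation. All function names are hypothetical, and the modules (`wmsa`, `shuffle_wmsa`, `nwc`, `mlp`, `norm`) are passed in as placeholder callables. The stride-based grouping in `shuffled_windows` is an assumption modeled on a reshape-transpose-flatten shuffle in one dimension, which the description here does not spell out. Reusing the same `nwc`, `mlp` and `norm` callables in both blocks is a simplification; in a real network each block would have its own parameters.

```python
def regular_windows(num_tokens, window_size):
    """Group token indices into contiguous, non-overlapping windows."""
    assert num_tokens % window_size == 0
    return [list(range(start, start + window_size))
            for start in range(0, num_tokens, window_size)]


def shuffled_windows(num_tokens, window_size):
    """Group token indices with a stride so each window mixes distant tokens.

    Assumption: this matches reshaping the indices to
    (window_size, num_tokens // window_size), transposing, flattening,
    and then applying the regular partition.
    """
    assert num_tokens % window_size == 0
    stride = num_tokens // window_size
    return [list(range(offset, num_tokens, stride))
            for offset in range(stride)]


def shuffle_block(z_prev, attention, nwc, mlp, norm):
    """One block: attention, NWC and MLP, each wrapped in a residual connection."""
    x = attention(norm(z_prev)) + z_prev   # x = (Shuffle-)WMSA(BN(z_prev)) + z_prev
    y = nwc(x) + x                         # y = NWC(x) + x
    z = mlp(norm(y)) + y                   # z = MLP(BN(y)) + y
    return z


def shuffle_block_pair(z_prev, wmsa, shuffle_wmsa, nwc, mlp, norm):
    """Two consecutive blocks: regular windows first, shuffled windows second."""
    z_l = shuffle_block(z_prev, wmsa, nwc, mlp, norm)
    return shuffle_block(z_l, shuffle_wmsa, nwc, mlp, norm)


if __name__ == "__main__":
    print(regular_windows(8, 4))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(shuffled_windows(8, 4))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

With 8 tokens and a window size of 4, each shuffled window draws tokens from both regular windows, which is how the second block can exchange information across the windows used by the first.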

Papers Using This Method

Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer (2021-06-07)