Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu
Very recently, window-based Transformers, which compute self-attention within non-overlapping local windows, have demonstrated promising results on image classification, semantic segmentation, and object detection. However, comparatively little study has been devoted to the cross-window connection, which is the key element for improving representation ability. In this work, we revisit the spatial shuffle as an efficient way to build connections among windows. As a result, we propose a new vision transformer, named Shuffle Transformer, which is highly efficient and can be implemented by modifying two lines of code. Furthermore, depth-wise convolution is introduced to complement the spatial shuffle by enhancing neighbor-window connections. The proposed architectures achieve excellent performance on a wide range of visual tasks including image-level classification, object detection, and semantic segmentation. Code will be released for reproduction.
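The "two lines of code" claim refers to the fact that a spatial shuffle, like ShuffleNet's channel shuffle, reduces to a reshape followed by a transpose. The sketch below is an illustrative reconstruction of that idea in NumPy, not the authors' released code: each spatial axis is factored into (number of windows, window size) and the two factors are swapped, so pixels that were in different non-overlapping windows land in the same window afterwards.

```python
import numpy as np

def spatial_shuffle(x, window_size):
    """Illustrative spatial shuffle (a sketch, not the authors' exact code).

    x: array of shape (B, H, W, C); H and W must be divisible by window_size.
    Returns an array of the same shape with pixels from different
    non-overlapping windows regrouped together.
    """
    B, H, W, C = x.shape
    w = window_size
    # The "two lines": factor each spatial axis into (num_windows, w),
    # then swap the two factors on both axes and flatten back.
    x = x.reshape(B, H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 4, 3, 5).reshape(B, H, W, C)

# Example: on a 4x4 map with window_size=2, pixels two apart are regrouped.
x = np.arange(16).reshape(1, 4, 4, 1)
y = spatial_shuffle(x, 2)
```

Because the shuffle only permutes spatial positions, it adds no parameters and negligible compute; the depth-wise convolution mentioned above would then restore locality among neighboring windows.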
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K val | mIoU | 50.5 | UperNet Shuffle-B |
| Semantic Segmentation | ADE20K val | mIoU | 49.6 | UperNet Shuffle-S |
| Semantic Segmentation | ADE20K val | mIoU | 47.6 | UperNet Shuffle-T |