Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window

Mocho Go, Hideyuki Tachibana

2022-08-24
Tasks: Image Classification, Semantic Segmentation, Instance Segmentation, Object Detection

Abstract

Following its success in the language domain, the self-attention mechanism (transformer) has been adopted in the vision domain and has recently achieved great success there. As a separate stream, the multi-layer perceptron (MLP) has also been explored in the vision domain. These architectures, alternatives to traditional CNNs, have been attracting attention, and many methods have been proposed. To combine parameter efficiency and performance with locality and hierarchy in image recognition, we propose gSwin, which merges the two streams: Swin Transformer and (multi-head) gMLP. We show that gSwin achieves better accuracy than Swin Transformer on three vision tasks (image classification, object detection, and semantic segmentation) with a smaller model size.
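The abstract does not spell out the layer itself, so the following is only a minimal NumPy sketch of the general idea it names: gMLP's spatial gating unit applied inside Swin-style local windows, with one token-mixing matrix per head. The function names, the weight layout, and the omission of learned normalisation parameters are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window * window, C).
    H and W are assumed to be multiples of `window`.
    """
    H, W, C = x.shape
    x = x.reshape(H // window, window, W // window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)

def multihead_sgu(tokens, weights, bias):
    """Spatial gating over the tokens of each window (illustrative only).

    tokens:  (num_windows, N, C), with N = window * window and C even
    weights: (heads, N, N) token-mixing matrix per head (hypothetical layout)
    bias:    (heads, N)
    """
    # Split channels into a pass-through branch u and a gate branch v.
    u, v = np.split(tokens, 2, axis=-1)
    # Normalise the gate branch over channels (no learned scale or shift here).
    v = (v - v.mean(-1, keepdims=True)) / np.sqrt(v.var(-1, keepdims=True) + 1e-6)
    num_windows, n, ch = v.shape
    heads = weights.shape[0]
    v = v.reshape(num_windows, n, heads, ch // heads)
    # Mix tokens within each window, separately for each head.
    v = np.einsum("hmn,wnhd->wmhd", weights, v) + bias.T[None, :, :, None]
    v = v.reshape(num_windows, n, ch)
    # Element-wise gating of the pass-through branch.
    return u * v

# Usage: an 8x8 feature map with 16 channels, 4x4 windows, 2 heads.
x = np.random.default_rng(0).normal(size=(8, 8, 16))
window = 4
# Swin-style shift: roll the map by half a window before partitioning.
shifted = np.roll(x, shift=(-(window // 2), -(window // 2)), axis=(0, 1))
windows = window_partition(shifted, window)
w = np.zeros((2, window * window, window * window))
b = np.ones((2, window * window))
out = multihead_sgu(windows, w, b)
```

With zero mixing weights and unit bias, as in the usage lines, the gate equals one everywhere and the unit simply passes the `u` branch through, which mirrors the near-identity initialisation described for gMLP.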

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K val | Pixel Accuracy | 83.43 | gSwin-S |
| Semantic Segmentation | ADE20K val | mIoU | 49.69 | gSwin-S |
| Semantic Segmentation | ADE20K val | Pixel Accuracy | 82.6 | gSwin-T |
| Semantic Segmentation | ADE20K val | mIoU | 47.63 | gSwin-T |
| Semantic Segmentation | ADE20K val | Pixel Accuracy | 81.79 | gSwin-VT |
| Semantic Segmentation | ADE20K val | mIoU | 45.07 | gSwin-VT |
| Image Classification | ImageNet | GFLOPs | 7 | gSwin-S |
| Image Classification | ImageNet | GFLOPs | 3.6 | gSwin-T |
| Image Classification | ImageNet | GFLOPs | 2.3 | gSwin-VT |
| Instance Segmentation | COCO test-dev | mask AP | 45.03 | gSwin-S |
| Instance Segmentation | COCO test-dev | mask AP | 44.16 | gSwin-T |
| Instance Segmentation | COCO test-dev | mask AP | 42.87 | gSwin-VT |
| 10-shot image generation | ADE20K val | Pixel Accuracy | 83.43 | gSwin-S |
| 10-shot image generation | ADE20K val | mIoU | 49.69 | gSwin-S |
| 10-shot image generation | ADE20K val | Pixel Accuracy | 82.6 | gSwin-T |
| 10-shot image generation | ADE20K val | mIoU | 47.63 | gSwin-T |
| 10-shot image generation | ADE20K val | Pixel Accuracy | 81.79 | gSwin-VT |
| 10-shot image generation | ADE20K val | mIoU | 45.07 | gSwin-VT |
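For readers unfamiliar with the two segmentation metrics listed above, here is a small sketch of how pixel accuracy and mIoU are commonly computed from a confusion matrix. The helper name `segmentation_metrics` is hypothetical, and the sketch ignores details such as void or ignore labels, so it should not be read as the exact evaluation protocol behind the reported numbers. It returns fractions in [0, 1], whereas the results above appear to be percentages.

```python
import numpy as np

def segmentation_metrics(pred, target, num_classes):
    """Return (pixel_accuracy, mean_iou) for integer label maps.

    pred and target hold class indices in [0, num_classes).
    Classes absent from both pred and target are skipped in the mean.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    true_pos = np.diag(conf)
    pixel_acc = true_pos.sum() / conf.sum()
    # Union = predicted pixels + ground-truth pixels - intersection, per class.
    union = conf.sum(axis=0) + conf.sum(axis=1) - true_pos
    valid = union > 0
    mean_iou = (true_pos[valid] / union[valid]).mean()
    return pixel_acc, mean_iou

# Usage: four pixels, two classes, one misclassified pixel.
acc, miou = segmentation_metrics(pred=[0, 1, 1, 1], target=[0, 0, 1, 1], num_classes=2)
```

In the usage line, three of four pixels are correct, and the per-class IoUs are 1/2 and 2/3.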

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)