Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Neighborhood Attention Transformer

Ali Hassani, Steven Walton, Jiachen Li, Shen Li, Humphrey Shi

2022-04-14 · CVPR 2023 · Image Classification, Semantic Segmentation, Object Detection

Abstract

We present Neighborhood Attention (NA), the first efficient and scalable sliding-window attention mechanism for vision. NA is a pixel-wise operation that localizes self-attention (SA) to each pixel's nearest neighbors, and therefore has linear time and space complexity, compared to the quadratic complexity of SA. The sliding-window pattern allows NA's receptive field to grow without needing extra pixel shifts, and preserves translational equivariance, unlike Swin Transformer's Window Self Attention (WSA). We develop NATTEN (Neighborhood Attention Extension), a Python package with efficient C++ and CUDA kernels, which allows NA to run up to 40% faster than Swin's WSA while using up to 25% less memory. We further present Neighborhood Attention Transformer (NAT), a new hierarchical transformer design based on NA that boosts image classification and downstream vision performance. Experimental results on NAT are competitive: NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet, 51.4% mAP on MS-COCO, and 48.4% mIoU on ADE20K, improvements of 1.9% ImageNet accuracy, 1.0% COCO mAP, and 2.6% ADE20K mIoU over a Swin model of similar size. To support more research based on sliding-window attention, we open source our project and release our checkpoints at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer
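
To make the mechanism in the abstract concrete, here is a minimal NumPy sketch of single-head neighborhood attention on a 2D feature map. It is an illustration only, not the NATTEN implementation: the function names, the explicit projection matrices, and the choice to clamp the window at the borders so that every pixel sees exactly `kernel_size × kernel_size` neighbors are assumptions made for this example, and multi-head attention and relative positional biases are left out. The helper `attention_weight_counts` is likewise hypothetical and only counts attention weights to show the linear versus quadratic growth.

```python
import numpy as np


def neighborhood_attention_2d(x, w_q, w_k, w_v, kernel_size=3):
    """Single-head neighborhood attention over an (H, W, C) feature map.

    Each pixel attends only to a kernel_size x kernel_size window of
    nearby pixels. Near the borders the window is clamped (shifted
    inward) instead of zero-padded; this is an assumption of the sketch.
    """
    H, W, _ = x.shape
    assert kernel_size % 2 == 1 and kernel_size <= min(H, W)

    q, k, v = x @ w_q, x @ w_k, x @ w_v  # per-pixel linear projections
    d = q.shape[-1]
    r = kernel_size // 2
    out = np.empty((H, W, v.shape[-1]))

    for i in range(H):
        # First row of the window, clamped so the window stays in bounds.
        i0 = min(max(i - r, 0), H - kernel_size)
        for j in range(W):
            j0 = min(max(j - r, 0), W - kernel_size)
            keys = k[i0:i0 + kernel_size, j0:j0 + kernel_size].reshape(-1, d)
            vals = v[i0:i0 + kernel_size, j0:j0 + kernel_size]
            vals = vals.reshape(-1, v.shape[-1])

            scores = keys @ q[i, j] / np.sqrt(d)
            scores = scores - scores.max()  # numerically stable softmax
            weights = np.exp(scores)
            weights /= weights.sum()
            out[i, j] = weights @ vals
    return out


def attention_weight_counts(H, W, kernel_size):
    """Return (neighborhood, full self-attention) attention-weight counts."""
    n = H * W
    return n * kernel_size ** 2, n ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8, 16))
    w_q, w_k, w_v = (rng.standard_normal((16, 16)) for _ in range(3))
    y = neighborhood_attention_2d(x, w_q, w_k, w_v, kernel_size=3)
    print(y.shape)
    print(attention_weight_counts(56, 56, 7))
```

When `kernel_size` equals the feature-map size, every window covers the whole map and the sketch reduces to ordinary self-attention, which is a convenient sanity check. For a fixed kernel size the number of attention weights grows linearly in the number of pixels, whereas full self-attention grows quadratically.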

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K | GFLOPs (512 x 512) | 1137 | NAT-Base |
| Semantic Segmentation | ADE20K | Params (M) | 123 | NAT-Base |
| Semantic Segmentation | ADE20K | Validation mIoU | 49.7 | NAT-Base |
| Semantic Segmentation | ADE20K | GFLOPs (512 x 512) | 1010 | NAT-Small |
| Semantic Segmentation | ADE20K | Params (M) | 82 | NAT-Small |
| Semantic Segmentation | ADE20K | Validation mIoU | 49.5 | NAT-Small |
| Semantic Segmentation | ADE20K | GFLOPs (512 x 512) | 934 | NAT-Tiny |
| Semantic Segmentation | ADE20K | Params (M) | 58 | NAT-Tiny |
| Semantic Segmentation | ADE20K | Validation mIoU | 48.4 | NAT-Tiny |
| Semantic Segmentation | ADE20K | GFLOPs (512 x 512) | 900 | NAT-Mini |
| Semantic Segmentation | ADE20K | Params (M) | 50 | NAT-Mini |
| Semantic Segmentation | ADE20K | Validation mIoU | 46.4 | NAT-Mini |
| Image Classification | ImageNet | GFLOPs | 13.7 | NAT-Base |
| Image Classification | ImageNet | GFLOPs | 7.8 | NAT-Small |
| Image Classification | ImageNet | GFLOPs | 4.3 | NAT-Tiny |
| Image Classification | ImageNet | GFLOPs | 2.7 | NAT-Mini |
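
The ADE20K rows above can be read as a size-versus-accuracy trade-off. The short script below is one way to make that explicit: it copies the parameter counts, GFLOPs, and validation mIoU from the results listing and computes the change between consecutive model sizes. The function name `step_gains` and the "gain per step" framing are illustrative choices for this page, not a metric reported by the paper.

```python
# ADE20K semantic segmentation figures copied from the results listing:
# (model, params in millions, GFLOPs at 512 x 512, validation mIoU)
ADE20K_RESULTS = [
    ("NAT-Mini", 50, 900, 46.4),
    ("NAT-Tiny", 58, 934, 48.4),
    ("NAT-Small", 82, 1010, 49.5),
    ("NAT-Base", 123, 1137, 49.7),
]


def step_gains(rows):
    """Differences between each model and the next larger one."""
    gains = []
    for (m0, p0, f0, s0), (m1, p1, f1, s1) in zip(rows, rows[1:]):
        gains.append({
            "from": m0,
            "to": m1,
            "d_params_m": p1 - p0,
            "d_gflops": f1 - f0,
            "d_miou": round(s1 - s0, 1),  # rounded to the table's precision
        })
    return gains


if __name__ == "__main__":
    for g in step_gains(ADE20K_RESULTS):
        print(
            f"{g['from']} -> {g['to']}: "
            f"+{g['d_params_m']}M params, +{g['d_gflops']} GFLOPs, "
            f"+{g['d_miou']} mIoU"
        )
```

On these numbers the mIoU gain shrinks at each step up in size while the added parameters and GFLOPs grow, so the smaller variants account for most of the accuracy improvement.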

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)