Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Attention Network

Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu

2022-02-20 · Tasks: Panoptic Segmentation, Image Classification, Segmentation, Semantic Segmentation, Pose Estimation, Instance Segmentation, Object Detection
Links: Paper · PDF · Code (one official implementation plus many community implementations)

Abstract

While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structure; (2) the quadratic complexity is too expensive for high-resolution images; and (3) self-attention captures only spatial adaptability and ignores channel adaptability. In this paper, we propose a novel linear attention, named large kernel attention (LKA), that enables the self-adaptive and long-range correlations of self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely the Visual Attention Network (VAN). Although extremely simple, VAN surpasses similarly sized vision transformers (ViTs) and convolutional neural networks (CNNs) in various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, and pose estimation. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark and sets a new state-of-the-art result (58.2 PQ) for panoptic segmentation. In addition, VAN-B2 surpasses Swin-T by 4.0 mIoU (50.1 vs. 46.1) for semantic segmentation on the ADE20K benchmark and by 2.6 AP (48.8 vs. 46.2) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. Code is available at https://github.com/Visual-Attention-Network.
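The abstract does not spell out how LKA is computed. The following is a minimal sketch, assuming the decomposition described in the full VAN paper: a 5×5 depthwise convolution, then a 7×7 depthwise convolution with dilation 3, then a 1×1 convolution whose output multiplies the input element-wise. The function names (`depthwise_conv`, `lka_attention_map`, `large_kernel_attention`) are hypothetical, the weights are toy values rather than trained parameters, and real implementations would use a deep-learning framework's convolution layers instead of nested Python loops.

```python
# Minimal pure-Python sketch of large kernel attention (LKA).
# ASSUMPTION: kernel sizes and dilation follow the VAN paper's decomposition
# (5x5 depthwise, 7x7 depthwise with dilation 3, then 1x1); weights are toy values.
from typing import List

Plane = List[List[float]]
Tensor = List[Plane]  # indexed as [channel][row][col]


def depthwise_conv(x: Tensor, kernels: List[Plane], dilation: int = 1) -> Tensor:
    """Per-channel 2D cross-correlation with zero 'same' padding (odd kernel sizes)."""
    height, width = len(x[0]), len(x[0][0])
    out: Tensor = []
    for plane, k in zip(x, kernels):
        ksize = len(k)
        pad = dilation * (ksize - 1) // 2  # keeps the output size equal to the input size
        result = [[0.0] * width for _ in range(height)]
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for di in range(ksize):
                    for dj in range(ksize):
                        r, s = i + di * dilation - pad, j + dj * dilation - pad
                        if 0 <= r < height and 0 <= s < width:
                            acc += k[di][dj] * plane[r][s]
                result[i][j] = acc
        out.append(result)
    return out


def pointwise_conv(x: Tensor, weight: List[List[float]]) -> Tensor:
    """1x1 convolution: weight[o][c] mixes input channel c into output channel o."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w_oc * x[c][i][j] for c, w_oc in enumerate(row)) for j in range(width)]
         for i in range(height)]
        for row in weight
    ]


def lka_attention_map(x, dw_kernels, dwd_kernels, pw_weight) -> Tensor:
    attn = depthwise_conv(x, dw_kernels, dilation=1)      # local spatial context
    attn = depthwise_conv(attn, dwd_kernels, dilation=3)  # long-range spatial context
    return pointwise_conv(attn, pw_weight)                # channel mixing


def large_kernel_attention(x, dw_kernels, dwd_kernels, pw_weight) -> Tensor:
    attn = lka_attention_map(x, dw_kernels, dwd_kernels, pw_weight)
    # Element-wise gating: the attention map rescales the input value by value.
    return [[[a * v for a, v in zip(a_row, x_row)]
             for a_row, x_row in zip(a_plane, x_plane)]
            for a_plane, x_plane in zip(attn, x)]
```

Every step is a local convolution, so in this sketch the cost grows linearly with the number of pixels H·W rather than quadratically as in self-attention. Stacking the two depthwise convolutions gives each output position a 23×23 receptive field (5 + (7 − 1) × 3), and the 1×1 convolution mixes information across channels, which corresponds to the channel-adaptability point (3) in the abstract.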

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Semantic Segmentation | ADE20K | Validation mIoU | 54.7 | VAN-B6
Semantic Segmentation | ADE20K | Params (M) | 55 | VAN-Large (HamNet)
Semantic Segmentation | ADE20K | Validation mIoU | 50.2 | VAN-Large (HamNet)
Semantic Segmentation | ADE20K | Params (M) | 49 | VAN-Large
Semantic Segmentation | ADE20K | Validation mIoU | 48.1 | VAN-Large
Semantic Segmentation | ADE20K | Validation mIoU | 46.7 | VAN-Base (Semantic-FPN)
Semantic Segmentation | ADE20K | Params (M) | 18 | VAN-Small
Semantic Segmentation | ADE20K | Validation mIoU | 42.9 | VAN-Small
Semantic Segmentation | ADE20K | Params (M) | 8 | VAN-Tiny
Semantic Segmentation | ADE20K | Validation mIoU | 38.5 | VAN-Tiny
Semantic Segmentation | COCO panoptic | PQ | 58.2 | VAN-B6*
Semantic Segmentation | COCO minival | PQ | 58.2 | Visual Attention Network (VAN-B6 + Mask2Former)
Semantic Segmentation | COCO minival | PQst | 48.2 | Visual Attention Network (VAN-B6 + Mask2Former)
Semantic Segmentation | COCO minival | PQth | 64.8 | Visual Attention Network (VAN-B6 + Mask2Former)
Image Classification | ImageNet | GFLOPs | 114.3 | VAN-B6 (22K, 384res)
Image Classification | ImageNet | GFLOPs | 50.6 | VAN-B5 (22K, 384res)
Image Classification | ImageNet | GFLOPs | 38.9 | VAN-B6 (22K)
Image Classification | ImageNet | GFLOPs | 35.9 | VAN-B4 (22K, 384res)
Image Classification | ImageNet | GFLOPs | 17.2 | VAN-B5 (22K)
Image Classification | ImageNet | GFLOPs | 12.2 | VAN-B4 (22K)
Image Classification | ImageNet | GFLOPs | 5 | VAN-B2
Image Classification | ImageNet | GFLOPs | 2.5 | VAN-B1
Image Classification | ImageNet | GFLOPs | 0.9 | VAN-B0
Panoptic Segmentation | COCO panoptic | PQ | 58.2 | VAN-B6*
Panoptic Segmentation | COCO minival | PQ | 58.2 | Visual Attention Network (VAN-B6 + Mask2Former)
Panoptic Segmentation | COCO minival | PQst | 48.2 | Visual Attention Network (VAN-B6 + Mask2Former)
Panoptic Segmentation | COCO minival | PQth | 64.8 | Visual Attention Network (VAN-B6 + Mask2Former)
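The three COCO minival numbers for VAN-B6 + Mask2Former can be cross-checked against each other. PQ is computed per class and then averaged, so the overall PQ is the class-count-weighted mean of PQth (thing classes) and PQst (stuff classes). The sketch below assumes the standard COCO panoptic split of 80 thing and 53 stuff classes, which this page does not state; `overall_pq` is a hypothetical helper, and because the reported values are rounded to one decimal, only approximate agreement is expected.

```python
# ASSUMPTION: standard COCO panoptic category split (80 thing + 53 stuff = 133 classes).
NUM_THING_CLASSES = 80
NUM_STUFF_CLASSES = 53


def overall_pq(pq_things: float, pq_stuff: float,
               n_things: int = NUM_THING_CLASSES,
               n_stuff: int = NUM_STUFF_CLASSES) -> float:
    """PQ is a per-class mean, so the overall value weights PQth and PQst by class count."""
    return (n_things * pq_things + n_stuff * pq_stuff) / (n_things + n_stuff)


# PQth = 64.8 and PQst = 48.2 as reported for VAN-B6 + Mask2Former on COCO minival.
print(f"{overall_pq(64.8, 48.2):.1f}")  # prints 58.2, matching the reported PQ
```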
