
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


X-volution: On the unification of convolution and self-attention

Xuanhong Chen, Hang Wang, Bingbing Ni

2021-06-04 | Image Classification, Instance Segmentation, Object Detection

Abstract

Convolution and self-attention are two fundamental building blocks in deep neural networks: the former extracts local image features in a linear way, while the latter non-locally encodes high-order contextual relationships. Though essentially complementary to each other (first-order versus high-order), state-of-the-art architectures, i.e., CNNs or transformers, lack a principled way to simultaneously apply both operations in a single computational module, due to their heterogeneous computing patterns and the excessive burden of global dot-product for visual tasks. In this work, we theoretically derive a global self-attention approximation scheme, which approximates self-attention via the convolution operation on transformed features. Based on the approximated scheme, we establish a multi-branch elementary module composed of both convolution and self-attention operations, capable of unifying both local and non-local feature interaction. Importantly, once trained, this multi-branch module can be conditionally converted into a single standard convolution operation via structural re-parameterization, rendering a pure convolution-styled operator named X-volution, ready to be plugged into any modern network as an atomic operation. Extensive experiments demonstrate that the proposed X-volution achieves highly competitive visual understanding improvements (+1.2% top-1 accuracy on ImageNet classification, +1.7 box AP and +1.5 mask AP on COCO detection and segmentation).
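
The abstract mentions converting a multi-branch module into a single convolution via structural re-parameterization. The sketch below illustrates only the general idea behind that step, under simplifying assumptions: two parallel linear branches (a 3x3 and a 1x1 convolution, single channel, zero padding) are folded into one 3x3 kernel. It is not the X-volution module itself, and it does not reproduce the paper's self-attention approximation; the function names `conv2d_same` and `merge_branches` are hypothetical and chosen for illustration.

```python
import random

def conv2d_same(x, k):
    """Naive single-channel 2D cross-correlation with zero padding ('same' size)."""
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    # Positions outside the image contribute zero (zero padding).
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * k[a][b]
            out[i][j] = acc
    return out

def merge_branches(k3, k1):
    """Fold a 1x1 branch into a 3x3 kernel by adding its weight at the centre tap."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged

random.seed(0)
x = [[random.gauss(0, 1) for _ in range(6)] for _ in range(6)]
k3 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(3)]
k1 = [[random.gauss(0, 1)]]

# Training-time view: two parallel branches whose outputs are summed.
y3, y1 = conv2d_same(x, k3), conv2d_same(x, k1)
two_branch = [[y3[i][j] + y1[i][j] for j in range(6)] for i in range(6)]

# Inference-time view: one convolution with the merged kernel.
single = conv2d_same(x, merge_branches(k3, k1))

max_diff = max(abs(two_branch[i][j] - single[i][j])
               for i in range(6) for j in range(6))
print(max_diff < 1e-9)
```

Because convolution is linear in its kernel, the summed branches and the merged kernel agree up to floating-point rounding. The abstract describes the conversion in X-volution as conditional, so any input-dependent part of the real module would need treatment beyond this sketch.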

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO minival | AP50 | 64 | Faster R-CNN (FPN, X-volution) |
| Object Detection | COCO minival | AP75 | 46.4 | Faster R-CNN (FPN, X-volution) |
| Object Detection | COCO minival | APL | 55 | Faster R-CNN (FPN, X-volution) |
| Object Detection | COCO minival | APM | 46 | Faster R-CNN (FPN, X-volution) |
| Object Detection | COCO minival | APS | 26.9 | Faster R-CNN (FPN, X-volution) |
| Object Detection | COCO minival | box AP | 42.8 | Faster R-CNN (FPN, X-volution) |
| Instance Segmentation | COCO minival | APL | 53.1 | Mask R-CNN (FPN, X-volution, SA) |
| Instance Segmentation | COCO minival | APM | 40 | Mask R-CNN (FPN, X-volution, SA) |
| Instance Segmentation | COCO minival | APS | 19.2 | Mask R-CNN (FPN, X-volution, SA) |
| Instance Segmentation | COCO minival | mask AP | 37.2 | Mask R-CNN (FPN, X-volution, SA) |
