
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention

Nguyen Huu Bao Long, Chenyu Zhang, Yuzhi Shi, Tsubasa Hirakawa, Takayoshi Yamashita, Tohgoroh Matsui, Hironobu Fujiyoshi

2024-10-11 | Image Classification | Semantic Segmentation | Object Detection

Abstract

Vision Transformers with various attention modules have demonstrated superior performance on vision tasks. While using sparsity-adaptive attention, such as in DAT, has yielded strong results in image classification, the key-value pairs selected by deformable points lack semantic relevance when fine-tuning for semantic segmentation tasks. The query-aware sparsity attention in BiFormer seeks to focus each query on top-k routed regions. However, during attention calculation, the selected key-value pairs are influenced by too many irrelevant queries, reducing attention on the more important ones. To address these issues, we propose the Deformable Bi-level Routing Attention (DBRA) module, which optimizes the selection of key-value pairs using agent queries and enhances the interpretability of queries in attention maps. Based on this, we introduce the Deformable Bi-level Routing Attention Transformer (DeBiFormer), a novel general-purpose vision transformer built with the DBRA module. DeBiFormer has been validated on various computer vision tasks, including image classification, object detection, and semantic segmentation, providing strong evidence of its effectiveness. Code is available at https://github.com/maclong01/DeBiFormer
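To make the phrase "focus each query on top-k routed regions" concrete, here is a minimal sketch of region-level top-k routing in the style the abstract attributes to BiFormer. It is not the DBRA module and not the authors' code: the function names (`mean_pool`, `route_top_k`), the use of mean-pooled region vectors, and the dot-product affinity are illustrative assumptions, and the agent queries and deformable points of DeBiFormer are not modeled.

```python
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average a region's token vectors into one region-level vector."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, used here as the (assumed) affinity score."""
    return sum(x * y for x, y in zip(a, b))


def route_top_k(region_queries: List[List[Vector]],
                region_keys: List[List[Vector]],
                k: int) -> List[List[int]]:
    """For each query region, return the indices of the k key regions
    with the highest region-level affinity."""
    q_regions = [mean_pool(r) for r in region_queries]
    k_regions = [mean_pool(r) for r in region_keys]
    routed = []
    for q in q_regions:
        scores = [dot(q, key) for key in k_regions]
        # Sort region indices by descending affinity and keep the first k.
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        routed.append(order[:k])
    return routed


# Hypothetical toy input: 3 regions, 2 tokens per region, 2 channels.
queries = [[[1, 0], [1, 0]], [[0, 1], [0, 1]], [[1, 2], [1, 2]]]
keys = [[[2, 0], [2, 0]], [[0, 3], [0, 3]], [[1, 1], [1, 1]]]
routing = route_top_k(queries, keys, k=2)
print(routing)  # [[0, 2], [1, 2], [1, 2]]
```

In a full attention layer, each query token would then attend only to the key-value tokens gathered from its region's routed indices, which is what makes the attention sparse.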

Results

Task                   Dataset    Metric           Value  Model
Semantic Segmentation  ADE20K     Validation mIoU  52     DeBiFormer-B (IN1k pretrain, Upernet 160k)
Object Detection       COCO 2017  mAP              48.5   DeBiFormer-B (IN1k pretrain, MaskRCNN 12ep)
Object Detection       COCO 2017  mAP              47.5   DeBiFormer-S (IN1k pretrain, MaskRCNN 12ep)
Object Detection       COCO 2017  mAP              47.1   DeBiFormer-B (IN1k pretrain, Retina)
Object Detection       COCO 2017  mAP              45.6   DeBiFormer-S (IN1k pretrain, Retina)
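
For readers unfamiliar with the segmentation metric, the sketch below shows one common way to compute mean intersection-over-union (mIoU) from flat lists of predicted and ground-truth class labels. It is an illustration under stated assumptions, not the evaluation code behind the number above: `mean_iou` is a hypothetical helper, classes absent from both prediction and ground truth are skipped, and details such as ignore labels or dataset-wide accumulation vary between benchmark toolkits. The result is a fraction in [0, 1], whereas leaderboards usually report the value as a percentage.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy labels for six pixels and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 3))  # 0.5  (per-class IoUs: 1/3, 2/3, 1/2)
```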

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)