Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ViDT: An Efficient and Effective Fully Transformer-based Object Detector

Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

2021-10-08 | ICLR 2022 | Image Classification, Object Detection

Abstract

Transformers are transforming the landscape of computer vision, especially for recognition tasks. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully transformer-based architecture for image classification. In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector. ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector, followed by a computationally efficient transformer decoder that exploits multi-scale features and auxiliary techniques essential to boost the detection performance without much increase in computational load. Extensive evaluation results on the Microsoft COCO benchmark dataset demonstrate that ViDT obtains the best AP and latency trade-off among existing fully transformer-based object detectors, and achieves 49.2 AP owing to its high scalability for large models. We will release the code and trained models at https://github.com/naver-ai/vidt.

Results

| Task | Dataset | Model | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|---|
| Object Detection | COCO 2017 val | ViDT Swin-base | 49.2 | 69.4 | 53.1 | 30.6 | 52.6 | 66.9 |
| Object Detection | COCO 2017 val | ViDT Swin-small | 47.5 | 67.7 | 51.4 | 29.2 | 50.7 | 64.8 |
| Object Detection | COCO 2017 val | ViDT Swin-tiny | 44.8 | 64.5 | 48.7 | 25.9 | 47.6 | 62.1 |
| Object Detection | COCO 2017 val | ViDT Swin-nano | 40.4 | 59.6 | 43.3 | 23.2 | 42.5 | 55.8 |
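
The abstract attributes the 49.2 AP result to scalability across model sizes. As a rough illustration, the sketch below stores the COCO 2017 val numbers listed above and computes the AP gain at each step up in backbone size. The dictionary layout and the helper name `ap_gain` are assumptions made for this example; they are not part of the ViDT codebase.

```python
# COCO 2017 val results for ViDT, copied from the Results listing.
# The dictionary layout is an illustrative assumption, not an official format.
VIDT_COCO_VAL = {
    "Swin-nano":  {"AP": 40.4, "AP50": 59.6, "AP75": 43.3, "APS": 23.2, "APM": 42.5, "APL": 55.8},
    "Swin-tiny":  {"AP": 44.8, "AP50": 64.5, "AP75": 48.7, "APS": 25.9, "APM": 47.6, "APL": 62.1},
    "Swin-small": {"AP": 47.5, "AP50": 67.7, "AP75": 51.4, "APS": 29.2, "APM": 50.7, "APL": 64.8},
    "Swin-base":  {"AP": 49.2, "AP50": 69.4, "AP75": 53.1, "APS": 30.6, "APM": 52.6, "APL": 66.9},
}


def ap_gain(results, smaller, larger, metric="AP"):
    """Absolute gain in `metric` when moving from backbone `smaller` to `larger`.

    Hypothetical helper; rounds to one decimal to match the table's precision.
    """
    return round(results[larger][metric] - results[smaller][metric], 1)


if __name__ == "__main__":
    # Backbones ordered from smallest to largest.
    order = ["Swin-nano", "Swin-tiny", "Swin-small", "Swin-base"]
    for small, large in zip(order, order[1:]):
        print(f"{small} -> {large}: +{ap_gain(VIDT_COCO_VAL, small, large)} AP")
```

On these numbers the step gains are 4.4, 2.7, and 1.7 AP, so accuracy keeps improving with backbone size, although each step adds less than the one before.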

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)