
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


End-to-End Object Detection with Transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

Date: 2020-05-26
Venue: ECCV 2020 8
Tasks: Panoptic Segmentation, Real-Time Object Detection, 2D Object Detection, Object Detection

Abstract

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at https://github.com/facebookresearch/detr.
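The bipartite matching mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the cost function (negative class probability plus a weighted L1 box distance), the `box_weight` value, the dictionary layout, and the function names are all assumptions made for illustration, and the matching is found by brute force over permutations rather than with the Hungarian algorithm that practical implementations typically use (for example via `scipy.optimize.linear_sum_assignment`). It assumes there are at least as many predictions as ground-truth objects.

```python
from itertools import permutations


def l1_box_cost(pred_box, gt_box):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def pair_cost(pred, gt, box_weight=5.0):
    """Illustrative matching cost: low when the prediction assigns high
    probability to the ground-truth class and its box is close.
    box_weight is an assumed value, not taken from the text above."""
    class_cost = -pred["probs"][gt["label"]]
    return class_cost + box_weight * l1_box_cost(pred["box"], gt["box"])


def match(preds, gts):
    """Return sorted (pred_index, gt_index) pairs minimising the total cost.

    Each ground-truth object is assigned to a distinct prediction, so no two
    predictions can claim the same object. Brute force: only for tiny inputs.
    """
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(perm)]
    return sorted(best_pairs)


# Three predictions (a fixed-size set), two ground-truth objects.
preds = [
    {"probs": [0.1, 0.9], "box": (0.5, 0.5, 0.2, 0.2)},
    {"probs": [0.8, 0.2], "box": (0.2, 0.3, 0.1, 0.1)},
    {"probs": [0.5, 0.5], "box": (0.9, 0.9, 0.1, 0.1)},
]
gts = [
    {"label": 0, "box": (0.2, 0.3, 0.1, 0.1)},
    {"label": 1, "box": (0.5, 0.5, 0.2, 0.2)},
]

matches = match(preds, gts)
matched_preds = {p for p, _ in matches}
# Predictions left without a partner would be trained toward "no object".
unmatched = [i for i in range(len(preds)) if i not in matched_preds]
print(matches, unmatched)
```

Because the assignment is one-to-one, duplicate predictions of the same object cannot both be matched, which is the property that lets a set-based loss replace a separate non-maximum suppression step.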

Results

Panoptic segmentation results
Listed under tasks: Semantic Segmentation, 10-shot image generation, Panoptic Segmentation (identical rows under each)

| Dataset | Metric | Value | Model |
|---|---|---|---|
| COCO minival | AP | 33 | DETR-R101 (ResNet-101) |
| COCO minival | PQ | 45.1 | DETR-R101 (ResNet-101) |
| COCO minival | PQst | 37 | DETR-R101 (ResNet-101) |
| COCO minival | PQth | 50.5 | DETR-R101 (ResNet-101) |
| COCO minival | RQ | 55.5 | DETR-R101 (ResNet-101) |
| COCO minival | RQst | 46 | DETR-R101 (ResNet-101) |
| COCO minival | RQth | 61.7 | DETR-R101 (ResNet-101) |
| COCO minival | SQ | 79.9 | DETR-R101 (ResNet-101) |
| COCO minival | SQst | 78.5 | DETR-R101 (ResNet-101) |
| COCO minival | SQth | 80.9 | DETR-R101 (ResNet-101) |
| COCO minival | AP | 39.7 | PanopticFPN++ |
| COCO minival | PQ | 44.1 | PanopticFPN++ |
| COCO minival | PQst | 33.6 | PanopticFPN++ |
| COCO minival | PQth | 51 | PanopticFPN++ |
| COCO minival | RQ | 53.3 | PanopticFPN++ |
| COCO minival | RQst | 42.1 | PanopticFPN++ |
| COCO minival | RQth | 60.6 | PanopticFPN++ |
| COCO minival | SQ | 79.5 | PanopticFPN++ |
| COCO minival | SQst | 74 | PanopticFPN++ |
| COCO minival | SQth | 83.2 | PanopticFPN++ |

Object detection results
Listed under tasks: Object Detection, 3D, 2D Classification, 2D Object Detection, 16k (identical rows under each)

| Dataset | Metric | Value | Model |
|---|---|---|---|
| COCO-O | Average mAP | 17.1 | DETR (ResNet-50) |
| COCO-O | Effective Robustness | -1.82 | DETR (ResNet-50) |
| COCO minival | AP50 | 64.7 | DETR-DC5 (ResNet-101) |
| COCO minival | AP75 | 47.7 | DETR-DC5 (ResNet-101) |
| COCO minival | APL | 62.3 | DETR-DC5 (ResNet-101) |
| COCO minival | APM | 49.5 | DETR-DC5 (ResNet-101) |
| COCO minival | APS | 23.7 | DETR-DC5 (ResNet-101) |
| COCO minival | box AP | 44.9 | DETR-DC5 (ResNet-101) |
| COCO minival | AP50 | 63.9 | Faster RCNN-R101-FPN+ |
| COCO minival | AP75 | 47.8 | Faster RCNN-R101-FPN+ |
| COCO minival | APL | 56 | Faster RCNN-R101-FPN+ |
| COCO minival | APM | 48.1 | Faster RCNN-R101-FPN+ |
| COCO minival | APS | 27.2 | Faster RCNN-R101-FPN+ |
| COCO minival | box AP | 44 | Faster RCNN-R101-FPN+ |
| COCO (Common Objects in Context) | FPS (V100, b=1) | 26 | Faster RCNN-FPN+ |
| COCO (Common Objects in Context) | box AP | 42 | Faster RCNN-FPN+ |

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation (2025-07-14)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)