Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

Published 2024-05-16. Tasks: Few-Shot Object Detection, Zero-Shot Object Detection, Object Detection

Abstract

This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite comprises two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model optimized for the speed demanded by applications that require edge deployment. Grounding DINO 1.5 Pro improves on its predecessor by scaling up the model architecture, integrating an enhanced vision backbone, and expanding the training dataset to over 20 million images with grounding annotations, thereby achieving a richer semantic understanding. Grounding DINO 1.5 Edge, although designed for efficiency with reduced feature scales, maintains robust detection capability because it is trained on the same comprehensive dataset. Empirical results demonstrate the effectiveness of Grounding DINO 1.5: the Pro model attains 54.3 AP on the COCO detection benchmark and 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection. Furthermore, the Edge model, when optimized with TensorRT, runs at 75.2 FPS while attaining a zero-shot performance of 36.2 AP on the LVIS-minival benchmark, making it more suitable for edge computing scenarios. Model examples and demos with API will be released at https://github.com/IDEA-Research/Grounding-DINO-1.5-API
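The 75.2 FPS figure for Grounding DINO 1.5 Edge can be read as a per-frame time budget. The sketch below is a minimal illustration, assuming frames are processed strictly one at a time so that time per frame is simply 1/FPS. The function name and the 30 FPS camera are hypothetical, and the result says nothing about what the paper's speed measurement includes (for example, pre- or post-processing).

```python
def frame_time_ms(fps: float) -> float:
    """Average time per frame, in milliseconds, implied by a throughput in FPS.

    Assumes frames are processed one after another, so time per frame = 1 / FPS.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# 75.2 FPS is the TensorRT throughput reported for Grounding DINO 1.5 Edge.
edge_ms = frame_time_ms(75.2)
print(f"{edge_ms:.1f} ms per frame")  # 13.3 ms per frame

# Hypothetical example: a 30 FPS camera leaves about 33.3 ms per frame.
camera_budget_ms = frame_time_ms(30.0)
print(edge_ms <= camera_budget_ms)  # True
```

Under this one-frame-at-a-time assumption, the reported throughput leaves headroom within a 30 FPS budget; batched or pipelined inference would need a different calculation.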

Results

Task              Dataset                   Metric         Value  Model
Object Detection  LVIS v1.0 minival         box AP         68.1   Grounding DINO 1.5 Pro
Object Detection  ODinW Full-Shot 35 Tasks  AP             72.4   Grounding DINO 1.5 Pro
Object Detection  ODinW Full-Shot 13 Tasks  AP             72.4   Grounding DINO 1.5 Pro
Object Detection  LVIS v1.0 val             box AP         63.5   Grounding DINO 1.5 Pro
Object Detection  LVIS v1.0 val             box APr        64     Grounding DINO 1.5 Pro
Object Detection  ODinW-35                  Average Score  54.7   Grounding DINO 1.5 Pro
Object Detection  ODinW-13                  Average Score  66.3   Grounding DINO 1.5 Pro
Object Detection  LVIS v1.0 minival         AP             57.7   Grounding DINO 1.6 Pro (without LVIS data)
Object Detection  LVIS v1.0 minival         AP             55.7   Grounding DINO 1.5 Pro (without LVIS data)
Object Detection  MSCOCO                    AP             55.4   Grounding DINO 1.6 Pro (without COCO data)
Object Detection  MSCOCO                    AP             54.3   Grounding DINO 1.5 Pro (without COCO data)
Object Detection  LVIS v1.0 val             AP             51.1   Grounding DINO 1.6 Pro (without LVIS data)
Object Detection  LVIS v1.0 val             AP             47.7   Grounding DINO 1.5 Pro (without LVIS data)
Object Detection  ODinW                     Average Score  30.2   Grounding DINO 1.5 Pro

The same Object Detection rows are also indexed under the task tags 3D, 2D Classification, 2D Object Detection, and 16k; the ODinW-35 and ODinW-13 Average Score rows are additionally indexed under Few-Shot Object Detection.
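To read the zero-shot rows side by side, the hypothetical helper below subtracts the Grounding DINO 1.5 Pro score from the Grounding DINO 1.6 Pro score on each benchmark where both appear in the results table (the rows marked "without LVIS data" or "without COCO data"). It is a small arithmetic sketch using only the values listed there; the dictionary layout and function name are illustrative assumptions.

```python
# Zero-shot AP values copied from the results table
# (rows marked "without LVIS data" / "without COCO data").
ZERO_SHOT_AP = {
    "LVIS v1.0 minival": {"1.5 Pro": 55.7, "1.6 Pro": 57.7},
    "LVIS v1.0 val": {"1.5 Pro": 47.7, "1.6 Pro": 51.1},
    "MSCOCO": {"1.5 Pro": 54.3, "1.6 Pro": 55.4},
}


def ap_gains(table: dict[str, dict[str, float]], old: str, new: str) -> dict[str, float]:
    """Absolute AP difference (new - old) per benchmark, rounded to one decimal."""
    return {bench: round(scores[new] - scores[old], 1) for bench, scores in table.items()}


for bench, gain in ap_gains(ZERO_SHOT_AP, "1.5 Pro", "1.6 Pro").items():
    print(f"{bench}: +{gain} AP")
```

Run as written, it prints gains of 2.0 AP on LVIS v1.0 minival, 3.4 AP on LVIS v1.0 val, and 1.1 AP on MSCOCO. Rounding to one decimal matches the precision of the reported numbers and hides floating-point noise in the subtraction.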
