Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

2024-05-16

Few-Shot Object Detection, Zero-Shot Object Detection, Object Detection

Abstract

This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model optimized for faster speed demanded in many applications requiring edge deployment. The Grounding DINO 1.5 Pro model advances its predecessor by scaling up the model architecture, integrating an enhanced vision backbone, and expanding the training dataset to over 20 million images with grounding annotations, thereby achieving a richer semantic understanding. The Grounding DINO 1.5 Edge model, while designed for efficiency with reduced feature scales, maintains robust detection capabilities by being trained on the same comprehensive dataset. Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection. Furthermore, the Grounding DINO 1.5 Edge model, when optimized with TensorRT, achieves a speed of 75.2 FPS while attaining a zero-shot performance of 36.2 AP on the LVIS-minival benchmark, making it more suitable for edge computing scenarios. Model examples and demos with API will be released at https://github.com/IDEA-Research/Grounding-DINO-1.5-API
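To put the reported Edge throughput in deployment terms, the short sketch below converts frames per second into average per-frame latency and checks it against a frame-time budget. The 75.2 FPS figure comes from the abstract; the helper names and the 30 FPS camera budget are illustrative assumptions, and the calculation assumes sequential single-stream inference with no batching or pipelining.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Average per-frame latency in milliseconds for a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def fits_budget(fps: float, budget_ms: float) -> bool:
    """True if the average per-frame latency is within the given budget."""
    return fps_to_latency_ms(fps) <= budget_ms


# Throughput reported in the abstract for Grounding DINO 1.5 Edge with TensorRT.
edge_fps = 75.2

# Hypothetical budget: one frame interval of a 30 FPS camera stream.
camera_budget_ms = 1000.0 / 30.0

latency_ms = fps_to_latency_ms(edge_fps)
print(f"{latency_ms:.1f} ms per frame")         # 13.3 ms per frame
print(fits_budget(edge_fps, camera_budget_ms))  # True
```

Under these assumptions the reported throughput leaves roughly 20 ms of headroom per frame for pre- and post-processing on a 30 FPS stream; measured latency on a specific device may differ.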

Results

Task	Dataset	Metric	Value	Model
Object Detection	LVIS v1.0 minival	box AP	68.1	Grounding DINO 1.5 Pro
Object Detection	ODinW Full-Shot 35 Tasks	AP	72.4	Grounding DINO 1.5 Pro
Object Detection	ODinW Full-Shot 13 Tasks	AP	72.4	Grounding DINO 1.5 Pro
Object Detection	LVIS v1.0 val	box AP	63.5	Grounding DINO 1.5 Pro
Object Detection	LVIS v1.0 val	box APr	64	Grounding DINO 1.5 Pro
Object Detection	ODinW-35	Average Score	54.7	Grounding DINO 1.5 Pro
Object Detection	ODinW-13	Average Score	66.3	Grounding DINO 1.5 Pro
Object Detection	LVIS v1.0 minival	AP	57.7	Grounding DINO 1.6 Pro (without LVIS data)
Object Detection	LVIS v1.0 minival	AP	55.7	Grounding DINO 1.5 Pro (without LVIS data)
Object Detection	MSCOCO	AP	55.4	Grounding DINO 1.6 Pro (without COCO data)
Object Detection	MSCOCO	AP	54.3	Grounding DINO 1.5 Pro (without COCO data)
Object Detection	LVIS v1.0 val	AP	51.1	Grounding DINO 1.6 Pro (without LVIS data)
Object Detection	LVIS v1.0 val	AP	47.7	Grounding DINO 1.5 Pro (without LVIS data)
Object Detection	ODinW	Average Score	30.2	Grounding DINO 1.5 Pro
Few-Shot Object Detection	ODinW-35	Average Score	54.7	Grounding DINO 1.5 Pro
Few-Shot Object Detection	ODinW-13	Average Score	66.3	Grounding DINO 1.5 Pro
