Metric: AP (higher is better)
| # | Model | AP | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CP-DETR-Pro (without LVIS data) | 51.6 | Yes | CP-DETR: Concept Prompt Guide DETR Toward Strong... | 2024-12-13 | - |
| 2 | Grounding DINO 1.6 Pro (without LVIS data) | 51.1 | Yes | Grounding DINO 1.5: Advance the "Edge" of Open-S... | 2024-05-16 | Code |
| 3 | Grounding DINO 1.5 Pro (without LVIS data) | 47.7 | Yes | Grounding DINO 1.5: Advance the "Edge" of Open-S... | 2024-05-16 | Code |
| 4 | best_single_model_val | 47.55 | No | - | - | - |
| 5 | OWLv2 (OWL-ST+FT) | 47 | Yes | Scaling Open-Vocabulary Object Detection | 2023-06-16 | Code |
| 6 | htc | 39.05 | No | - | - | - |
| 7 | MQ-GLIP-L | 34.7 | Yes | Multi-modal Queried Object Detection in the Wild | 2023-05-30 | Code |
| 8 | OV-DINO-T (without LVIS data, swin tiny) | 32.9 | Yes | OV-DINO: Unified Open-Vocabulary Detection with ... | 2024-07-10 | Code |
| 9 | Organizer Provided Baseline | 27.26 | No | - | - | - |
| 10 | GLIP-L | 26.9 | Yes | Grounded Language-Image Pre-training | 2021-12-07 | Code |
| 11 | null | 25.8 | No | - | - | - |
| 12 | Forest R-CNN | 23.2 | No | Forest R-CNN: Large-Vocabulary Long-Tailed Objec... | 2020-08-13 | Code |
| 13 | MQ-GLIP-T | 22.6 | Yes | Multi-modal Queried Object Detection in the Wild | 2023-05-30 | Code |
| 14 | MQ-GroundingDINO-T | 22.1 | Yes | Multi-modal Queried Object Detection in the Wild | 2023-05-30 | Code |
| 15 | person | 21.82 | No | - | - | - |
| 16 | test balloon 6 | 16.62 | No | - | - | - |