RF100-VL

Roboflow100-VL

Images · Introduced 2025-03-20

RF100-VL is a multi-domain benchmark for object detection. The benchmark is designed to measure the extent to which model architectures can generalise to different domains, from medical imagery to defect detection to document feature identification. RF100-VL was introduced by researchers from Roboflow and Carnegie Mellon University in the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models".

Roboflow 100 Vision Language (RF100-VL) is the first benchmark to ask, “How well does your VLM understand the real world?” To answer this question, RF100-VL introduces 100 open-source datasets containing object detection bounding boxes and multimodal few-shot instructions with visual examples and rich textual descriptions across novel image domains. The dataset comprises 164,149 images and 1,355,491 annotations across seven domains, including aerial, biological, and industrial imagery. A total of 1,693 hours were spent labeling, reviewing, and preparing the dataset.
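To give a rough sense of scale, the totals quoted above can be turned into per-image and per-dataset averages. The snippet below is only a back-of-the-envelope sketch: the function and variable names are illustrative, the figures are copied from the description above, and the averages say nothing about how images or annotations are actually distributed across the 100 datasets.

```python
# Totals as stated in the RF100-VL description.
TOTAL_IMAGES = 164_149
TOTAL_ANNOTATIONS = 1_355_491
NUM_DATASETS = 100


def summarize(images: int, annotations: int, datasets: int) -> dict:
    """Return simple averages derived from the benchmark-wide totals."""
    return {
        # Mean number of bounding-box annotations per image.
        "annotations_per_image": annotations / images,
        # Mean number of images per dataset (a uniform split is NOT implied).
        "images_per_dataset": images / datasets,
    }


stats = summarize(TOTAL_IMAGES, TOTAL_ANNOTATIONS, NUM_DATASETS)
print(f"annotations per image: {stats['annotations_per_image']:.2f}")
print(f"images per dataset:    {stats['images_per_dataset']:.2f}")
```

On these totals, the benchmark averages roughly eight annotations per image and about 1,640 images per dataset.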

RF100-VL is a curated sample from Roboflow Universe, a repository of more than 500,000 datasets that collectively demonstrate how computer vision is being applied to production problems today. Current state-of-the-art models trained on web-scale data, such as Qwen2.5-VL and GroundingDINO, achieve as low as 2% AP in some categories represented in RF100-VL.