3,275 machine learning datasets
Orchid2024 is a fine-grained classification dataset specifically designed for Chinese Cymbidium orchid cultivars. It includes data collected from 20 cities across 12 provincial administrative regions in China and encompasses 1,269 cultivars from 8 Chinese Cymbidium orchid species and 6 additional categories, totaling 156,630 images. The dataset covers nearly all common Chinese Cymbidium cultivars currently found in China; its fine granularity and real-world focus make it a unique and practical resource for researchers and practitioners.
ALLO is an anomaly detection and localization dataset for space stations in lunar orbit. Synthetically rendered using Blender, ALLO provides realistic images of what a robotic manipulator on a space station will encounter, including possible anomalies.
The Calgary-Campinas public brain magnetic resonance (MR) image dataset originated from a collaboration between researchers at the Vascular Imaging Lab at the University of Calgary and the Medical Image Computing Lab at the University of Campinas (UNICAMP).
VisArgs is a densely annotated benchmark for visual argument understanding. It contains 1,611 images annotated with 5,112 visual premises (with regions), 5,574 commonsense premises, and reasoning trees connecting them into structured arguments. We propose three tasks for evaluating visual argument understanding: premise localization, premise identification, and conclusion deduction.
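As an illustration only, annotations of this shape could be organized along the following lines; the schema below (field names, bounding-box format, edge encoding) is a hypothetical sketch, not the actual VisArgs release format:

```python
from dataclasses import dataclass, field

@dataclass
class VisualPremise:
    text: str
    region: tuple  # assumed (x, y, w, h) bounding box in the image

@dataclass
class ArgumentTree:
    """Hypothetical container for one annotated visual argument."""
    image_id: str
    visual_premises: list[VisualPremise]
    commonsense_premises: list[str]
    conclusion: str
    # Edges of the reasoning tree, e.g. ("vp0", "cp1") feeding a node;
    # the real annotation format may encode this differently.
    edges: list[tuple] = field(default_factory=list)
```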
MediConfusion is a challenging medical Visual Question Answering (VQA) benchmark dataset that probes the failure modes of medical Multimodal Large Language Models (MLLMs) from a vision perspective. We reveal that state-of-the-art models are easily confused by image pairs that are otherwise visually dissimilar and clearly distinct for medical experts. Our benchmark consists of 176 confusing pairs. A confusing pair is a set of two images that share the same question and corresponding answer options, but the correct answer differs between the images. We evaluate models based on their ability to answer *both* questions correctly within a confusing pair, which we call **set accuracy**. This metric indicates how well models can tell the two images apart, as a model that selects the same answer option for both images for all pairs will receive 0% set accuracy. We also report **confusion**, a metric that describes the proportion of confusing pairs where the model has selected the same answer option for both images.
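A minimal sketch of how these two metrics could be computed, assuming each confusing pair is represented as a dict; the field names (`pred_a`, `gold_a`, etc.) are hypothetical, not the benchmark's actual schema:

```python
def evaluate_pairs(pairs):
    """Compute set accuracy and confusion over a list of confusing pairs.

    Assumed (hypothetical) pair schema:
    {"pred_a": ..., "gold_a": ..., "pred_b": ..., "gold_b": ...}
    """
    both_correct = 0  # pairs where the model answers both images correctly
    same_answer = 0   # pairs where the model picks the same option for both images
    for p in pairs:
        if p["pred_a"] == p["gold_a"] and p["pred_b"] == p["gold_b"]:
            both_correct += 1
        if p["pred_a"] == p["pred_b"]:
            same_answer += 1
    set_accuracy = both_correct / len(pairs)
    confusion = same_answer / len(pairs)
    return set_accuracy, confusion
```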
The dataset includes annotations for burned area delineation and land cover segmentation, with a focus on European regions. The dataset is curated from various sources, including the Copernicus Emergency Management Service (EMS) and Sentinel-2 feeds.
IVM-Mix-1M provides over 1M image-instruction pairs with corresponding instruction-relevant mask labels. The dataset consists of three parts: HumanLabelData, RobotMachineData, and VQAMachineData. For HumanLabelData and RobotMachineData, we provide well-organized images, mask labels, and language instructions. For VQAMachineData, we provide only mask labels and language instructions; please refer to https://huggingface.co/datasets/2toINF/IVM-Mix-1M and download the images from the constituent datasets.
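A hedged loading sketch using the Hugging Face `datasets` library; the split name and per-sample field names below are assumptions, not confirmed by the dataset card:

```python
from datasets import load_dataset  # pip install datasets

# Assumed usage: the repo may ship custom loading scripts or archives instead.
ds = load_dataset("2toINF/IVM-Mix-1M", split="train")

sample = ds[0]
instruction = sample.get("instruction")  # language instruction (assumed field name)
mask = sample.get("mask")                # instruction-relevant mask label (assumed field name)
# For VQAMachineData entries, images must be fetched separately from the
# constituent source datasets, as noted above.
```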
Dataset for Land Cover segmentation from sparse labels, using Sentinel-2 as source imagery.
A synthetic VQA natural language explanation (NLE) dataset, built with LLaVA-1.5 using features from the GQA dataset. Total number of unique samples: 66,684.
YesBut Dataset (https://yesbut-dataset.github.io). Understanding satire and humor is a challenging task for even current Vision-Language models. In this paper, we propose the challenging tasks of Satirical Image Detection (detecting whether an image is satirical), Understanding (generating the reason behind the image being satirical), and Completion (given one half of the image, selecting the other half from 2 given options such that the complete image is satirical), and we release a high-quality dataset, YesBut, consisting of 2547 images (1084 satirical and 1463 non-satirical) containing different artistic styles, to evaluate those tasks. Each satirical image in the dataset depicts a normal scenario along with a conflicting scenario which is funny or ironic. Despite the success of current Vision-Language Models on multimodal tasks such as Visual QA and Image Captioning, our benchmarking experiments show that such models perform poorly on the proposed tasks on the YesBut Dataset in Zero-Shot settings.
The SCARED-C dataset is introduced in the context of assessing robustness in endoscopic depth prediction models. It is part of the EndoDepth benchmark, which is designed to evaluate the performance of monocular depth prediction models specifically for endoscopic scenarios. The dataset features 16 different types of image corruptions, each with five levels of severity, encompassing challenges like lens distortion, resolution alterations, specular reflection, and color changes that are typical in endoscopic imaging. Ground truth comes from the original SCARED test set.
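To illustrate the corruption-times-severity protocol, here is a sketch using the generic `imagecorruptions` package; note this library covers the 15 ImageNet-C-style corruptions, not the exact SCARED-C set, which includes endoscopy-specific effects:

```python
import numpy as np
from imagecorruptions import corrupt, get_corruption_names  # pip install imagecorruptions

# Placeholder for a test frame; in practice this would be a SCARED image.
image = np.zeros((256, 256, 3), dtype=np.uint8)

# Build a corrupted test set: every corruption type at every severity level,
# mirroring the 16-corruptions-by-5-severities structure described above.
corrupted_set = {
    (name, severity): corrupt(image, corruption_name=name, severity=severity)
    for name in get_corruption_names()  # 15 generic corruption types
    for severity in range(1, 6)         # five severity levels, as in SCARED-C
}
```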
Vision-language supervised fine-tuning effectively enhances VLLM performance, but existing visual instruction tuning datasets have several limitations.
A Racial Fairness Benchmark Dataset for Face Forgery Detection.
We conducted a large crowdsourcing study of click patterns in an interactive segmentation scenario and collected 475K real-user clicks. Drawing on ideas from saliency tasks, we develop a clickability model that enables sampling clicks which closely resemble actual user inputs. Using our model and dataset, we propose the RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. Specifically, we evaluate not only the average quality of methods, but also their robustness w.r.t. click patterns.
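A minimal sketch of the click-sampling step, assuming the clickability model outputs a per-pixel probability map; the actual RClicks model that produces this map is not reproduced here:

```python
import numpy as np

def sample_clicks(clickability_map: np.ndarray, n_clicks: int, rng=None):
    """Sample click coordinates from a per-pixel clickability distribution.

    clickability_map: non-negative (H, W) array from a clickability model.
    Returns an (n_clicks, 2) array of (row, col) click positions.
    """
    rng = rng or np.random.default_rng()
    h, w = clickability_map.shape
    probs = clickability_map.ravel()
    probs = probs / probs.sum()  # normalize into a valid distribution
    flat_idx = rng.choice(h * w, size=n_clicks, p=probs)
    return np.stack(np.unravel_index(flat_idx, (h, w)), axis=1)
```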
SAT-MTB-VSR is a large-scale dataset for satellite video super-resolution built from original Jilin-1 videos; it is a subset of the satellite video multitasking dataset SAT-MTB. The dataset is cropped from 18 videos captured by the Jilin-1 video satellite at a resolution of about 1 m, covering a wide range of terrains such as cities, docks, airports, suburbs, forests, and deserts. The videos contain dynamic scenes with moving cars, airplanes, trains, and ships, which test the ability of VSR methods to handle moving targets of different sizes and speeds. Because of the satellite's own motion, the videos also contain changes in viewing angle and lighting.
ReALFRED is an embodied instruction-following benchmark.
A synthetic dataset covering driving under adverse weather conditions | Autonomous Driving
The Wikidata Reference Logo Dataset (WiRLD) is a comprehensive collection of reference logos specifically designed to address the challenges of large-scale logo identification. Recognizing the limitations of existing logo datasets, which often have a restricted number of logo classes or lack public availability, the authors curated WiRLD to facilitate research on more realistic, large-scale logo identification tasks. WiRLD contains 100,000 reference logo images sourced from Wikidata, representing 100,000 distinct logo classes; each entity in the dataset has one corresponding logo image. The dataset's focus on providing a vast and readily accessible collection of reference logos makes it particularly valuable for evaluating one-shot logo identification methods, especially in large-scale scenarios.
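A sketch of one-shot identification against WiRLD-style references via nearest-neighbor cosine similarity; the `identify_logo` helper and the choice of embedding model (e.g., any pretrained vision encoder) are hypothetical, not part of the dataset:

```python
import numpy as np

def identify_logo(query_emb: np.ndarray, reference_embs: np.ndarray, class_ids):
    """Return the class of the nearest reference logo by cosine similarity.

    reference_embs: one embedding per class, shape (100_000, D) for WiRLD.
    How the embeddings are produced is an assumption, not specified by WiRLD.
    """
    q = query_emb / np.linalg.norm(query_emb)
    refs = reference_embs / np.linalg.norm(reference_embs, axis=1, keepdims=True)
    scores = refs @ q  # cosine similarity to every reference logo
    return class_ids[int(np.argmax(scores))]
```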