3,275 machine learning datasets
IllusionFashionMNIST_test Dataset Characteristics IllusionFashionMNIST_test is a generated dataset derived from the FashionMNIST dataset. It incorporates the concept of pareidolia—a phenomenon where patterns, often faces, are perceived in random or abstract stimuli. The dataset contains 11 classes: the original 10 classes from FashionMNIST, and an additional "No Illusion" class. It includes 1,267 samples, all synthetically created rather than real-world images.
IllusionAnimals_test Dataset Characteristics IllusionAnimals_test is a generated dataset based on a synthetic collection of animal images, including 10 animal classes: cat, dog, pigeon, butterfly, elephant, horse, deer, snake, fish, and rooster. Additionally, it includes a "No Illusion" class, bringing the total number of classes to 11. The dataset contains 1,100 samples, all created synthetically rather than derived from real-world images.
IllusionChar_test Dataset Characteristics IllusionChar_test is a generated dataset containing 3,300 samples of images that feature sequences of 3 to 5 random characters. Unlike classification-focused datasets, this dataset is designed for tasks that require reasoning about patterns, sequences, or illusions within the character sequences. All images are synthetically generated, and no real-world data is included.
High-resolution early gastric cancer (EGC) detection and analysis. Patient Data: the dataset includes endoscopy images from patients diagnosed with gastric cancer, distinguishing between early gastric cancer (EGC) and non-pathogenic gastric cancer (NGC). The study utilized data from 341 patients, with 124 classified as EGC and 217 as NGC. Image Types: high-resolution images obtained during endoscopy examinations. Data Volume: 1,120 images for EGC detection and 2,150 images for NGC.
Whole-body, low-level control/manipulation demonstration dataset for ManiSkill-HAB. Demonstrations are organized by task-subtask-object. All demos use RGBD (128x128) and state. JSON files store metadata (including event labels and success/failure modes), while HDF5 files store demonstration data.
UAVDB is a high-resolution RGB video dataset meticulously designed for UAV detection tasks across diverse scales and complex backgrounds. Comprising 10,763 training, 2,720 validation, and 4,578 test images (18,061 total) drawn from multiple source datasets and camera configurations, it addresses key limitations of existing datasets, such as inaccurate bounding box annotations and limited diversity in environmental contexts, thereby enhancing the reliability and applicability of detection algorithms in real-world scenarios.
The MVTec-FS dataset is a refined version of the MVTec AD dataset, designed for few-shot learning research. It contains instance-level annotations of anomaly images and is tailored to support few-shot anomaly detection tasks.
Infrared dim-small target detection has gained increasing importance in both military and civilian applications due to its ability to detect thermal radiation, operate effectively at night, passively sense radiation, and offer strong concealment with high resistance to interference. These capabilities make it ideal for systems such as aircraft and bird surveillance, missile guidance, and maritime rescue operations. In these applications, the need for mid- to long-range observations often results in small targets that appear dim and are difficult to detect. This dataset, named SIRST-UAVB, provides infrared images captured in the 3–5 μm wavelength range using a mid-wave infrared camera, with a resolution of 640×512 pixels and shooting distances ranging from 100 to 800 meters. The dataset predominantly features small targets, which make up 94.3% of the total data and include unmanned aerial vehicles (UAVs) and birds. These targets are presented against complex backgrounds such as skies.
We introduce the AODRaw dataset, which offers 7,785 high-resolution real RAW images with 135,601 annotated instances spanning 62 categories, capturing a broad range of indoor and outdoor scenes under 9 distinct light and weather conditions. AODRaw supports RAW and sRGB object detection.
MMCOMPOSITION is a high-quality benchmark specifically designed to comprehensively evaluate the compositionality of pre-trained Vision-Language Models (VLMs) across three main dimensions—VL compositional perception, reasoning, and probing—which are further divided into 13 distinct categories of questions. While previous benchmarks have mainly focused on text-to-image retrieval, single-choice questions, and open-ended text generation, MMCOMPOSITION introduces a more diverse and challenging set of 4,342 tasks covering both single-image and multi-image scenarios, as well as single-choice and indefinite-choice formats. This expanded range of tasks aims to capture the complex interplay between vision and language more effectively, surpassing earlier benchmarks such as ARO and Winoground by providing a more comprehensive and in-depth assessment of models’ cross-modal compositional capabilities.
An evaluation dataset for planning with LLM agents
MapEval-Visual contains 400 image-question-answer triplets. Each question is paired with a snapshot from the Google Maps website. The task is to answer the question based on the provided map snapshot.
An RGB-D dataset converted from NYUDv2 into COCO-style instance segmentation format. To construct NYUDv2-IS, specifically tailored for instance segmentation, we generated instance masks that delineate individual objects in each image. These masks were labeled using the object class annotations provided in the original NYUDv2 dataset, which is distributed in MATLAB format. The process involved several key steps: (1) extracting binary instance masks, (2) converting these masks into polygon representations, and (3) generating COCO-style annotations. Each annotation includes essential attributes such as category ID, segmentation masks, bounding boxes, object areas, and image metadata. During this conversion, we focused on 9 categories out of the original 13 classes, excluding non-instance categories such as walls and floors. To ensure dataset quality, images without any object annotations were systematically removed.
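The three conversion steps above (binary mask, polygon, COCO annotation) can be sketched roughly as follows. This is a minimal, hypothetical illustration of building one COCO-style annotation from a single binary instance mask, not the dataset's actual converter; the demo mask is rectangular, so its bounding-box corners serve as an exact polygon, whereas a real pipeline would trace arbitrary contours (e.g. with OpenCV's findContours):

```python
import numpy as np

def mask_to_coco_annotation(binary_mask, category_id, image_id, ann_id):
    """Turn one binary instance mask (H x W) into a COCO-style annotation
    dict with polygon segmentation, bounding box, and area."""
    ys, xs = np.nonzero(binary_mask)
    x0, y0, x1, y1 = int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
    # For this rectangular demo the bbox corners form an exact polygon;
    # arbitrary shapes would need true contour tracing instead.
    polygon = [x0, y0, x1 + 1, y0, x1 + 1, y1 + 1, x0, y1 + 1]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [polygon],  # list of flattened [x, y, x, y, ...] polygons
        "bbox": [x0, y0, x1 - x0 + 1, y1 - y0 + 1],  # COCO [x, y, width, height]
        "area": int(binary_mask.sum()),
        "iscrowd": 0,
    }

# Minimal demo: a 10x10 image with a single 4x4 square instance.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:6, 3:7] = 1
ann = mask_to_coco_annotation(mask, category_id=1, image_id=0, ann_id=0)
print(ann["bbox"], ann["area"])  # [3, 2, 4, 4] 16
```

Collecting such dicts into the COCO "annotations" list, alongside "images" and "categories" entries, yields a file that standard COCO loaders can consume directly.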
An RGB-D dataset converted from SUN-RGBD into COCO-style instance segmentation format. To transform SUN-RGBD into an instance segmentation benchmark (i.e., SUN-RGBDIS), we employed a pipeline similar to that of NYUDv2-IS. We selected 17 categories from the original 37 classes, carefully omitting non-instance categories like ceilings and walls. Images lacking any identifiable object instances were filtered out to maintain dataset relevance for instance segmentation tasks. We systematically converted segmentation annotations into COCO format, generating precise bounding boxes, instance masks, and object attributes.
RGB-D instance segmentation box dataset. The Box-IS dataset was created to support research on human-robot collaboration with a focus on robotic manipulation tasks. It was captured using the Intel® RealSense™ Depth Camera D455, a high-performance sensor designed for depth imaging. To ensure precise depth measurements, we bypassed the default depth data processing of the sensor and performed accurate stereo matching directly from the captured left and right IR images. Employing the UniMatch technique, we derived a high-quality depth map from these stereo IR images, which was then aligned with the corresponding RGB image for a comprehensive output. The dataset was intentionally designed to encompass a broad range of scene complexities, from simple box arrangements to highly irregular configurations. This diversity ensures that it can effectively benchmark algorithms across varying levels of difficulty.
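Since stereo matchers such as UniMatch output a disparity map, the depth map described above follows from the standard pinhole stereo relation depth = f · B / disparity. A minimal sketch; the focal length and baseline below are illustrative placeholders, not the dataset's actual D455 calibration:

```python
import numpy as np

# Illustrative camera parameters (the D455's actual focal length and IR
# baseline depend on resolution and per-unit calibration).
FOCAL_PX = 640.0    # focal length in pixels
BASELINE_M = 0.095  # stereo baseline in meters

def disparity_to_depth(disparity, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Pinhole stereo relation: depth = f * B / disparity.
    Zero-disparity pixels are marked invalid (depth 0)."""
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

disp = np.array([[0.0, 30.4], [60.8, 121.6]])  # disparities in pixels
print(disparity_to_depth(disp))  # [[0.  2. ] [1.  0.5]] meters
```

The resulting metric depth map is then reprojected into the RGB camera's frame (using the extrinsics between the IR and RGB sensors) to produce the aligned RGB-D output described above.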
We introduce COph100, a novel and challenging Comprehensive Ophthalmology retinal image registration dataset for infants, selected from the public RIDIRP database and covering a wide range of image quality issues. COph100 consists of 100 eyes, each with 2 to 9 examination sessions, amounting to a total of 491 image pairs carefully selected from the publicly available dataset. We manually labeled the corresponding ground-truth image points and provide automatic vessel segmentation masks for each image.
The IUST_PersonReID dataset was developed to address limitations in existing person re-identification datasets by including cultural and environmental contexts unique to Islamic countries, especially Iran and Iraq. Unlike common datasets, which do not reflect the clothing styles common in these regions, such as hijabs and other coverings, the IUST_PersonReID dataset represents this diversity, helping to reduce demographic bias and improve model accuracy. Collected from various real-world settings under different lighting, camera angles, indoor and outdoor environments, and weather conditions, this dataset provides extensive, overlapping views across multiple cameras. By capturing these unique conditions, IUST_PersonReID offers a valuable resource for developing re-ID models that perform more reliably across diverse environments and populations.
SPIQA Dataset Card. Dataset Name: SPIQA (Scientific Paper Image Question Answering)