This dataset supports the research detailed in the pre-print "Virtual Imaging Trials Improved the Transparency and Reliability of AI Systems in COVID-19 Imaging." The study employs both clinical and simulated CT data to evaluate AI models for COVID-19 diagnosis. By leveraging the Virtual Imaging Trials (VIT) framework, the research addresses reproducibility and generalizability issues prevalent in medical imaging AI models.
A fully synthetic dataset of drones generated using structured domain randomization. It contains multiple datasets generated using different styles:
- Drones only
- Drones and Birds
- Generic Distractors
- Realistic Distractors
- Random Backgrounds
COMFORT is an evaluation protocol for systematically assessing the spatial reasoning capabilities of vision-language models (VLMs).
This repository contains data for the NeurIPS conference paper titled "Harnessing Machine Learning for Single-Shot Measurement of Free Electron Laser Pulse Power".
Due to the free-form nature of the open-vocabulary image classification task, special annotations are required for the image sets used in evaluation. Three such image datasets are presented here:
DAVIS-Edit is a curated testing benchmark for video editing. This dataset covers two evaluation settings, i.e., text- and image-based editing. In addition, it offers two types of annotations for both prompt modalities, covering editing scenarios with similar (DAVIS-Edit-S) and changing (DAVIS-Edit-C) shapes, so as to address the shape-inconsistency problem in video-to-video editing.
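A minimal sketch of how one might enumerate the benchmark's four subsets (two prompt modalities × two shape variants); the directory layout and names used here (`DAVIS-Edit-S/`, `DAVIS-Edit-C/`, `text/`, `image/`) are assumptions for illustration, not the released structure:

```python
import os

# Hypothetical root folder; the actual archive layout may differ.
DAVIS_EDIT_ROOT = "DAVIS-Edit"

def iter_edit_tasks(root=DAVIS_EDIT_ROOT):
    """Yield (setting, variant, video_name) for every editing task."""
    for setting in ("text", "image"):   # prompt modality
        for variant in ("S", "C"):      # similar vs. changing shapes
            subset = os.path.join(root, f"DAVIS-Edit-{variant}", setting)
            for video_name in sorted(os.listdir(subset)):
                yield setting, variant, video_name

for setting, variant, video in iter_edit_tasks():
    print(f"[{setting}-based / DAVIS-Edit-{variant}] {video}")
```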
This dataset contains synthetic images extracted from the CARLA simulator, along with rich information extracted from the deferred rendering pipeline of Unreal Engine 4. Its main purpose is training the state-of-the-art image-to-image translation model proposed by Intel Labs in "Enhancing Photorealism Enhancement" (EPE). Translation results from the model targeting the characteristics of Cityscapes, KITTI, and Mapillary Vistas are also provided. Computer vision models trained on these data are expected to perform better when deployed in the real world.
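A small sketch of how such a sample might be assembled, pairing a rendered frame with its auxiliary G-buffer maps; the folder layout and buffer names below are assumptions, not the archive's documented structure:

```python
import os
import numpy as np
from PIL import Image

# Illustrative G-buffer names; the released channels may differ.
GBUFFERS = ("albedo", "normals", "depth", "roughness")

def load_sample(root, frame_id):
    """Pair a rendered CARLA frame with its per-pixel G-buffer maps
    from UE4's deferred rendering pipeline (assumed layout)."""
    rgb = np.asarray(Image.open(os.path.join(root, "rgb", f"{frame_id}.png")))
    gbuffers = {
        name: np.asarray(Image.open(os.path.join(root, name, f"{frame_id}.png")))
        for name in GBUFFERS
    }
    return rgb, gbuffers
```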
We established a large-scale plant disease segmentation dataset named PlantSeg. PlantSeg comprises more than 11,400 images of 115 different plant diseases from various environments, each annotated with a corresponding segmentation label for the diseased parts. To the best of our knowledge, PlantSeg is the largest plant disease segmentation dataset containing in-the-wild images. Our dataset enables researchers to evaluate their models and provides a solid foundation for the development and benchmarking of plant disease segmentation algorithms.
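A minimal PyTorch-style loading sketch for the (image, segmentation mask) pairs, assuming an `images/` and `annotations/` folder layout with same-named files; the actual release may organize files differently:

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class PlantSegDataset(Dataset):
    """Loads image/mask pairs; folder names here are assumed."""

    def __init__(self, root, transform=None):
        self.image_dir = os.path.join(root, "images")        # assumed
        self.mask_dir = os.path.join(root, "annotations")    # assumed
        self.names = sorted(os.listdir(self.image_dir))
        self.transform = transform

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        base, _ = os.path.splitext(name)
        mask = Image.open(os.path.join(self.mask_dir, base + ".png"))
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        return image, mask
```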
WiFiCam dataset for through-wall imaging based on WiFi channel state information. The corresponding source code repository is located at: https://github.com/StrohmayerJ/wificam
The SuSy Dataset combines authentic photographs with AI-generated images and is designed for training and evaluating synthetic-image detection models. It contains over 25,000 images from six different sources, including real-world photographs from COCO and synthetic images created by state-of-the-art diffusion models such as DALL-E 3, Midjourney, and Stable Diffusion.
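For a binary real-vs-synthetic detector, one might collapse the six sources into two labels; the source identifiers below are illustrative guesses, not the dataset's official metadata keys:

```python
# Hypothetical source names; the released metadata may use other identifiers.
SOURCE_TO_LABEL = {
    "coco": 0,               # authentic photograph
    "dalle3": 1,             # AI-generated
    "midjourney": 1,
    "stable-diffusion": 1,
}

def binary_label(source: str) -> int:
    """Map a source identifier to a real (0) / synthetic (1) label."""
    return SOURCE_TO_LABEL[source.lower()]

print(binary_label("COCO"), binary_label("Midjourney"))  # 0 1
```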
Temporal Dataset for Indoor and In-Vehicle Thermal Comfort Estimation

Thermal comfort estimation is essential for enhancing user experience in static indoor environments and dynamic in-vehicle scenarios. While traditional datasets focus on buildings, their application to fast-changing conditions, such as in vehicles, remains unexplored. We address this gap by introducing two temporal datasets collected from (1) a self-built climatic chamber with 31 sensor signals and user-labeled ratings from 18 participants and (2) in-vehicle studies with 20 participants in a BMW 3 Series.
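A common way to use such temporal data is to cut the sensor streams into fixed-length windows, each paired with the comfort rating at the window's end; the window length, stride, and (T, 31) signal layout below are assumptions for illustration:

```python
import numpy as np

def sliding_windows(signals: np.ndarray, labels: np.ndarray,
                    window: int = 60, stride: int = 10):
    """Cut a (T, 31) sensor matrix into overlapping windows, each
    paired with the rating at the window's last time step."""
    xs, ys = [], []
    for start in range(0, signals.shape[0] - window + 1, stride):
        end = start + window
        xs.append(signals[start:end])
        ys.append(labels[end - 1])
    return np.stack(xs), np.array(ys)

# Example with random stand-in data: 1000 time steps, 31 sensor channels.
X, y = sliding_windows(np.random.randn(1000, 31),
                       np.random.randint(-3, 4, 1000))
print(X.shape, y.shape)  # (95, 60, 31) (95,)
```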
In order to evaluate the effectiveness of NToP in real-world scenarios, we collected a new dataset, OmniLab, with a top-view omnidirectional camera mounted on the ceiling of two different rooms (bedroom, living room) at 2.5 m height. Five actors (3 males, 2 females) perform 16 actions from the CMU MoCap database (brooming, cleaning windows, down and get up, drinking, fall-on-face, in chair and stand up, pull object, push object, rugpull, turn left, turn right, upbend from knees, upbend from waist, up from ground, walk, walk-old-man) in the two rooms with varying clothes. The recorded action length is 2.5 s, which yields 60 images per scene at a frame rate of 24 FPS. The camera position is fixed and the image resolution is 1200 by 1200 pixels. A total of 4800 frames are collected. Annotations of 17 keypoints conforming to COCO conventions are estimated with a keypoint detector and subsequently refined by four different annotators in two rounds to ensure high annotation quality.
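The stated totals are self-consistent; a quick sanity check, assuming one 2.5 s clip per actor-action scene (the scene accounting across the two rooms is inferred from the totals, not stated explicitly):

```python
# 5 actors x 16 actions, one 2.5 s clip per (actor, action) scene at 24 FPS.
actors, actions = 5, 16
fps, clip_seconds = 24, 2.5

frames_per_scene = int(fps * clip_seconds)       # 60 images per scene
total_frames = actors * actions * frames_per_scene
print(frames_per_scene, total_frames)            # 60 4800
```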
Human pose estimation (HPE) in the top view using fisheye cameras presents a promising and innovative application domain. However, the availability of datasets capturing this viewpoint is extremely limited, especially those with high-quality 2D and 3D keypoint annotations. Addressing this gap, we leverage the Neural Radiance Fields (NeRF) technique to establish a comprehensive pipeline for generating human pose datasets from existing 2D and 3D datasets, specifically tailored to the top-view fisheye perspective. Through this pipeline, we create a novel dataset, NToP (NeRF-powered Top-view human Pose dataset for fisheye cameras), with over 570 thousand images, and conduct an extensive evaluation of its efficacy in enhancing neural networks for 2D and 3D top-view human pose estimation. Extensive evaluations on existing top-view 2D and 3D HPE datasets, as well as our new real-world top-view 2D HPE dataset OmniLab, show that our dataset is effective and surpasses previous datasets.
These images show Bacillus subtilis bacteria in suspension, captured with a digital microscope. The fluorescent bacteria dataset can be generated as desired by specifying the number of bacteria per image and the total number of images. It provides images at 3280x2464 resolution together with the centroid location of each bacterium, useful for enumeration or density map estimation.
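Given the centroid annotations, a ground-truth density map for counting can be built by placing a unit impulse at each centroid and blurring it with a Gaussian kernel, so the map integrates to the bacteria count; the kernel width below is an assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(centroids, shape=(2464, 3280), sigma=8.0):
    """Place a unit impulse at each (x, y) centroid and blur with a
    Gaussian, preserving the total count. sigma=8.0 is an assumption."""
    dmap = np.zeros(shape, dtype=np.float32)
    for x, y in centroids:
        dmap[int(round(y)), int(round(x))] += 1.0
    return gaussian_filter(dmap, sigma=sigma)

dmap = density_map([(100.5, 200.2), (1500.0, 900.7)])
print(dmap.sum())  # ~2.0, the number of annotated bacteria
```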
To study the problem of weakly supervised attended object detection in cultural sites, we collected and labeled a dataset of egocentric images acquired from subjects visiting a cultural site. The dataset has been designed to offer a snapshot of the subject’s visual experience while visiting a museum and contains labels for several artworks and details attended by the subjects.
Scene Graph Generation (SGG) converts visual scenes into structured graph representations, providing deeper scene understanding for complex vision tasks. However, existing SGG models often overlook essential spatial relationships and struggle to generalize in open-vocabulary contexts. To address these limitations, we propose LLaVA-SpaceSGG, a multimodal large language model (MLLM) designed for open-vocabulary SGG with enhanced spatial relation modeling. To train it, we construct an SGG instruction-tuning dataset named SpaceSGG, built by combining publicly available datasets and synthesizing data with open-source models within our data construction pipeline. It combines object locations, object relations, and depth information, resulting in three data formats: spatial SGG description, question answering, and conversation. To enhance the transfer of MLLMs' inherent capabilities to the SGG task, we introduce a two-stage training paradigm. Experiments show that
IllusionMNIST_test

IllusionMNIST_test is a generated dataset derived from MNIST. It introduces a novel element of pareidolia, a phenomenon where patterns, often faces, are perceived in random or abstract stimuli. The dataset contains 11 classes: the original 10 MNIST digits plus an additional "No Illusion" class. It includes 1,219 samples, all synthetically created rather than real-world images.
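A loading sketch using the Hugging Face `datasets` library; the hub path, `label` field, and position of the "No Illusion" class are hypothetical placeholders to check against the actual dataset card:

```python
from datasets import load_dataset

# Hypothetical hub path; consult the dataset card for the real identifier.
ds = load_dataset("IllusionMNIST", split="test")

# Assumed class ordering: digits 0-9 followed by the extra class.
CLASSES = [str(d) for d in range(10)] + ["No Illusion"]
print(len(ds), CLASSES[ds[0]["label"]])
```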