GQA-OOD is a new dataset and benchmark for evaluating VQA models in out-of-distribution (OOD) settings.
A high-quality indoor monocular depth estimation dataset with a focus on performance variation across space types.
This dataset is made up of forward-looking sonar images containing ten classes of underwater debris. The dataset can be used for segmentation or object detection. Applications include training computer vision models for underwater robotics.
This dataset contains images from Sentinel-2 satellites taken before and after a wildfire. The ground-truth masks are provided by the California Department of Forestry and Fire Protection and are mapped onto the images. The dataset is designed for binary semantic segmentation of burned vs. unburned areas.
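Below is a minimal sketch, not official tooling, of how a predicted burned-area mask could be scored against one of these ground-truth masks with intersection-over-union; the toy masks stand in for real model output and CAL FIRE annotations.

```python
# Sketch: IoU between a predicted and a ground-truth burned-area mask
# (1 = burned). The masks below are synthetic placeholders.
import numpy as np

def binary_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """IoU between two boolean masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection) / float(union) if union > 0 else 1.0

pred = np.zeros((256, 256), dtype=bool)
pred[64:192, 64:192] = True      # stand-in for model output
target = np.zeros((256, 256), dtype=bool)
target[96:224, 96:224] = True    # stand-in for the CAL FIRE mask
print(f"burned-area IoU: {binary_iou(pred, target):.3f}")
```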
A multimodal empathetic dialogue dataset.
Touch is an important sensing modality for humans, but it has not yet been incorporated into a multimodal generative language model. This is partially due to the difficulty of obtaining natural language labels for tactile data and the complexity of aligning tactile readings with both visual observations and language descriptions. As a step towards bridging that gap, this work introduces a new dataset of 44K in-the-wild vision-touch pairs, with English language labels annotated by humans (10%) and textual pseudo-labels from GPT-4V (90%). We use this dataset to train a vision-language-aligned tactile encoder for open-vocabulary classification and a touch-vision-language (TVL) model for text generation using the trained encoder. Results suggest that by incorporating touch, the TVL model improves touch-vision-language alignment (+29% classification accuracy) over existing models trained on any pair of those modalities.
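As a rough illustration of the open-vocabulary classification step described above, the sketch below scores a tactile embedding against text embeddings of candidate class names by cosine similarity; the random vectors stand in for the actual TVL tactile and text encoders, which are not reproduced here.

```python
# Sketch: open-vocabulary classification via cosine similarity between a
# tactile embedding and text embeddings of class names. Random vectors are
# placeholders for the (hypothetical) encoder outputs.
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
class_names = ["smooth glass", "rough fabric", "soft foam"]

# Stand-ins for text_encoder(name) and tactile_encoder(touch_reading).
text_emb = l2_normalize(rng.normal(size=(len(class_names), 512)))
touch_emb = l2_normalize(rng.normal(size=(512,)))

# Open-vocabulary prediction = nearest class prompt by cosine similarity.
scores = text_emb @ touch_emb
print("predicted:", class_names[int(np.argmax(scores))])
```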
SKSF-A consists of seven distinct sketch styles drawn by professional artists. It contains 134 identities and corresponding sketches, for a total of 938 face-sketch pairs. SKSF-A was introduced in StyleSketch (Eurographics 2024): https://kwanyun.github.io/stylesketch_project/
We release the dataset for non-commercial research. Submit requests at https://forms.gle/6WPEGNWbYoEe6bte8.
In Visual Query Detection (VQD), a system is given a natural language query (prompt) and an image, and must produce 0 - N boxes that satisfy that query. VQD is related to several other tasks in computer vision, but it captures abilities these other tasks ignore. Unlike object detection, VQD can deal with attributes and relations among objects in the scene. In VQA, algorithms often produce the right answers due to dataset bias without 'looking' at relevant image regions. Referring Expression Recognition (RER) datasets have short and often ambiguous prompts, and by having only a single box as an output, they make it easier to exploit dataset biases. VQD requires goal-directed object detection and outputting a variable number of boxes that answer a query.
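As an illustration of the task format, the sketch below shows one hypothetical way to represent a VQD sample and to score a variable-length set of predicted boxes with greedy IoU matching; the field names and threshold are assumptions, not the official annotation format or evaluation protocol.

```python
# Sketch: a VQD sample (query + 0..N ground-truth boxes) and a greedy
# one-to-one IoU matcher for variable-length predictions.
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

@dataclass
class VQDSample:
    image_path: str
    query: str           # e.g. "all people wearing hats"
    gt_boxes: List[Box]  # 0..N boxes satisfying the query

def iou(a: Box, b: Box) -> float:
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match(pred: List[Box], gt: List[Box], thr: float = 0.5):
    """Greedy matching; returns (true pos, false pos, false neg)."""
    unmatched, tp = list(gt), 0
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)
            tp += 1
    return tp, len(pred) - tp, len(unmatched)

sample = VQDSample("img_001.jpg", "all people wearing hats", [(10, 10, 50, 80)])
preds = [(12, 11, 49, 78), (200, 200, 240, 260)]
print(match(preds, sample.gt_boxes))  # -> (1, 1, 0)
```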
The Online Action Detection Dataset (OAD) was captured using the Kinect V2 sensor, which collects color images, depth images and human skeleton joints synchronously. This dataset includes 59 long sequences and 10 actions.
This dataset contains simulated and expert-labelled spectrograms from two radio telescopes: the Hydrogen Epoch of Reionization Array (HERA) in South Africa and the Low-Frequency Array (LOFAR) in the Netherlands. These datasets are intended to test radio-frequency interference (RFI) detection schemes. This entry pertains to the HERA dataset specifically.
This dataset contains simulated and expert-labelled spectrograms from two radio telescopes: the Hydrogen Epoch of Reionization Array (HERA) in South Africa and the Low-Frequency Array (LOFAR) in the Netherlands. These datasets are intended to test radio-frequency interference (RFI) detection schemes. This entry pertains to the LOFAR dataset specifically.
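For orientation, here is a naive RFI-flagging baseline one might run on spectrograms like the HERA and LOFAR data above; it is a sketch only, not one of the detection schemes the datasets were built to benchmark. It flags time-frequency cells that deviate from the per-channel median by more than k robust standard deviations.

```python
# Sketch: median-absolute-deviation (MAD) thresholding for RFI flagging
# on a 2-D spectrogram (time x frequency) of magnitudes.
import numpy as np

def mad_flag(spec: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Returns a boolean RFI mask the same shape as spec."""
    med = np.median(spec, axis=0, keepdims=True)               # per-channel median
    mad = np.median(np.abs(spec - med), axis=0, keepdims=True)
    sigma = 1.4826 * mad + 1e-12                               # MAD -> std estimate
    return np.abs(spec - med) > k * sigma

rng = np.random.default_rng(0)
spec = rng.rayleigh(1.0, size=(512, 256))  # toy noise-only spectrogram
spec[100:110, 40] += 50.0                  # inject a narrowband "transmitter"
print(f"flagged fraction: {mad_flag(spec).mean():.4f}")
```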
SR-Reg is a brain MR-CT registration dataset derived from SynthRAD 2023 (https://synthrad2023.grand-challenge.org/). It contains preprocessed images of 180 subjects; each subject comprises a brain MR image and a brain CT image with corresponding segmentation labels. SR-Reg was first introduced in MambaMorph (https://arxiv.org/abs/2401.13934).
The SEPE 8K dataset is made up of 40 different 8K (8192 x 4320) video sequences and 40 different 8K (8192 x 5464) images. The video sequences were captured at a framerate of 29.97 frames per second (FPS) and have been encoded with the AVC/H.264, HEVC/H.265, and AV1 codecs at resolutions from 8K down to 480p. The images, video sequences, encoded videos, and various other statistics about the media are stored, published, and maintained in the GitHub repo for non-commercial use. To the best of our knowledge, this is the first dataset to publish true 8K natural sequences; it is therefore important for the next level of multimedia applications such as video quality assessment, super-resolution, video coding, video compression, and many more.
BlendMimic3D is a pioneering synthetic dataset developed using Blender, designed to enhance Human Pose Estimation (HPE) research. This dataset features diverse scenarios including self-occlusions, object-based occlusions, and out-of-frame occlusions, tailored for the development and testing of advanced HPE models.
This dataset contains pre- and post-destruction images, as well as segmentation labels for test images.
Appearance-based gaze estimation systems have shown great progress recently, yet the performance of these techniques depends on the datasets used for training. Most of the existing gaze estimation datasets set up in interactive settings were recorded in laboratory conditions, and those recorded in the wild display limited head pose and illumination variations. Further, precision evaluations of existing gaze estimation approaches have received little attention so far. In this work, we present a large gaze estimation dataset, PARKS-Gaze, with wider head pose and illumination variation and with multiple samples for a single Point of Gaze (PoG). The dataset contains 974 minutes of data from 28 participants with a head pose range of ±60° in both yaw and pitch directions. Our within-dataset, cross-dataset, and precision evaluations indicate that the proposed dataset is more challenging and enables models to generalize better to unseen participants than the existing datasets.
Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset
The DARK FACE dataset provides 6,000 real-world low-light images captured during the nighttime at teaching buildings, streets, bridges, overpasses, parks, etc., all labeled with bounding boxes around human faces, as the main training and/or validation sets. We also provide 9,000 unlabeled low-light images collected in the same settings. Additionally, we provide a unique set of 789 paired low-light/normal-light images captured under controllable real lighting conditions (though not necessarily containing faces), which can be used as part of the training data at the participants' discretion. There will be a hold-out testing set of 4,000 low-light images with annotated human face bounding boxes.
Im4Sketch is a large-scale dataset with a shape-oriented set of classes for image-to-sketch generalization. It consists of a collection of natural images from 874 categories for training and validation, and sketches from 393 categories (a subset of the natural image categories) for testing.