Datasets

19,997 machine learning datasets

FinVis

Pretrain: 200k; Instruction: 100k

Google Brain - Ventilator Pressure Prediction

What do doctors do when a patient has trouble breathing? They use a ventilator to pump oxygen into a sedated patient's lungs via a tube in the windpipe. But mechanical ventilation is a clinician-intensive procedure, a limitation that was prominently on display during the early days of the COVID-19 pandemic. At the same time, developing new methods for controlling mechanical ventilators is prohibitively expensive, even before reaching clinical trials. High-quality simulators could reduce this barrier.

4 papers | 0 benchmarks

SDSD-outdoor (Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment)

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment

4 papers | 1 benchmark

DAMON (Dense Annotation of 3D Human Object contact in Natural Images)

4 papers | 0 benchmarks | Images

Glass (Glass Identification)

From the USA Forensic Science Service: 6 types of glass, defined in terms of their oxide content (e.g., Na, Fe, K).

4 papers | 0 benchmarks

WU-Minn HCP Data - 1200 Subjects

This HCP data release includes high-resolution 3T MR scans from young healthy adult twins and non-twin siblings (ages 22-35) using four imaging modalities: structural images (T1w and T2w), resting-state fMRI (rfMRI), task-fMRI (tfMRI), and high angular resolution diffusion imaging (dMRI). Behavioral and other individual subject measure data (both NIH Toolbox and non-Toolbox measures) are available for all subjects. MEG data and 7T MR data are available for a subset of subjects (twin pairs). The Open Access Dataset includes imaging data and most behavioral data. To protect subject privacy, some of the data (e.g., which subjects are twins) are part of a Restricted Access dataset.

4 papers | 0 benchmarks

Pancreas-CT

4 papers | 2 benchmarks | Images

PRO-teXt

PRO-teXt is an extension of PROXD with the inclusion of text prompts to synthesize objects. There are 180/20 interactions for training/testing in PRO-teXt. Each interaction involves a linguistic command corresponding to an existing room arrangement.

4 papers | 15 benchmarks

DiaMOS Plant (A Dataset for Diagnosis and Monitoring Plant Disease)

The classification and recognition of foliar diseases is an increasingly developing field of research, where the concepts of machine and deep learning are used to support agricultural stakeholders. Datasets are the fuel for the development of these technologies. In this paper, we release and make publicly available the field dataset collected to diagnose and monitor plant symptoms, called DiaMOS Plant, consisting of 3505 images of pear fruit and leaves affected by four diseases. In addition, we perform a comparative analysis of existing literature datasets designed for the classification and recognition of leaf diseases, highlighting the main features that maximize the value and information content of the collected data. This study provides guidelines that will be useful to the research community in the context of the selection and construction of datasets.

4 papers | 0 benchmarks | Images

BACH

Breast cancer is the most common invasive cancer in women, affecting more than 10% of women worldwide. Microscopic analysis of a biopsy remains one of the most important methods to diagnose the type of breast cancer. This requires specialized analysis by pathologists, in a task that i) is highly time- and cost-consuming and ii) often leads to nonconsensual results. The relevance and potential of automatic classification algorithms using hematoxylin-eosin stained histopathological images has already been demonstrated, but the reported results are still sub-optimal for clinical use. With the goal of advancing the state-of-the-art in automatic classification, the Grand Challenge on BreAst Cancer Histology images (BACH) was organized in conjunction with the 15th International Conference on Image Analysis and Recognition (ICIAR 2018). BACH aimed at the classification and localization of clinically relevant histopathological classes in microscopy and whole-slide images from a large annotated

4 papers | 0 benchmarks

Alpaca instruction tuning

4 papers | 0 benchmarks

HowTo100M Adverbs

HowTo100M Adverbs is a subset of HowTo100M with adverbs mined from 83 tasks. The annotations were obtained from automatically transcribed narrations of instructional videos. The dataset originally contains 5,824 clips annotated with action-adverb pairs from 72 verbs and 6 adverbs. Source: How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs

4 papers | 9 benchmarks | Actions, Videos

PsyMo (PsyMo: A Dataset for Estimating Self-Reported Psychological Traits from Gait)

Psychological trait estimation from external factors such as movement and appearance is a challenging and long-standing problem in psychology, and is principally based on the psychological theory of embodiment. To date, attempts to tackle this problem have utilized private small-scale datasets with intrusive body-attached sensors. Potential applications of an automated system for psychological trait estimation include estimation of occupational fatigue and psychology, and marketing and advertisement. In this work, we propose PsyMo (Psychological traits from Motion), a novel, multi-purpose and multi-modal dataset for exploring psychological cues manifested in walking patterns. We gathered walking sequences from 312 subjects in 7 different walking variations and 6 camera angles. In conjunction with walking sequences, participants filled in 6 psychological questionnaires, totalling 17 psychometric attributes related to personality, self-esteem, fatigue, aggressiveness and mental health. W

4 papers | 0 benchmarks | 3D meshes, Images

ITALIC

ITALIC: An ITALian Intent Classification Dataset

4 papers | 0 benchmarks | Audio, Texts

Camouflaged Animal Dataset

The nine (moving camera) videos in this benchmark exhibit camouflaged animals that are difficult to see in a single frame, but can be detected based upon their motion across frames.

4 papers | 35 benchmarks | Videos

EuroSAT-SAR

A SAR version of the EuroSAT dataset. The images were collected from Sentinel-1 GRD products (two bands VV and VH) based on the geocoordinates of the EuroSAT images.

4 papers | 1 benchmark | Images

MCubeS (P) (Multimodal Material Segmentation Dataset)

The Multimodal Material Segmentation (MCubeS) dataset contains 500 sets of images from 42 street scenes. Each scene has images for four modalities: RGB, angle of linear polarization (AoLP), degree of linear polarization (DoLP), and near-infrared (NIR). The dataset provides annotated ground truth labels for both material and semantic segmentation for every pixel. The dataset is divided into a training set with 302 image sets, a validation set with 96 image sets, and a test set with 102 image sets. Each image has 1224 x 1024 pixels, and each pixel is labeled with one of 20 classes.

4 papers | 2 benchmarks

VirtualHome2KG

VirtualHome2KG is a system for constructing and augmenting knowledge graphs (KGs) of daily living activities using virtual space. We also provide an ontology to describe the structure of the KGs. We used VirtualHome as the platform for virtual space simulation, so this repository is an extension of VirtualHome. Please see the original VirtualHome repository for details of the Unity simulation.

4 papers | 0 benchmarks | Graphs

ImageNet-1k vs NINCO (No ImageNet Class Objects)

The NINCO (No ImageNet Class Objects) dataset is introduced in the ICML 2023 paper "In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation". The images in this dataset are free from objects that belong to any of the 1000 classes of ImageNet-1K (ILSVRC2012), which makes NINCO suitable for evaluating out-of-distribution detection on ImageNet-1K.

4 papers | 3 benchmarks | Images

DeepSpaceYoloDataset

During the MILAN research project (MachIne Learning for AstroNomy), we compiled a large collection of deep sky images during Electronically Assisted Astronomy sessions in Luxembourg, France, and Belgium.

4 papers | 0 benchmarks
