395 machine learning datasets
This repository presents the severity ratings (BIRADS) assigned by clinicians while diagnosing several patients in our User Tests and Analysis 4 (UTA4) study. By providing this repository, we hope to encourage the research community to focus on hard problems. The work and results were published at AVI 2020, a top Human-Computer Interaction (HCI) conference (page), and the results were analyzed and interpreted from our Statistical Analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients for a Single-Modality vs. Multi-Modality comparison; in these tests, we used both the prototype-single-modality and prototype-multi-modality repositories. Likewise, the dataset here represents the information of both
Hypertension Disease Medication dataset.
The ABCD Study is a prospective longitudinal study starting at the ages of 9-10 and following participants for 10 years. The study includes a diverse sample of nearly 12,000 youth enrolled at 21 research sites across the country. It measures brain development (via structural, task functional, and resting state functional imaging), social, emotional, and cognitive development, mental health, substance use and attitudes, gender identity and sexual health, bio-specimens, as well as a variety of physical health, and environmental factors.
The NCANDA consortium is composed of an Administrative component at the University of California San Diego, a Data Analysis and Informatics component at SRI International, and five research sites (University of California San Diego, SRI International, Duke University, the University of Pittsburgh, and the Oregon Health & Science University). A sample of 831 individuals (ages 12-21) was recruited for the study across the five research sites. The enrolled participants are followed in an accelerated longitudinal design that involves structural and functional imaging of the brain along with extensive neuropsychological and clinical assessments.
This dataset includes sharp-blur pairs of Leishmania images: a microscopy dataset of the protozoan parasite Leishmania, obtained from preserved slides stained with Giemsa. The paired blur-sharp images were acquired with a bright-field microscope (Olympus IX53) using 100× magnification oil-immersion objectives. We first captured the sharp images as ground truth, then acquired the corresponding out-of-focus images. The extent and nature of the defocusing are random along the optical axis, so the degree of defocus varies from image to image. The dataset includes 764 in-focus and 764 corresponding out-of-focus images, each 2304 × 1728 pixels in 24-bit JPG format.
The dataset includes 589 T2-weighted images, one per patient, collected by seven studies: INDEX, the SmartTarget Biopsy Trial, PICTURE, TCIA Prostate3T, Promise12, TCIA ProstateDx (Diagnosis), and the Prostate MR Image Database. Further details are reported in the respective study references.
A unique radiogenomic dataset from a Non-Small Cell Lung Cancer (NSCLC) cohort of 211 subjects. The dataset comprises Computed Tomography (CT) and Positron Emission Tomography (PET)/CT images, semantic annotations of the tumors as observed on the medical images using a controlled vocabulary, segmentation maps of tumors in the CT scans, and quantitative values obtained from the PET/CT scans. Imaging data are also paired with gene mutation data, RNA sequencing data from samples of surgically excised tumor tissue, and clinical data, including survival outcomes.
A Kaggle competition dataset for stroke prediction; the classes are heavily imbalanced.
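For heavily imbalanced labels like these, a common first step is to weight classes inversely to their frequency. The sketch below computes "balanced" weights with the usual formula n_samples / (n_classes * class_count); the toy label distribution is illustrative, not taken from the actual competition data.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Compute 'balanced' class weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Toy imbalance similar in spirit to the stroke label (rare positive class):
weights = balanced_class_weights([0] * 95 + [1] * 5)
# weights -> {0: ~0.53, 1: 10.0}; the minority class is up-weighted
```

These weights can then be passed to a classifier's class-weight or sample-weight parameter.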
EBHI-Seg is a dataset containing 5,170 images of six tumor differentiation stages, together with the corresponding ground-truth images. The dataset can help researchers develop new segmentation algorithms for the medical diagnosis of colorectal cancer.
VISEM-Tracking is a dataset consisting of 20 thirty-second video recordings of spermatozoa with manually annotated bounding-box coordinates and a set of sperm characteristics analyzed by domain experts. It is an extension of the previously published VISEM dataset. In addition to the annotated data, unlabeled video clips are provided for easy access and analysis of the data.
The Rotterdam EyePACS AIROGS dataset (in full, including train and test) contains 113,893 color fundus images from 60,357 subjects across approximately 500 different sites, with heterogeneous ethnic backgrounds.
Histological images of colorectal cancer, derived from the TCGA database
MTNeuro is a multi-task neuroimaging benchmark built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions.
The Fraunhofer Portugal AICOS EDoF Dataset was produced within the TAMI project and is composed of images of microscopic fields of view (FOV) of Liquid-based Cervical Cytology (LBC) samples. A total of 15 LBC samples were supplied by the Pathology Services of Hospital Fernando Fonseca and the Portuguese Oncology Institute of Porto. For each LBC sample, a set of images was obtained using a version of the µSmartScope [1,2] prototype adapted to the cervical cytology use case [3,4].
MuCeD is a dataset carefully curated and validated by expert pathologists from the All India Institute of Medical Sciences (AIIMS), Delhi, India. The H&E-stained histopathology images of the human duodenum in MuCeD are captured through an Olympus BX50 microscope at 20x zoom using a DP26 camera, with each image being 1920x2148 in dimension. The dataset has 55 images, with bounding boxes for 2,090 IELs and 6,518 ENs annotated using the LabelMe software and further validated by multiple pathologists. These cells are selected from the epithelial area -- a region of interest explicitly segmented by experts. The epithelial area denotes the area of continuous villi and is used for cell detection, whereas the rest of the area is masked out. Further, each image is sliced into 9 subimages and each subimage is rescaled to 640x640 before being given as input to object detection models. We divide the 55 images into five folds of 11 images each and report 5-fold cross-validation numbers.
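The 9-subimage slicing described above amounts to a 3x3 grid of crops. A minimal sketch of the tile geometry, assuming an even 3x3 split of the stated 1920x2148 frame (the exact cropping scheme used by the authors is not specified here):

```python
def tile_boxes(width, height, rows=3, cols=3):
    """Split an image plane into rows*cols crop boxes as (left, upper, right, lower)."""
    tw, th = width // cols, height // rows
    return [(c * tw, r * th, (c + 1) * tw, (r + 1) * th)
            for r in range(rows) for c in range(cols)]

# A 1920x2148 MuCeD image yields nine 640x716 tiles,
# each of which would then be rescaled to 640x640 (e.g. with PIL's Image.resize).
boxes = tile_boxes(1920, 2148)
```

Each box can be passed directly to an image library's crop call (e.g. PIL's `Image.crop`).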
An instance segmentation dataset of yeast cells in microstructures. The dataset includes 493 densely annotated microscopy images. For more information see the paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures".
The data generated from this study are grouped into 3 main types: (1) participant demographic and clinical data, (2) sensor data from the different devices, as well as clinical scores and metadata related to the tasks performed, and (3) participant diaries collected during the in-clinic and at-home phases of the study. Throughout the data tables, timestamps are provided as UNIX epoch/POSIX time.
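Since the tables use UNIX epoch/POSIX timestamps, readers working with this data will typically convert them to timezone-aware datetimes. A minimal sketch using only the Python standard library:

```python
from datetime import datetime, timezone

def epoch_to_utc(ts):
    """Convert a UNIX epoch/POSIX timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

print(epoch_to_utc(0))  # 1970-01-01 00:00:00+00:00
```

Note that POSIX timestamps are defined relative to UTC, so converting with an explicit `timezone.utc` avoids local-time ambiguity.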
COVIDx CXR-3 is an open access benchmark dataset that we generated, comprising 30,882 CXR images across 17,026 patient cases. Images may be added over time to improve the dataset.
A dataset of 100K synthetic images of skin lesions, ground-truth (GT) segmentations of lesions and healthy skin, GT segmentations of seven body parts (head, torso, hips, legs, feet, arms and hands), and GT binary masks of non-skin regions in the texture maps of 215 scans from the 3DBodyTex.v1 dataset [2], [3] created using the framework described in [1]. The dataset is primarily intended to enable the development of skin lesion analysis methods. Synthetic image creation consisted of two main steps. First, skin lesions from the Fitzpatrick 17k dataset were blended onto skin regions of high-resolution three-dimensional human scans from the 3DBodyTex dataset [2], [3]. Second, two-dimensional renders of the modified scans were generated.