395 machine learning datasets
The VGG Cell dataset (made up entirely of synthetic images) is the main public benchmark used to compare cell counting techniques.
NIH-Lymph Node (NIH-LN) contains 388 mediastinal LNs in 90 CT scans and 595 abdominal LNs in 86 scans.
The CLOUD dataset is a set of Optical Coherence Tomography of the Anterior Segment (AS-OCT) images used for the automatic identification and representation of the cornea-contact lens relationship. The dataset includes 112 AS-OCT images captured from 16 different patients. The images were obtained with a Carl Zeiss Meditec OCT Cirrus 500 scanner with an anterior segment module for users of scleral contact lenses (SCL).
The dataset has 93 image stacks and their corresponding Extended Depth of Field (EDF) images, acquired from cases graded Negative, LSIL or HSIL (The Bethesda System):
- Negative: 16
- LSIL: 46
- HSIL: 31
The ground truth includes the grade label for each frame and manually marked points inside cervical cells in each frame. There are 2705 manually marked points in total across all frames:
- Negative: 238
- LSIL: 1536
- HSIL: 931
The COVID-19 Posteroanterior Chest X-Ray fused (CPCXR) dataset is generated by fusing three publicly available datasets: COVID-19 cxr image, Radiological Society of North America (RSNA), and the Montgomery County set collected by the U.S. National Library of Medicine (USNLM), NLM(MC). The dataset consists of samples labeled as COVID-19, Tuberculosis, Other pneumonia (SARS, MERS, etc.), and Normal. It can be used to train and evaluate deep learning and machine learning models on binary and multi-class classification problems.
DLBCL-Morph is a dataset containing 42 digitally scanned high-resolution tissue microarray (TMA) slides accompanied by clinical, cytogenetic, and geometric features from 209 DLBCL cases.
Medical Case Report Corpus is a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library.
The PREdiction of Clinical Outcomes from Genomic profiles (or PRECOG) encompasses 166 cancer expression data sets, including overall survival data for ~18,000 patients diagnosed with 39 distinct malignancies.
The SYSU-CEUS dataset consists of three types of focal liver lesions (FLLs): 186 HCC instances, 109 HEM instances and 58 FNH instances (i.e., 186 malignant instances and 167 benign instances). The dataset was collected at the First Affiliated Hospital, Sun Yat-sen University, using an Aplio SSA-770A (Toshiba Medical System). All instances have a resolution of 768×576 and were taken from different patients, with large variations in the appearance and enhancement patterns (e.g. sizes, contrasts, shapes and locations) of the FLLs.
The PART-OF dataset is a dataset of relations extracted from a medical ontology. The different entities in the ontology are parts of the human body. The dataset has 16,894 nodes with 19,436 edges between them.
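As a rough illustration of how part-of relations like these can be handled, the sketch below stores (part, whole) pairs as a directed graph and collects every structure an entity belongs to, directly or transitively. The entity names and the helper functions `build_parents` and `all_wholes` are hypothetical and chosen only for the example; they are not taken from the dataset or its ontology.

```python
from collections import defaultdict

# Hypothetical (part, whole) pairs for illustration only;
# these are NOT entries from the PART-OF dataset.
PART_OF_EDGES = [
    ("nail", "finger"),
    ("finger", "hand"),
    ("hand", "arm"),
    ("arm", "body"),
]


def build_parents(edges):
    """Map each part to the set of wholes it is directly part of."""
    parents = defaultdict(set)
    for part, whole in edges:
        parents[part].add(whole)
    return parents


def all_wholes(entity, parents):
    """Return every entity that `entity` is directly or transitively part of."""
    seen = set()
    stack = [entity]
    while stack:
        node = stack.pop()
        for whole in parents.get(node, ()):
            if whole not in seen:  # also guards against cycles
                seen.add(whole)
                stack.append(whole)
    return seen


if __name__ == "__main__":
    parents = build_parents(PART_OF_EDGES)
    print(sorted(all_wholes("finger", parents)))
```

An adjacency map of this kind stays small even at the scale quoted above (16,894 nodes and 19,436 edges), so a plain dictionary of sets is likely sufficient before reaching for a graph library.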
The main goal of this data set is to provide clean and valid signals for designing cuff-less blood pressure estimation algorithms. The raw electrocardiogram (ECG), photoplethysmograph (PPG), and arterial blood pressure (ABP) signals were originally collected from physionet.org and then preprocessed and validated. (For more information about the process, please refer to our paper.)
A public open dataset of synthetic chest X-ray images of COVID-19.
Mouse Brain MRI atlas (both in-vivo and ex-vivo) (repository relocated from the original webpage)
Chinese Medical Information Extraction (CMeIE), a dataset also released in CHIP2020, is used for the CMeIE task. The task aims to identify both entities and relations in a sentence, following the schema constraints. There are 53 relations defined in the dataset, including 10 synonymous sub-relationships and 43 other sub-relationships.
The Synthetic COVID-19 Chest X-ray Dataset consists of 21,295 synthetic COVID-19 chest X-ray images to be used for computer-aided diagnosis. These images, generated via an unsupervised domain adaptation approach, are of high quality.
The synthetic dataset (10,000 pairs of images and regions, 2.95 GB) is shared with the code in HDF5 format.
SinGAN-Seg-polyps is a synthetic dataset for polyp segmentation consisting of 10,000 synthetic polyps and masks.
HYPE Dataset - Version 1.0.0
ImageTBAD is a 3D Computed Tomography (CT) image dataset for segmentation of Type-B Aortic Dissection. It contains 100 3D CT images, which is of decent size compared with existing medical imaging datasets.
Authors of the Dataset: