395 machine learning datasets
Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprising full-fundus glaucoma images, associated image metadata such as optic disc segmentation, optic cup segmentation, and blood vessel segmentation, and any provided per-instance text metadata such as sex and age. This dataset is the largest public repository of fundus images with glaucoma.
A large paroxysmal atrial fibrillation long-term electrocardiogram monitoring database. Abstract: Atrial fibrillation (AF) is the most common sustained heart arrhythmia in adults. Holter monitoring, a long-term 2-lead electrocardiogram (ECG), is a key tool available to cardiologists for AF diagnosis. Machine learning (ML) and deep learning (DL) models have shown great capacity to automatically detect AF in ECG, and their use as a medical decision support tool is growing. Training these models relies on a few open and annotated databases. We present a new Holter monitoring database from patients with paroxysmal AF with 167 records from 152 patients, acquired from an outpatient cardiology clinic from 2006 to 2017 in Belgium. AF episodes were manually annotated and reviewed by an expert cardiologist and a specialist cardiac nurse. Records last from 19 hours up to 95 hours, divided into 24-hour files. In total, it represents 24 million seconds of annotated Holter monitoring, sampled at 200 Hz. Th
This is an improved machine-learning-ready glaucoma dataset using a balanced subset of standardized fundus images from the Rotterdam EyePACS AIROGS [1] set. This dataset is split into training, validation, and test folders which contain 4000 (~84%), 385 (~8%), and 385 (~8%) fundus images in each class respectively. Each training set has a folder for each class: referable glaucoma (RG) and non-referable glaucoma (NRG).
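As a quick check on the split described above, the following minimal sketch recomputes the per-class proportions from the stated image counts. The function name `split_fractions` and the dictionary keys are illustrative assumptions, not part of the dataset.

```python
def split_fractions(counts):
    """Return each split's share of the total number of images."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Per-class image counts as stated in the dataset description.
# The key names are hypothetical labels chosen for this sketch.
per_class_counts = {"train": 4000, "validation": 385, "test": 385}

fractions = split_fractions(per_class_counts)
for name, frac in fractions.items():
    print(f"{name}: {per_class_counts[name]} images per class ({frac:.1%})")
```

With two classes (RG and NRG), the stated counts imply 2 × (4000 + 385 + 385) = 9540 images in total.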
The purpose of this challenge is to investigate (semi-)automatic spinal curvature estimation algorithms. Participants must submit Cobb angle estimates for all test data.
Dataset for the DREAMING (Diminished Reality for Emerging Applications in Medicine through Inpainting) challenge.
Objective: This study introduces the BlendedICU dataset, a massive dataset of international intensive care data. This dataset aims to facilitate generalizability studies of machine learning models, as well as statistical studies of clinical practices in intensive care units.
This dataset comprises fractured and non-fractured X-ray images covering all anatomical body regions, including lower limb, upper limb, lumbar, hips, knees, etc. The dataset is categorized into train, test, and validation folders, each containing fractured and non-fractured radiographic images.
This dataset consists of both fractured and non-fractured X-ray images encompassing various anatomical regions of the body, such as the lower limb, upper limb, lumbar region, hips, knees, and more. It is organized into three main folders: train, test, and validation, each containing both fractured and non-fractured radiographic images. You can freely access the dataset via the following link: https://www.kaggle.com/datasets/bmadushanirodrigo/fracture-multi-region-x-ray-data/data
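A folder layout like the one described can be sanity-checked by counting images per split and class before training. The sketch below assumes a `root/<split>/<class>/<image>` directory structure and a small set of image file extensions. The actual folder names and file formats in the download may differ, and `count_images` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to match the files actually present.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Count image files per (split, class) under root/<split>/<class>/..."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            parts = path.relative_to(root).parts
            # Only count files nested at least two folders deep:
            # the first folder is the split, the second is the class.
            if len(parts) >= 3:
                counts[(parts[0], parts[1])] += 1
    return counts
```

Calling `count_images("path/to/dataset")` returns a `Counter` keyed by `(split, class)` pairs, which makes class imbalance or an empty split easy to spot.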
Heel Bone X-Ray Dataset consists of 3,956 X-ray images of the foot, primarily focused on detecting and classifying heel bone diseases. The images were obtained from Kirkuk General Hospital in Digital Imaging and Communications in Medicine (DICOM) format and converted to JPG format using the MicroDicom tool.
Medical report generation (MRG), which aims to automatically generate a textual description of a specific medical image (e.g., a chest X-ray), has recently received increasing research interest. Building on the success of image captioning, MRG has become achievable. However, generating language-specific radiology reports poses a challenge for data-driven models due to their reliance on paired image-report chest X-ray datasets, which are labor-intensive, time-consuming, and costly. In this paper, we introduce a chest X-ray benchmark dataset, namely CASIA-CXR, consisting of high-resolution chest radiographs accompanied by narrative reports originally written in French. To the best of our knowledge, this is the first public chest radiograph dataset with medical reports in this particular language. Importantly, we propose a simple yet effective multimodal encoder-decoder contextually-guided framework for medical report generation in French. We validated our framework through intra-language
SimNICT is the first dataset for training universal non-ideal measurement CT (NICT) enhancement models.
The Saarbruecken Voice Database contains voice and EGG recordings of patients diagnosed with voice disorders, as well as of healthy persons.
Sakha-TB is a de-identified image dataset of frontal chest X-rays (CXR), collected through collaboration with several medical institutions in the Republic of Sakha (Yakutia, Russia). The set contains 400 normal X-rays and 400 X-rays with manifestations of pulmonary tuberculosis, balanced to some extent by age and gender, in 16-bit and 8-bit lossless PNG format, converted directly from DICOM files without any changes.
This collection consists of DICOM images and DICOM Segmentation Objects (DSOs) for 197 patients with Colorectal Liver Metastases (CRLM). The collection consists of a large, single-institution consecutive series of patients that underwent resection of CRLM and matched preoperative computed tomography (CT) scans for quantitative image analysis. Inclusion criteria were (a) pathologically confirmed resected CRLM, (b) available data from pathologic analysis of the underlying non-tumoral liver parenchyma and hepatic tumor, (c) available preoperative conventional portal venous contrast-enhanced multi-detector computed tomography (MDCT) performed within 6 weeks of hepatic resection. Patients with 90-day mortality or that had less than 24 months of follow-up were excluded. Additionally, because pathologic and radiographic alterations of the non-tumoral liver parenchyma caused by hepatic artery infusion (HAI) of chemotherapy are not well described, any patient who received preoperative HAI was e
Cataract is the leading cause of blindness worldwide, with the greatest impact in low- and middle-income countries (LMICs). The most widely used, most appropriate, and most cost-effective cataract surgical technique for LMICs is small incision cataract surgery (SICS). While algorithms have been developed for automated video analysis of surgical performance parameters for the cataract surgical technique predominantly used in high-income settings, no datasets or algorithms have so far been available for SICS. This MICCAI challenge introduces the first SICS video dataset and offers teams the opportunity to evaluate the effectiveness of their phase recognition algorithms. The 155 patients in the dataset were recruited at Sankara Eye Hospital in India.