Datasets

395 machine learning datasets

DFUC2021 (Diabetic Foot Ulcers 2021)

The Diabetic Foot Ulcers dataset (DFUC2021) is a dataset for analysis of pathology, focusing on infection and ischaemia. The final release of DFUC2021 consists of 15,683 DFU patches: 5,955 for training, 5,734 for testing and 3,994 unlabeled. The ground truth labels cover four classes: control, infection, ischaemia and both conditions.

7 papers | 0 benchmarks | Images, Medical

KUMC

The KUMC dataset for polyp detection and classification was collected from the University of Kansas Medical Center. It contains 80 colonoscopy video sequences which are manually labeled with bounding boxes as well as the polyp classes for the entire dataset.

7 papers | 0 benchmarks | Images, Medical, Videos

KiTS19 (The 2019 Kidney and Kidney Tumor Segmentation Challenge)

The 2019 Kidney and Kidney Tumor Segmentation challenge (abbreviated KiTS19) is a competition in which teams compete to develop the best system for automatic semantic segmentation of renal tumors and surrounding anatomy.

7 papers | 2 benchmarks | 3D, Medical

SLAKE-English

English subset of the SLAKE dataset, comprising 642 images and more than 7,000 question–answer pairs.

7 papers | 0 benchmarks | Images, Medical, Texts

BCI Competition IV: ECoG to Finger Movements

Prediction of finger flexion, from the IV Brain-Computer Interface Data Competition. The goal of this dataset is to predict the flexion of individual fingers from signals recorded from the surface of the brain (electrocorticography, ECoG). The dataset contains brain signals from three subjects, as well as the time courses of the flexion of each of five fingers. The task in this competition is to use the provided flexion information to predict finger flexion for a provided test set. Performance is evaluated by calculating the average correlation coefficient r between actual and predicted finger flexion.

7 papers | 1 benchmark | Biomedical, Medical, Time series
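
The evaluation metric named in the BCI Competition IV entry above, the average correlation coefficient r between actual and predicted flexion, can be sketched as follows. This is a minimal illustration, not the official scoring script. The function names `pearson_r` and `mean_finger_correlation` are hypothetical, and averaging r uniformly over the supplied finger traces is an assumption, since the description does not say exactly which fingers or subjects enter the average.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_finger_correlation(actual, predicted):
    """Average r over fingers.

    `actual` and `predicted` are lists with one flexion time course per
    finger (assumed layout, chosen for illustration).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need the same non-zero number of finger traces")
    scores = [pearson_r(a, p) for a, p in zip(actual, predicted)]
    return sum(scores) / len(scores)
```

Because r is invariant to scale and offset, a prediction that tracks the shape of the flexion trace scores well even if its amplitude is wrong, so this metric rewards timing and trend rather than absolute calibration.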

PTB-XL

Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms used as diagnosis support systems promise substantial relief for medical personnel, if only because of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, neither aspect is covered satisfactorily by existing freely accessible ECG datasets.

7 papers | 8 benchmarks | Medical

CheXmask

The CheXmask Database presents a comprehensive, uniformly annotated collection of chest radiographs, constructed from five public databases: ChestX-ray8, CheXpert, MIMIC-CXR-JPG, PadChest and VinDr-CXR. The database aggregates 657,566 anatomical segmentation masks derived from images that were processed using the HybridGNet model to ensure consistent, high-quality segmentation. To confirm the quality of the segmentations, we include in this database individual Reverse Classification Accuracy (RCA) scores for each of the segmentation masks. This dataset is intended to catalyze further innovation and refinement in the field of semantic chest X-ray analysis, offering a significant resource for researchers in the medical imaging domain.

7 papers | 0 benchmarks | Biomedical, Images, Medical

CHB-MIT (CHB-MIT Scalp EEG)

The CHB-MIT dataset is a collection of EEG recordings from pediatric subjects with intractable seizures. Subjects were monitored for up to several days following withdrawal of anti-seizure medication in order to characterize their seizures and assess their candidacy for surgical intervention. The dataset contains 23 patients divided among 24 cases (one patient has 2 recordings, 1.5 years apart). It consists of 969 hours of scalp EEG recordings with 173 seizures of various types (clonic, atonic, tonic). The diversity of patients (male, female, 10-22 years old) and of seizure types makes the dataset well suited for assessing the performance of automatic seizure detection methods in realistic settings.

6 papers | 1 benchmark | Audio, EEG, Medical

ISIC 2018 Task 2

The ISIC 2018 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. The Task 2 dataset is the challenge on lesion attribute detection. It includes 2,594 images. The task is to detect the following dermoscopic attributes: pigment network, negative network, streaks, milia-like cysts and globules (including dots).

6 papers | 0 benchmarks | Images, Medical

3DSeg-8

3DSeg-8 is a collection of several publicly available 3D segmentation datasets from different medical imaging modalities, e.g. magnetic resonance imaging (MRI) and computed tomography (CT), with various scan regions, target organs and pathologies.

6 papers | 0 benchmarks | Medical

GBCU (Gallbladder Cancer Ultrasound Dataset)

GBCU is the first public dataset for Gallbladder Cancer identification from Ultrasound images. GBCU contains a total of 1255 (432 normal, 558 benign, and 265 malignant) annotated abdominal Ultrasound images collected from 218 patients. Of the 218 patients, 71, 100, and 47 were from the normal, benign, and malignant classes, respectively. The sizes of the training and testing sets are 1133 and 122, respectively. To ensure generalization to unseen patients, all images of any particular patient were either in the train or the test split. We acquired data samples from patients referred to PGIMER, Chandigarh (a referral hospital in Northern India) for abdominal ultrasound examinations of suspected Gallbladder pathologies. The study was approved by the Ethics Committee of PGIMER, Chandigarh. We obtained informed written consent from the patients at the time of recruitment, and protect their privacy by fully anonymizing the data. Grayscale B-mode static images, including both sagittal and axi

6 papers | 1 benchmark | Images, Medical
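
The GBCU entry above notes that all images of a given patient fall in either the train or the test split. A patient-level split of that kind can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure the GBCU authors used: the function name `split_by_patient`, the `(patient_id, image_id)` sample layout, the `test_fraction` default and the seeded random assignment are all hypothetical.

```python
import random


def split_by_patient(samples, test_fraction=0.1, seed=0):
    """Split (patient_id, image_id) pairs so no patient spans both splits.

    Patients, not images, are assigned to a split; every image then
    follows its patient.
    """
    patients = sorted({patient_id for patient_id, _ in samples})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(patients)

    # Hold out at least one patient, assuming a non-empty sample list.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [s for s in samples if s[0] not in test_patients]
    test = [s for s in samples if s[0] in test_patients]
    return train, test
```

Splitting at the patient level means the image-level ratio is only approximate when patients contribute different numbers of images, which is the price paid for an evaluation that measures generalization to unseen patients.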

FrenchMedMCQA (FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain)

This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain. It is composed of 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy, mixing single and multiple answers. Each instance of the dataset contains an identifier, a question, five possible answers and their manual correction(s). We also propose first baseline models to automatically process this MCQA task in order to report on current performance and to highlight the difficulty of the task. A detailed analysis of the results showed that it is necessary to have representations adapted to the medical domain or to the MCQA task: in our case, English specialized models yielded better results than generic French ones, even though FrenchMedMCQA is in French. Corpus, models and tools are available online.

6 papers | 2 benchmarks | Biomedical, Medical, Texts

DRTiD

DRTiD is a benchmark dataset for diabetic retinopathy (DR) grading, consisting of 3,100 two-field fundus images.

6 papers | 0 benchmarks | Images, Medical

CODE-15%

A dataset of 12-lead ECGs with annotations. The dataset contains 345,779 exams from 233,770 patients. It was obtained through stratified sampling from the CODE dataset (15% of the patients). The data was collected by the Telehealth Network of Minas Gerais in the period between 2010 and 2016.

6 papers | 3 benchmarks | Medical

RAD-ChestCT Dataset

The RAD-ChestCT dataset is a large medical imaging dataset developed by Duke MD/PhD Rachel Draelos during her Computer Science PhD supervised by Lawrence Carin. The full dataset includes 35,747 chest CT scans from 19,661 adult patients. The public Zenodo repository contains an initial release of 3,630 chest CT scans, approximately 10% of the dataset. This dataset is of significant interest to the machine learning and medical imaging research communities.

6 papers | 0 benchmarks | 3D, Images, Medical

ARCADE (Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs Dataset)

Phase 2 of the ARCADE (Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs) dataset consists of two folders with 300 images each, along with annotations.

6 papers | 0 benchmarks | Images, Medical

Thyroid (Thyroid Disease)

Thyroid is a dataset for detection of thyroid diseases, in which patients diagnosed as hypothyroid or subnormal are treated as anomalies against normal patients. It contains 2,800 training instances and 972 test instances, each with about 29 attributes.

5 papers | 3 benchmarks | Images, Medical

RITE (Retinal Images vessel Tree Extraction)

RITE (Retinal Images vessel Tree Extraction) is a database that enables comparative studies on segmentation or classification of arteries and veins in retinal fundus images. It is built on the publicly available DRIVE database (Digital Retinal Images for Vessel Extraction).

5 papers | 3 benchmarks | Images, Medical

Raider

The Raider dataset collects fMRI recordings of 1000 voxels from the ventral temporal cortex, for 10 healthy adult participants passively watching the full-length movie “Raiders of the Lost Ark”.

5 papers | 0 benchmarks | Medical, fMRI

ISIC 2017 Task 3

The ISIC 2017 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. The Task 3 challenge dataset for lesion classification contains 2,000 training images: 374 melanoma, 254 seborrheic keratosis and the remaining 1,372 benign nevi.

5 papers | 0 benchmarks | Images, Medical
