Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

395 machine learning datasets

Filter by Modality: Medical

LUNA16

The LUNA16 (LUng Nodule Analysis) dataset is a dataset for lung segmentation. It consists of 1,186 lung nodules annotated in 888 CT scans.

99 papers · 0 benchmarks · Images, Medical

Medical Segmentation Decathlon

The Medical Segmentation Decathlon is a collection of medical image segmentation datasets. It contains a total of 2,633 three-dimensional images collected across multiple anatomies of interest, multiple modalities and multiple sources. Specifically, it contains data for the following body organs or parts: Brain, Heart, Liver, Hippocampus, Prostate, Lung, Pancreas, Hepatic Vessel, Spleen and Colon.

97 papers · 2 benchmarks · Images, Medical

Sleep-EDF (Sleep-EDF Expanded)

The sleep-edf database contains 197 whole-night PolySomnoGraphic sleep recordings, containing EEG, EOG, chin EMG, and event markers. Some records also contain respiration and body temperature. Corresponding hypnograms (sleep patterns) were manually scored by well-trained technicians according to the Rechtschaffen and Kales manual, and are also available.

94 papers · 9 benchmarks · Audio, EEG, Medical

PathVQA

PathVQA consists of 32,799 open-ended questions from 4,998 pathology images where each question is manually checked to ensure correctness.

89 papers · 0 benchmarks · Images, Medical, Texts

PPMI (Parkinson’s Progression Markers Initiative)

The Parkinson’s Progression Markers Initiative (PPMI) dataset originates from an observational, longitudinal clinical study comprising evaluations of people with Parkinson’s disease (PD), people at high risk, and healthy controls.

87 papers · 1 benchmark · Images, Medical

CAMUS (Cardiac Acquisitions for Multi-structure Ultrasound Segmentation)

This project aims to provide all the materials to the community to resolve the problem of echocardiographic image segmentation and volume estimation from 2D ultrasound sequences (both two and four-chamber views). To this aim, the following solutions were set up.

85 papers · 0 benchmarks · Medical, Videos

PROMISE12

The PROMISE12 dataset was made available for the MICCAI 2012 prostate segmentation challenge. Magnetic Resonance (MR) images (T2-weighted) of 50 patients with various diseases were acquired at different locations with several MRI vendors and scanning protocols.

84 papers · 1 benchmark · Images, MRI, Medical

ChestX-ray8

ChestX-ray8 is a medical imaging dataset comprising 108,948 frontal-view X-ray images of 32,717 unique patients, collected from 1992 to 2015, with eight common disease labels text-mined from the associated radiological reports via NLP techniques.

81 papers · 0 benchmarks · Images, Medical

RadGraph (RadGraph: Extracting Clinical Entities and Relations from Radiology Reports)

RadGraph is a dataset of entities and relations in radiology reports based on our novel information extraction schema, consisting of 600 reports with 30K radiologist annotations and 221K reports with 10.5M automatically generated annotations.

78 papers · 0 benchmarks · Graphs, Medical, Texts

BraTS 2017

The BraTS 2017 dataset contains 285 brain tumor MRI scans, with four MRI modalities (T1, T1ce, T2, and FLAIR) for each scan. The dataset also provides full masks for brain tumors, with labels for edema (ED), enhancing tumor (ET), and non-enhancing tumor/necrotic core (NET/NCR). The segmentation evaluation is based on three tasks: whole tumor (WT), tumor core (TC), and enhancing tumor (ET) segmentation.

77 papers · 0 benchmarks · Images, MRI, Medical
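
The three BraTS 2017 evaluation regions described above are unions of the annotated labels. The following is a minimal sketch of how they could be derived and scored with a Dice overlap, not the challenge's official evaluation code. It assumes the commonly used label convention (0 = background, 1 = NCR/NET, 2 = ED, 4 = ET), which this listing does not state. The names `REGIONS`, `region_mask`, `dice`, and `evaluate` are hypothetical, and label maps are flattened to plain Python lists for simplicity.

```python
# Assumed BraTS 2017 label convention (not stated in the listing above):
# 0 = background, 1 = NCR/NET, 2 = ED, 4 = ET.
REGIONS = {
    "WT": {1, 2, 4},  # whole tumor: all tumor labels
    "TC": {1, 4},     # tumor core: all tumor labels except edema
    "ET": {4},        # enhancing tumor only
}


def region_mask(labels, region):
    """Return a flat binary mask (list of 0/1) for one evaluation region."""
    wanted = REGIONS[region]
    return [1 if value in wanted else 0 for value in labels]


def dice(pred, truth):
    """Dice overlap between two flat binary masks; 1.0 if both are empty."""
    intersection = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total


def evaluate(pred_labels, truth_labels):
    """Dice score per region for two flattened label maps of equal length."""
    return {
        name: dice(region_mask(pred_labels, name), region_mask(truth_labels, name))
        for name in REGIONS
    }


if __name__ == "__main__":
    # Toy 8-voxel example; real scans are 3D volumes.
    truth = [0, 1, 2, 2, 4, 4, 0, 0]
    pred = [0, 1, 2, 0, 4, 1, 0, 2]
    print(evaluate(pred, truth))
```

Because the regions are nested (ET is inside TC, and TC is inside WT), one mislabeled voxel can affect the scores differently: in the toy example, predicting NCR/NET where the truth is ET lowers the ET score but leaves TC unchanged.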

BraTS 2015

The BraTS 2015 dataset is a dataset for brain tumor image segmentation. It consists of MRIs of 220 high-grade gliomas (HGG) and 54 low-grade gliomas (LGG). The four MRI modalities are T1, T1c, T2, and T2FLAIR. Segmented “ground truth” is provided for four intra-tumoral classes, viz. edema, enhancing tumor, non-enhancing tumor, and necrosis.

69 papers · 0 benchmarks · Images, MRI, Medical

UBFC-rPPG (Univ. Bourgogne Franche-Comté Remote PhotoPlethysmoGraphy)

We introduce here a new database called UBFC-rPPG (short for Univ. Bourgogne Franche-Comté Remote PhotoPlethysmoGraphy) comprising two datasets that focus specifically on rPPG analysis. The UBFC-rPPG database was created using a custom C++ application for video acquisition with a simple low-cost webcam (Logitech C920 HD Pro) at 30 fps with a resolution of 640x480 in uncompressed 8-bit RGB format. A CMS50E transmissive pulse oximeter was used to obtain the ground-truth PPG data, comprising the PPG waveform as well as the PPG heart rates. During recording, the subject sits in front of the camera (about 1 m away) with his/her face visible. All experiments are conducted indoors with a varying amount of sunlight and indoor illumination. The link to download the complete video dataset is available on request. A basic Matlab implementation can also be provided to read the ground-truth data acquired with the pulse oximeter.

69 papers · 20 benchmarks · Images, Medical

CoNSeP (Colorectal Nuclear Segmentation and Phenotypes)

The colorectal nuclear segmentation and phenotypes (CoNSeP) dataset consists of 41 H&E stained image tiles, each of size 1,000×1,000 pixels at 40× objective magnification. The images were extracted from 16 colorectal adenocarcinoma (CRA) WSIs, each belonging to an individual patient, and scanned with an Omnyx VL120 scanner within the department of pathology at University Hospitals Coventry and Warwickshire, UK.

68 papers · 5 benchmarks · Images, Medical

SLAKE

SLAKE is an English-Chinese bilingual dataset consisting of 642 images and 14,028 question-answer pairs for training and testing Med-VQA systems.

63 papers · 0 benchmarks · Images, Medical, Texts

PanNuke

PanNuke is a semi-automatically generated nuclei instance segmentation and classification dataset with exhaustive nuclei labels across 19 different tissue types. The dataset consists of 481 visual fields, of which 312 are randomly sampled from more than 20K whole slide images at different magnifications, from multiple data sources. In total, the dataset contains 205,343 labeled nuclei, each with an instance segmentation mask.

61 papers · 11 benchmarks · Images, Medical

CHASE_DB1

CHASE_DB1 is a dataset for retinal vessel segmentation. It contains 28 color retina images of size 999×960 pixels, collected from both the left and right eyes of 14 school children. Each image is annotated by two independent human experts.

59 papers · 18 benchmarks · Images, Medical

PMC-VQA

PMC-VQA is a large-scale medical visual question-answering dataset that contains 227k VQA pairs of 149k images that cover various modalities or diseases. The question-answer pairs are generated from PMC-OA.

55 papers · 3 benchmarks · Images, Medical, Texts

CVC-ClinicDB

CVC-ClinicDB is an open-access dataset of 612 images with a resolution of 384×288 from 31 colonoscopy sequences. It is used for medical image segmentation, in particular polyp detection in colonoscopy videos.

54 papers · 6 benchmarks · Images, Medical

HRF (High-Resolution Fundus)

The HRF dataset is a retinal vessel segmentation dataset comprising 45 images organized as 15 subsets. Each subset contains one healthy fundus image, one image of a patient with diabetic retinopathy, and one glaucoma image. The image size is 3,304 x 2,336 pixels, with a training/testing split of 22/23 images.

53 papers · 21 benchmarks · Images, Medical

MedMentions

MedMentions is a new manually annotated resource for the recognition of biomedical concepts. What distinguishes MedMentions from other annotated biomedical corpora is its size (over 4,000 abstracts and over 350,000 linked mentions), as well as the size of the concept ontology (over 3 million concepts from UMLS 2017) and its broad coverage of biomedical disciplines.

48 papers · 2 benchmarks · Medical, Texts