Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

395 machine learning datasets


LIMUC (Labeled Images for Ulcerative Colitis)

The LIMUC dataset is the largest publicly available labeled ulcerative colitis dataset. It comprises 11,276 images from 564 patients and 1,043 colonoscopy procedures. Three experienced gastroenterologists were involved in the annotation process, and all images are labeled according to the Mayo endoscopic score (MES).

4 papers | 1 benchmark | Biomedical, Images, Medical

OVQA

OVQA contains 19,020 medical visual question and answer pairs generated from 2,001 medical images collected from 2,212 EMRs in Orthopedics.

4 papers | 0 benchmarks | Images, Medical, Texts

Multi-Label Classification Dataset Repository

For each dataset we provide a short description as well as some characterization metrics. These include the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average Imbalance Ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep), and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is defined as cardinality divided by the number of labels. Diversity represents the percentage of labelsets present in the dataset divided by the number of possible labelsets. The avgIR measures the average degree of imbalance across all labels: the greater the avgIR, the greater the imbalance of the dataset. Finally, rDep measures the proportion of pairs of labels that are dependent at 99% confidence. A broader description of all the characterization metrics and the partition methods used is given in

4 papers | 0 benchmarks | Audio, Biology, Images, Medical, Music, Texts, Videos
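
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_metrics` and the toy label matrix are hypothetical, diversity is returned as a fraction rather than a percentage, and the number of possible labelsets is assumed to be 2^q (the repository may bound it differently). avgIR and rDep are left out because the text does not give their formulas.

```python
from typing import Dict, Sequence


def label_metrics(labels: Sequence[Sequence[int]], n_attributes: int) -> Dict[str, float]:
    """Compute simple characterization metrics for a binary label matrix.

    labels: m rows, each a 0/1 sequence of length q (one entry per label).
    n_attributes: number of input attributes d, supplied by the caller.
    """
    m = len(labels)
    q = len(labels[0])

    # Cardinality: average number of active labels per instance.
    cardinality = sum(sum(row) for row in labels) / m

    # Density: cardinality divided by the number of labels.
    density = cardinality / q

    # Diversity: distinct labelsets observed, divided by the number of
    # possible labelsets (ASSUMPTION: taken to be 2**q here).
    distinct_labelsets = len({tuple(row) for row in labels})
    diversity = distinct_labelsets / (2 ** q)

    # Complexity: m x q x d.
    complexity = m * q * n_attributes

    return {
        "cardinality": cardinality,
        "density": density,
        "diversity": diversity,
        "complexity": complexity,
    }


# Hypothetical toy data: 4 instances, 3 labels, 5 attributes.
toy_labels = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]
metrics = label_metrics(toy_labels, n_attributes=5)
print(metrics)
```

On the toy matrix the instances carry 2, 1, 2 and 2 labels, so the cardinality is 7/4 = 1.75, and three distinct labelsets out of 2^3 = 8 give a diversity of 0.375.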

VietMed (VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain)

We introduce a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled medical speech and 1200h of unlabeled general-domain speech. To the best of our knowledge, VietMed is by far the world’s largest public medical speech recognition dataset in 7 aspects: total duration, number of speakers, diseases, recording conditions, speaker roles, unique medical terms and accents. VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration. Additionally, we are the first to present a medical ASR dataset covering all ICD-10 disease groups and all accents within a country.

4 papers | 3 benchmarks | Audio, Medical, Speech, Texts

AutoPET

A whole-body FDG-PET/CT dataset with manually annotated tumor lesions (FDG-PET-CT-Lesions), comprising 1,014 studies from 900 patients.

4 papers | 0 benchmarks | Medical

QT-NSTDB (QT database + MIT-BIH Noise Stress Test Database (NSTDB))

We designed a baseline wander (BLW) removal benchmark to evaluate various methods using a consistent test set and uniform conditions. Details of the data preprocessing pipeline are heavily based on [1]. All 105 signals from the QT Database were resampled from 250 Hz to 360 Hz to align with the NSTDB sampling frequency. Heartbeats were extracted using the annotations provided by specialists. During this process, we identified a small number of incorrect annotations for beat start/end points, leading to cases where two consecutive beats were erroneously merged into one. To address this issue, we discarded beats exceeding 512 samples (1422.22 ms) in length. We designated heartbeats from 14 signals, accounting for 13% of the total signals, as the test set. These signals were selected to include two signals from each of the seven datasets comprising the QT Database, ensuring a diverse representation of pathologies in the test set. This setup provides a more robust evaluation of the g

4 papers | 8 benchmarks | Medical
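
The numbers in the QT-NSTDB preprocessing description can be checked with a short sketch. It is not the benchmark's actual pipeline: the helper names (`resampling_ratio`, `samples_to_ms`, `keep_valid_beats`) and the toy beats are hypothetical, and only the sampling rates, the 512-sample limit, and the 14-of-105 split come from the text.

```python
from fractions import Fraction
from typing import List, Sequence

SOURCE_HZ = 250         # QT Database sampling rate, as stated in the description
TARGET_HZ = 360         # NSTDB sampling rate, as stated in the description
MAX_BEAT_SAMPLES = 512  # beats longer than this are discarded


def resampling_ratio(source_hz: int = SOURCE_HZ, target_hz: int = TARGET_HZ) -> Fraction:
    """Return the up/down ratio needed to go from source_hz to target_hz."""
    return Fraction(target_hz, source_hz)


def samples_to_ms(n_samples: int, hz: int) -> float:
    """Convert a sample count at a given sampling rate to milliseconds."""
    return 1000.0 * n_samples / hz


def keep_valid_beats(beats: Sequence[Sequence[float]],
                     max_samples: int = MAX_BEAT_SAMPLES) -> List[Sequence[float]]:
    """Drop beats exceeding max_samples (likely two beats merged into one)."""
    return [beat for beat in beats if len(beat) <= max_samples]


ratio = resampling_ratio()                           # 36/25
max_ms = samples_to_ms(MAX_BEAT_SAMPLES, TARGET_HZ)  # about 1422.22 ms
test_share = 100 * 14 / 105                          # about 13.3 % of signals

# Hypothetical toy beats of 300, 512 and 700 samples: the last one is dropped.
toy_beats = [[0.0] * 300, [0.0] * 512, [0.0] * 700]
kept = keep_valid_beats(toy_beats)

print(ratio, round(max_ms, 2), round(test_share, 1), [len(b) for b in kept])
```

The 36/25 ratio is the kind of integer up/down factor a polyphase resampler would take, though the text does not say which resampling method was used.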

ORVS (Online Retinal image for Vessel Segmentation (ORVS))

The ORVS dataset has been newly established as a collaboration between the computer science and visual-science departments at the University of Calgary.

3 papers | 0 benchmarks | Images, Medical

OCTAGON (OCTAGON Dataset)

The OCTAGON dataset is a set of Optical Coherence Tomography Angiography (OCT-A) images used for the segmentation of the Foveal Avascular Zone (FAZ). The dataset includes 144 healthy OCT-A images and 69 diabetic OCT-A images, divided into four groups, each one with 36 and about 17 OCT-A images, respectively. These groups are: 3x3 superficial, 3x3 deep, 6x6 superficial and 6x6 deep, where 3x3 and 6x6 are the zoom of the image and superficial/deep are the depth level of the extracted image. The healthy dataset includes OCT-A images from people classified in 6 age ranges: 10-19 years, 20-29 years, 30-39 years, 40-49 years, 50-59 years and 60-69 years. Each age range includes 3 different patients with information for the left and right eyes of each one. Finally, for each eye, there are four different images: one 3x3 superficial image, one 3x3 deep image, one 6x6 superficial image and one 6x6 deep image. Each image has two manual labelings by expert clinicians of the FAZ and their quantificat

3 papers | 0 benchmarks | Images, Medical
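
The healthy-subset counts in the OCTAGON description follow from the stated structure (6 age ranges, 3 patients each, 2 eyes, 4 image types). The sketch below enumerates that structure to confirm the totals. The tuple layout and patient indices are hypothetical and say nothing about the dataset's real file organization.

```python
from itertools import product

AGE_RANGES = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69"]
PATIENTS_PER_RANGE = 3
EYES = ["left", "right"]
ZOOMS = ["3x3", "6x6"]
DEPTHS = ["superficial", "deep"]

# One record per healthy image: (age range, patient index, eye, group name).
healthy_images = [
    (age, patient, eye, f"{zoom} {depth}")
    for age, patient, eye, zoom, depth in product(
        AGE_RANGES, range(1, PATIENTS_PER_RANGE + 1), EYES, ZOOMS, DEPTHS
    )
]

# Count images in each of the four groups.
images_per_group = {}
for _age, _patient, _eye, group in healthy_images:
    images_per_group[group] = images_per_group.get(group, 0) + 1

# The 69 diabetic images split over four groups give "about 17" each.
diabetic_per_group = 69 / 4

print(len(healthy_images), images_per_group, diabetic_per_group)
```

This gives 6 × 3 × 2 × 4 = 144 healthy images with 36 in each group, and 69 / 4 = 17.25 for the diabetic images, in line with the figures quoted above.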

LKS (Liver Kidney Stomach)

LKS is a dataset of 684 Liver-Kidney-Stomach immunofluorescence whole slide images (WSIs) used in the investigation of autoimmune liver disease.

3 papers | 0 benchmarks | Medical

Prostate MRI Segmentation Dataset

This prostate MRI segmentation dataset is collected from six different data sources.

3 papers | 0 benchmarks | Medical

US-4

The US-4 is a dataset of Ultrasound (US) images. It is a video-based image dataset that contains over 23,000 high-resolution images from four US video sub-datasets, where two sub-datasets are newly collected by experienced doctors for this dataset.

3 papers | 0 benchmarks | Images, Medical

NinaPro DB2 (DB2 - 40 Intact Subjects - Delsys Trigno electrodes)

The second Ninapro database includes 40 intact subjects and is thoroughly described in the paper: "Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Barbara Caputo, Anne-Gabrielle Mittaz Hager, Simone Elsig, Giorgio Giatsidis, Franco Bassetto & Henning Müller. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Scientific Data, 2014" (http://www.nature.com/articles/sdata201453). Please cite this paper for any work related to the Ninapro database. Please also see the paper by Gijsberts et al., 2014 (http://publications.hevs.ch/index.php/publications/show/1629) for more information about the database.

3 papers | 0 benchmarks | Biomedical, Medical, Time series

IBC (Individual Brain Charting)

The Individual Brain Charting (IBC) project aims to provide a new generation of functional-brain atlases. To map cognitive mechanisms at a fine scale, task-fMRI data at high spatial resolution are being acquired on a fixed cohort of 12 participants performing many different tasks. These data, free from both inter-subject and inter-site variability, are publicly available as a means to support the investigation of functional segregation and connectivity as well as individual variability, with a view to establishing a better link between brain systems and behavior.

3 papers | 0 benchmarks | MRI, Medical, fMRI

MIT-BIH AFDB (MIT-BIH Atrial Fibrillation Database)

This database includes 25 long-term ECG recordings of human subjects with atrial fibrillation (mostly paroxysmal).

3 papers | 0 benchmarks | Medical, Time series

PWDB (Pulse Wave Database)

This database of simulated arterial pulse waves is designed to be representative of a sample of pulse waves measured from healthy adults. It contains pulse waves for 4,374 virtual subjects, aged from 25 to 75 years (in 10-year increments). The database contains a baseline set of pulse waves for each of the six age groups, created using cardiovascular properties (such as heart rate and arterial stiffness) which are representative of healthy subjects in each age group. It also contains 728 further virtual subjects in each age group, in which each of the cardiovascular properties is varied within normal ranges. This allows for extensive in silico analyses of haemodynamics and the performance of pulse wave analysis algorithms.

3 papers | 0 benchmarks | Biology, Biomedical, Medical, Time series
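
The subject count quoted for PWDB can be reproduced from the description, assuming the baseline set contributes exactly one virtual subject per age group. That is an interpretation of the text, not something it states outright. The constant names are hypothetical.

```python
# Age groups from 25 to 75 years in 10-year increments, as described above.
AGE_GROUPS = list(range(25, 76, 10))  # [25, 35, 45, 55, 65, 75]

# ASSUMPTION: one baseline virtual subject per age group.
BASELINE_PER_AGE_GROUP = 1
# Stated in the description: 728 further virtual subjects per age group.
VARIATIONS_PER_AGE_GROUP = 728

subjects_per_age_group = BASELINE_PER_AGE_GROUP + VARIATIONS_PER_AGE_GROUP
total_subjects = len(AGE_GROUPS) * subjects_per_age_group

print(len(AGE_GROUPS), subjects_per_age_group, total_subjects)
```

Under that assumption the total is 6 × (1 + 728) = 4,374, which matches the number of virtual subjects given in the description.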

Medico automatic polyp segmentation challenge (dataset)

The “Medico automatic polyp segmentation challenge” aims to develop computer-aided diagnosis systems for automatic polyp segmentation to detect all types of polyps (for example, irregular polyp, smaller or flat polyps) with high efficiency and accuracy. The main goal of the challenge is to benchmark semantic segmentation algorithms on a publicly available dataset, emphasizing robustness, speed, and generalization.

3 papers | 5 benchmarks | Biomedical, Images, Medical

ShARe/CLEF 2014: Task 2 Disorders

3 papers | 1 benchmark | Medical, Texts

KvasirCapsule-SEG

A video capsule endoscopy dataset for polyp segmentation.

3 papers | 2 benchmarks | Biomedical, Cad, Images, Medical

FetReg

Fetoscopic Placental Vessel Segmentation and Registration (FetReg) is a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.

3 papers | 0 benchmarks | Images, Medical

AxonEM

The AxonEM dataset consists of two 30 × 30 × 30 μm³ EM image volumes from the human and mouse cortex, respectively. It is used for 3D axon instance segmentation of brain cortical regions. The authors proofread over 18,000 axon instances to provide dense 3D axon instance segmentation, enabling large-scale evaluation of axon reconstruction methods. In addition, the authors densely annotated nine ground truth subvolumes for training for each data volume.

3 papers | 0 benchmarks | Medical