Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

123 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

123 dataset results

ECG in High Intensity Exercise Dataset

The data presented here was extracted from a larger dataset collected through a collaboration between the Embedded Systems Laboratory (ESL) of the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland and the Institute of Sports Sciences of the University of Lausanne (ISSUL). In this dataset, we report the extracted segments used for an analysis of R peak detection algorithms during high intensity exercise.

1 paper · 0 benchmarks · Biomedical, Time series

TUAC (Temple University Artifact Corpus)

A new subset of the popular open-source electroencephalogram (EEG) corpus, TUH EEG. The Temple University Artifact Corpus (TUAR) consists of high-yield artifact files annotated using a five-way classification system:

  1. Chewing (CHEW): An artifact resulting from the tensing and relaxing of the jaw muscles.
  2. Electrode (ELEC): An artifact that encompasses various electrode-related phenomena.
  3. Eye Movement (EYEM): A spike-like waveform created during patient eye movement.
  4. Muscle (MUSC): A common artifact with high-frequency, sharp waves corresponding to patient movement.
  5. Shiver (SHIV): A specific and sustained sharp wave artifact that occurs when a patient shivers.

EEG artifacts are waveforms that are not of cerebral origin and may have been affected by several external and physiological factors. These artifacts cause false alarms in seizure prediction machine learning systems. This corpus was developed to support research and evaluation of artifact suppression technology.

1 paper · 0 benchmarks · Biomedical, EEG

Extended heartSeg

The dataset X of this work is an extension of the heartSeg dataset. Each sample x ∈ X is an RGB image capturing the heart region of Medaka (Oryzias latipes) hatchlings from a constant ventral view. Since the body of Medaka is see-through, noninvasive studies regarding the internal organs and the whole circulatory system are practicable. A Medaka’s heart contains three parts: the atrium, the ventricle, and the bulbus. The atrium receives deoxygenated blood from the circulatory system and delivers it to the ventricle, which forwards it into the bulbus. The bulbus is the heart’s exit chamber and provides the gill arches with a constant blood flow. The blood flow through these three chambers was captured in 63 short recordings (around 11 seconds with 24 frames per second each) in total, from which the single image samples x ∈ X are extracted. The dataset is split into training and test data following the heartSeg dataset with n_train = 565 samples in the training set X_train and n_test = 165

1 paper · 1 benchmark · Biology, Biomedical, Medical, Videos

Full-Spectral Autofluorescence Lifetime Microscopic Images

The dataset contains full-spectral autofluorescence lifetime microscopic images (FS-FLIM) acquired on unstained ex-vivo human lung tissue. It comprises 100 4D hypercubes of 256x256 (spatial resolution) x 32 (time bins) x 512 (spectral channels from 500nm to 780nm). This dataset accompanies our papers "Deep Learning-Assisted Co-registration of Full-Spectral Autofluorescence Lifetime Microscopic Images with H&E-Stained Histology Images" (https://arxiv.org/abs/2202.07755) and "Full spectrum fluorescence lifetime imaging with 0.5 nm spectral and 50 ps temporal resolution" (https://doi.org/10.1038/s41467-021-26837-0). The FS-FLIM images provide transformative insights into human lung cancer with extra-dimensional information. This will enable visual and precise detection of early lung cancer. With the methodology in our co-registration paper, FS-FLIM images can be registered with H&E-stained histology images, allowing characterisation of tumour and surrounding cells at a cellular level with abs

1 paper · 0 benchmarks · Biomedical, Hyperspectral images
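As a rough illustration of the hypercube dimensions quoted in the FS-FLIM entry above, the following sketch works out the number of values per hypercube and the approximate spectral width of one channel. The axis order, the even division of the 500 to 780 nm range across the 512 channels, and all names are assumptions made for this example; they are not taken from the dataset's documentation.

```python
# Dimensions as stated in the dataset description; the axis order is an assumption.
HEIGHT, WIDTH = 256, 256
TIME_BINS = 32
SPECTRAL_CHANNELS = 512
WAVELENGTH_MIN_NM, WAVELENGTH_MAX_NM = 500.0, 780.0
N_HYPERCUBES = 100

# Number of scalar values in one 4D hypercube and in the whole dataset.
values_per_hypercube = HEIGHT * WIDTH * TIME_BINS * SPECTRAL_CHANNELS
total_values = N_HYPERCUBES * values_per_hypercube

# Assumption: the 512 channels evenly divide the 500-780 nm range.
channel_width_nm = (WAVELENGTH_MAX_NM - WAVELENGTH_MIN_NM) / SPECTRAL_CHANNELS


def channel_center_nm(index: int) -> float:
    """Approximate centre wavelength of a spectral channel (hypothetical helper)."""
    if not 0 <= index < SPECTRAL_CHANNELS:
        raise IndexError("spectral channel index out of range")
    return WAVELENGTH_MIN_NM + (index + 0.5) * channel_width_nm


print(f"values per hypercube: {values_per_hypercube}")
print(f"channel width: {channel_width_nm:.4f} nm")
```

Under these assumptions each channel covers a little over half a nanometre, which is in line with the 0.5 nm spectral resolution named in the title of the second cited paper.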

BreastDICOM4 ([MIMBCD-UI] UTA4: Medical Imaging DICOM Files Dataset)

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the medical imaging DICOM files of patients from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset of the medical images used during the UTA4 tasks. This repository and its dataset should be paired with the dataset-uta4-rates repository dataset. The work and results were published at a top Human-Computer Interaction (HCI) conference, AVI 2020. Results were analyzed and interpreted in our Statistical Analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients for a Single-Modality vs Multi-Modality comparison. For example, in these tests, we used both the prototype-single-modality and prototype-multi-modality repositories for the comparison. On the same hand, the hereby dataset repres

1 paper · 2 benchmarks · Biomedical, MRI, Medical

BreastRates4 ([MIMBCD-UI] UTA4: Rates Dataset)

Several datasets are fostering innovation in higher-level functions for everyone, everywhere. By providing this repository, we hope to encourage the research community to focus on hard problems. In this repository, we present the severity ratings (BIRADS) given by clinicians while diagnosing several patients from our User Tests and Analysis 4 (UTA4) study. Here, we provide a dataset of the severity ratings (BIRADS) for each patient diagnosis. The work and results were published at a top Human-Computer Interaction (HCI) conference, AVI 2020. Results were analyzed and interpreted from our Statistical Analysis charts. The user tests were conducted in clinical institutions, where clinicians diagnosed several patients for a Single-Modality vs Multi-Modality comparison. For example, in these tests, we used both the prototype-single-modality and prototype-multi-modality repositories for the comparison. On the same hand, the hereby dataset represents the pieces of information of bot

1 paper · 0 benchmarks · Biomedical, Images, Medical, Tabular

CoCaHis (Colon Cancer Histology Dataset)

Highlights

1 paper · 0 benchmarks · Biomedical, Images, Medical

Replication Data for: Do uHear? Validation of uHear App for Preliminary Screening of Hearing Ability in Soundscape Studies

Audiogram data based on a "gold standard" audiometer and the uHear iOS application of 163 participants

1 paper · 0 benchmarks · Biomedical

HuTu 80 (HuTu 80 cell populations)

The image set contains 180 high-resolution color microscopic images of human duodenum adenocarcinoma HuTu 80 cell populations obtained in an in vitro scratch assay (for the details of the experimental protocol, we refer to (Liang et al., 2007)). Briefly, cells were seeded in 12-well culture plates ($20 \times 10^3$ cells per well) and grown to form a monolayer with 85\% or more confluency. Then the cell monolayer was scraped in a straight line using a pipette tip ($200 \mu L$). The debris was removed by washing with a growth medium and the medium in wells was replaced. The scratch areas were marked to obtain the same field during the image acquisition. Images of the scratches were captured immediately following the scratch formation, as well as after 24, 48 and 72 h of cultivation.

1 paper · 2 benchmarks · Biomedical, Images

Harmonized US National Health and Nutrition Examination Survey (NHANES) 1988-2018

The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential to understand how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as prevalence of disease. However, these data need to first be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross examination and considerable efforts but is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanied code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-20

1 paper · 0 benchmarks · Biomedical, Environment, Tabular

Fraunhofer Portugal AICOS EDoF Dataset

The Fraunhofer Portugal AICOS EDoF Dataset was produced within the TAMI project and is composed of images of microscopic fields of view (FOV) of Liquid-based Cervical Cytology (LBC) samples. A total of 15 LBC samples were supplied by the Pathology Services from Hospital Fernando Fonseca and the Portuguese Oncology Institute of Porto. For each LBC sample, a set of images were obtained using a version of µSmartScope [1,2] prototype adapted to the cervical cytology use case [3,4].

1 paper · 0 benchmarks · Biomedical, Images, Medical

MuCeD

MuCeD is a dataset carefully curated and validated by expert pathologists from the All India Institute of Medical Sciences (AIIMS), Delhi, India. The H&E-stained histopathology images of the human duodenum in MuCeD are captured through an Olympus BX50 microscope at 20x zoom using a DP26 camera, with each image being 1920x2148 in dimension. The dataset has 55 images, with bounding boxes for 2,090 IELs and 6,518 ENs annotated using the LabelMe software and further validated by multiple pathologists. These cells are selected from the epithelial area, a region of interest that has been explicitly segmented by experts. The epithelial area denotes the area of continuous villi and is used for cell detection, whereas the rest of the area is masked out. Further, each image is sliced into 9 subimages and each subimage is rescaled to 640x640 before it is given as input to object detection models. We divide the 55 images into five folds of 11 images each and report 5-fold crossvalidation num

1 paper · 0 benchmarks · Biomedical, Images, Medical
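The slicing and fold split described in the MuCeD entry above can be sketched as follows. This is only an illustration: the description does not say how the 9 subimages are laid out or how images are assigned to folds, so the 3x3 grid, the reading of 1920 as the width, the round-robin fold assignment, and the function names are all assumptions. The rescaling of each tile to 640x640 is left out because it would need an image library.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_boxes(width: int, height: int, rows: int = 3, cols: int = 3) -> List[Box]:
    """Return crop boxes covering the image in a rows x cols grid.

    Assumption: the 9 subimages form a non-overlapping 3x3 grid.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


def make_folds(n_items: int, n_folds: int) -> List[List[int]]:
    """Assign item indices to folds round-robin (an assumed scheme)."""
    return [list(range(fold, n_items, n_folds)) for fold in range(n_folds)]


# Assumption: 1920 is the image width and 2148 the height.
boxes = tile_boxes(1920, 2148)
folds = make_folds(55, 5)

print(len(boxes), "tiles per image; first tile:", boxes[0])
print([len(fold) for fold in folds], "images per fold")
```

In practice the split would be made at the level of whole images, as here, so that subimages of the same slide never appear in both the training and validation folds.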

ACCT Data Repository (ACCT is a fast and accessible automatic cell counting tool using machine learning for 2D image segmentation)

This dataset is a collection of fluorescent images from mice, assembled to test an automatic cell counting tool that we developed. 62 images viewed from 2 or 3 different fields of view are shown. In brief, the dataset was derived from brain sections of a model for HIV-induced brain injury (HIVgp120tg), which expresses soluble gp120 envelope protein in astrocytes under the control of a modified GFAP promoter. The mice were in a mixed C57BL/6.129/SJL genetic background, and two genotypes of 9 month old male mice were selected: wild type controls (Resting, n = 3) and transgenic littermates (HIVgp120tg, Activated, n = 3). No randomization was performed. HIVgp120tg mice show, among other hallmarks of human HIV neuropathology, an increase in microglia numbers, which indicates activation of the cells compared to non-transgenic littermate controls.

1 paper · 0 benchmarks · Biology, Biomedical, Images, Medical

The EMBO SourceData-NLP dataset (The SourceData-NLP dataset: integrating curation into scientific publishing for training large language models)

We present the SourceData-NLP dataset produced through the routine curation of papers during the publication process. A unique feature of this dataset is its emphasis on the annotation of bioentities in figure legends. We annotate eight classes of biomedical entities (small molecules, gene products, subcellular components, cell lines, cell types, tissues, organisms, and diseases), their role in the experimental design, and the nature of the experimental method as an additional class. SourceData-NLP contains more than 620,000 annotated biomedical entities, curated from 18,689 figures in 3,223 papers in molecular and cell biology. We illustrate the dataset's usefulness by assessing BioLinkBERT and PubmedBERT, two transformers-based models, fine-tuned on the SourceData-NLP dataset for NER. We also introduce a novel context-dependent semantic task that infers whether an entity is the target of a controlled intervention or the object of measurement.

1 paper · 4 benchmarks · Biology, Biomedical, Texts

A Dataset for Relation Extraction of Natural-Products (A curated evaluation dataset for end-to-end Relation Extraction of relationships between organisms and natural-products)

A curated evaluation dataset for end-to-end Relation Extraction of relationships between organisms and natural-products.

1 paper · 0 benchmarks · Biomedical, Texts

Radio-Frequency Ultrasound volume dataset for pre-clinical liver tumors

A total of 227 cross sectional images (20 x 54 mm with a resolution of 289 x 648 pixels) of hind-leg xenograft tumors from 29 mice were obtained with 1mm step-wise movement of the array mounted on a manual positioning device. The whole tumor volume was acquired using a diagnostic ultrasound system with a 10 MHz linear transducer and 50 MHz sampling.

1 paper · 0 benchmarks · 3D, Biomedical, Images, Medical
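The image geometry quoted in the ultrasound entry above implies a physical pixel spacing, which the following sketch estimates along with the mean number of slices per mouse. It assumes that the 20 mm extent corresponds to the 289-pixel axis and the 54 mm extent to the 648-pixel axis, in the order the description lists them; the variable names are illustrative only.

```python
# Figures taken from the dataset description.
FIELD_MM = (20.0, 54.0)   # physical extent of each cross-sectional image
FIELD_PX = (289, 648)     # image resolution in pixels
N_IMAGES = 227
N_MICE = 29

# Assumption: extents and pixel counts are listed in matching axis order.
pixel_spacing_mm = tuple(mm / px for mm, px in zip(FIELD_MM, FIELD_PX))

# Average number of cross-sectional images acquired per mouse.
mean_slices_per_mouse = N_IMAGES / N_MICE

print(f"pixel spacing: {pixel_spacing_mm[0]:.4f} mm x {pixel_spacing_mm[1]:.4f} mm")
print(f"mean slices per mouse: {mean_slices_per_mouse:.1f}")
```

Under that assumption the in-plane pixels are slightly anisotropic, and both in-plane spacings are much finer than the 1 mm step between slices, which matters if the slices are stacked into a volume.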

SR-CACO-2

Confocal fluorescence microscopy is one of the most accessible and widely used imaging techniques for the study of biological processes at the cellular and subcellular levels. Scanning confocal microscopy allows the capture of high-quality images from thick three-dimensional (3D) samples, yet suffers from well-known limitations such as photobleaching and phototoxicity of specimens caused by intense light exposure, which limits its use in some applications, especially for living cells. Cellular damage can be alleviated by changing imaging parameters to reduce light exposure, often at the expense of image quality. Machine/deep learning methods for single-image super-resolution (SISR) can be applied to restore image quality by upscaling lower-resolution (LR) images to produce high-resolution images (HR). These SISR methods have been successfully applied to photo-realistic images due partly to the abundance of publicly available data. In contrast, the lack of publicly available data partl

1 paper · 0 benchmarks · Biomedical, Images

Microscopy Image Dataset of Pulmonary Vascular Changes (Microscopy Image Dataset for Deep Learning-Based Quantitative Assessment of Pulmonary Vascular Changes)

Pulmonary hypertension (PH) is a syndrome complex that accompanies a number of diseases of different etiologies. It is associated with basic mechanisms of structural and functional change in the pulmonary circulation vessels and manifests as increased pressure in the pulmonary artery. The structural changes in the pulmonary circulation vessels are the main limiting factor determining the prognosis of patients with PH. Thickening and irreversible deposition of collagen in the walls of the pulmonary artery branches lead to rapid disease progression and decreased therapy effectiveness. In this regard, histological examination of the pulmonary circulation vessels is critical in both preclinical studies and clinical practice. However, measuring quantitative parameters such as the average vessel outer diameter, the vessel wall area, and the hypertrophy index requires significant time and specialist training to analyze micrographs. A dataset of pulmonary circulation

1 paper · 0 benchmarks · Biomedical, Images

uBench (MicroBench)

Microscopy is a cornerstone of biomedical research, enabling detailed study of biological structures at multiple scales. Advances in cryo-electron microscopy, high-throughput fluorescence microscopy, and whole-slide imaging allow the rapid generation of terabytes of image data, which are essential for fields such as cell biology, biomedical research, and pathology. These data span multiple scales, allowing researchers to examine atomic/molecular, subcellular/cellular, and cell/tissue-level structures with high precision. A crucial first step in microscopy analysis is interpreting and reasoning about the significance of image findings. This requires domain expertise and comprehensive knowledge of biology, normal/abnormal states, and the capabilities and limitations of microscopy techniques. Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis, enhancing researchers’ efficiency, identifying new image biomarkers, and accelerating hypothesis ge

1 paper · 0 benchmarks · Biology, Biomedical, Images, Texts

MedTrinity-25M

1 paper · 0 benchmarks · Biomedical, Images, MRI, Medical, Texts