Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

395 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

395 dataset results

ACCT Data Repository (ACCT is a fast and accessible automatic cell counting tool using machine learning for 2D image segmentation)

This dataset is a collection of fluorescent images from mice, gathered to test an automatic cell counting tool that we developed. It includes 62 images viewed from 2 or 3 different fields of view. In brief, the dataset was derived from brain sections of a model for HIV-induced brain injury (HIVgp120tg), which expresses soluble gp120 envelope protein in astrocytes under the control of a modified GFAP promoter. The mice were in a mixed C57BL/6.129/SJL genetic background, and 9-month-old male mice of two genotypes were selected: wild-type controls (Resting, n = 3) and transgenic littermates (HIVgp120tg, Activated, n = 3). No randomization was performed. Among other hallmarks of human HIV neuropathology, HIVgp120tg mice show an increase in microglia numbers, which indicates activation of these cells compared with non-transgenic littermate controls.

1 paper · 0 benchmarks · Biology, Biomedical, Images, Medical
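The description does not say how ACCT turns a segmentation into a count. As a rough illustration only (not ACCT's implementation), a common final step is to count the connected foreground regions in a binary mask produced by a segmentation model. The sketch assumes the mask is a list of rows of 0/1 values and uses 4-connectivity; `count_cells` is a hypothetical helper.

```python
from collections import deque
from typing import List


def count_cells(mask: List[List[int]]) -> int:
    """Count 4-connected regions of 1s in a binary mask (hypothetical helper)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # An unvisited foreground pixel starts a new region.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill marks every pixel of this region.
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(count_cells(mask))  # 3
```

Real counting pipelines usually also discard regions below a minimum area to suppress noise; any such threshold would be a further assumption.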

Blood Cell Detection Dataset

This is a dataset of blood cell photographs.

1 paper · 0 benchmarks · Images, Medical

AI-ready multiplex IHC-IF dataset (AI-ready restained and co-registered multiplex dataset for head-and-neck squamous cell carcinoma)

We introduce a new AI-ready computational pathology dataset containing restained and co-registered digitized images from eight head-and-neck squamous cell carcinoma patients. Specifically, the same tumor sections were first stained with the expensive multiplex immunofluorescence (mIF) assay and then restained with cheaper multiplex immunohistochemistry (mIHC). This is the first public dataset to demonstrate the equivalence of these two staining methods, which in turn enables several use cases: because of this equivalence, our cheaper mIHC staining protocol can offset the need for expensive mIF staining and scanning, which requires highly skilled lab technicians. Instead of relying on subjective and error-prone immune cell annotations from individual pathologists (disagreement > 50%) to drive SOTA deep learning approaches, this dataset provides objective immune and tumor cell annotations via mIF/mIHC restaining for more reproducible and accurate characterization of the tumor immune microenvironment (e.g. for

1 paper · 0 benchmarks · Biology, Images, Medical

Facial Skeletal Angles (Glabella and Maxilla Angle and Length and Width of Piriformis)

Facial Skeletal Angles (Glabella and Maxilla Angle and Length and Width of Piriformis)

1 paper · 0 benchmarks · Biology, Medical

PWISeg (PWISeg Surgical Instruments Dataset)

The Surgical Instruments Recognition Dataset is a collection of high-resolution images (1280x960 pixels) designed for recognizing and categorizing surgical instruments. It captures the intricate details and complexity of surgical tools, particularly when they are arranged in scenarios resembling an operating room.

1 paper · 0 benchmarks · Images, Medical

LPBA40 (LONI Probabilistic Brain Atlas)

1 paper · 0 benchmarks · 3D, Images, MRI, Medical

GOD (Generic Object Decoding)

The Generic Object Decoding (GOD) dataset is a specialized resource for fMRI-based decoding. It contains fMRI data recorded during the presentation of images from 200 representative object categories, drawn from the fall 2011 release of ImageNet. The training session used 1,200 images (8 per category from 150 distinct object categories), while the test session used 50 images (one from each of 50 object categories). The test categories are distinct from the training categories and were presented in a randomized sequence across runs. fMRI scanning was conducted on five subjects.

1 paper · 1 benchmark · Images, Medical, fMRI
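The split arithmetic in the GOD description can be checked directly: 150 training categories × 8 images = 1,200 training images, 50 test categories × 1 image = 50 test images, and 150 + 50 = 200 categories overall, with no category shared between sessions. A small sketch; the category IDs are placeholder integers rather than real ImageNet synset IDs, and `describe_split` is a hypothetical helper.

```python
from typing import Dict, Iterable


def describe_split(train_categories: Iterable[int],
                   test_categories: Iterable[int],
                   train_images_per_category: int,
                   test_images_per_category: int) -> Dict[str, object]:
    """Summarize a category-level train/test split (hypothetical helper)."""
    train, test = set(train_categories), set(test_categories)
    return {
        "train_images": len(train) * train_images_per_category,
        "test_images": len(test) * test_images_per_category,
        "total_categories": len(train | test),
        # In GOD, test categories do not appear in the training session.
        "category_disjoint": train.isdisjoint(test),
    }


# Placeholder integer IDs stand in for the real ImageNet categories.
summary = describe_split(range(150), range(150, 200), 8, 1)
print(summary)
# {'train_images': 1200, 'test_images': 50, 'total_categories': 200, 'category_disjoint': True}
```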

SGMs 4 PET

Data for Score-Based Generative Models for PET Image Reconstruction. All simulations are based on the BrainWeb dataset. The 2D images were taken from Georg Schramm's BrainWeb simulation, while the 3D images were simulated using the BrainWeb package. The 2D measurements were simulated using pyParallelProj and the 3D measurements using SIRF (with a STIR backend).

1 paper · 0 benchmarks · 3D, Medical

Deep Deep Learning With BART (Trained Weights and Example Data)

1 paper · 0 benchmarks · Images, MRI, Medical

BIDS CHB-MIT Scalp EEG Database

This dataset is a BIDS-compatible version of the CHB-MIT Scalp EEG Database. It reorganizes the file structure to comply with the BIDS specification. To this effect:

1 paper · 0 benchmarks · EEG, Medical, Time series

BIDS Siena Scalp EEG Database

This dataset is a BIDS-compatible version of the Siena Scalp EEG Database. It reorganizes the file structure to comply with the BIDS specification. To this effect:

1 paper · 0 benchmarks · EEG, Medical, Time series
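Both BIDS entries above are truncated before listing their exact changes, so the specifics of each conversion are not given here. For orientation, the BIDS specification names EEG recordings roughly as `sub-<label>/[ses-<label>/]eeg/sub-<label>[_ses-<label>]_task-<label>[_run-<index>]_eeg.<extension>` (simplified; other optional entities such as `acq` are omitted). The sketch below builds such a path; `bids_eeg_path` is a hypothetical helper, and the subject and task labels in the example are made up rather than taken from either dataset.

```python
from pathlib import PurePosixPath
from typing import Optional


def bids_eeg_path(subject: str, task: str, session: Optional[str] = None,
                  run: Optional[int] = None,
                  extension: str = ".edf") -> PurePosixPath:
    """Build a simplified BIDS path for an EEG recording (hypothetical helper)."""
    directories = [f"sub-{subject}"]
    entities = [f"sub-{subject}"]
    if session is not None:
        directories.append(f"ses-{session}")
        entities.append(f"ses-{session}")
    entities.append(f"task-{task}")
    if run is not None:
        # Zero-padding the run index is a common convention, not a requirement.
        entities.append(f"run-{run:02d}")
    filename = "_".join(entities) + "_eeg" + extension
    return PurePosixPath(*directories, "eeg", filename)


print(bids_eeg_path("01", "monitoring", session="01", run=3))
# sub-01/ses-01/eeg/sub-01_ses-01_task-monitoring_run-03_eeg.edf
```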

Siena Scalp EEG Database (Physionet Siena Scalp EEG Database)

The database consists of EEG recordings of 14 patients acquired at the Unit of Neurology and Neurophysiology of the University of Siena. Subjects include 9 males (ages 25-71) and 5 females (ages 20-58). Subjects were monitored with Video-EEG at a sampling rate of 512 Hz, with electrodes arranged according to the international 10-20 system. Most recordings also contain one or two EKG signals. An expert clinician diagnosed epilepsy and classified seizures according to the criteria of the International League Against Epilepsy after carefully reviewing each patient's clinical and electrophysiological data.

1 paper · 0 benchmarks · EEG, Medical, Time series
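With the 512 Hz sampling rate stated in the Siena description, converting between seconds and sample indices is a single multiplication, which is what segmenting a recording into analysis windows relies on. The sketch below cuts a recording into non-overlapping fixed-length windows; the 10-second window length and the helper names are illustrative assumptions, not something the database prescribes.

```python
from typing import List, Tuple

SAMPLING_RATE_HZ = 512  # stated in the Siena description


def seconds_to_samples(seconds: float, fs: int = SAMPLING_RATE_HZ) -> int:
    """Convert a duration or timestamp in seconds to a sample count or index."""
    return round(seconds * fs)


def window_bounds(n_samples: int, window_s: float,
                  fs: int = SAMPLING_RATE_HZ) -> List[Tuple[int, int]]:
    """Return [start, end) sample bounds of non-overlapping windows.

    A trailing partial window shorter than window_s is dropped.
    """
    win = seconds_to_samples(window_s, fs)
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, win)]


one_minute = seconds_to_samples(60)          # 30720 samples
windows = window_bounds(one_minute, 10.0)    # six windows of 5120 samples
print(one_minute, len(windows), windows[1])  # 30720 6 (5120, 10240)
```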

SeizeIT1

This dataset was obtained during an ICON project (2017-2018) in collaboration with KU Leuven (ESAT-STADIUS), UZ Leuven, UCB, Byteflies and Pilipili. The goal of the project was to design a system that uses behind-the-ear (bhE) EEG electrodes to monitor patients in a home environment. This approach balances sufficient seizure detection accuracy (because EEG is used) with wearability (bhE EEG is relatively discreet, similar to a hearing aid). The dataset was acquired in the hospital during presurgical evaluation. During such an evaluation, neurologists try to determine whether a specific part of the brain is causing the seizures and, if so, whether that part of the brain can be removed surgically. Patients are monitored with video-EEG (vEEG) for multiple days (typically a week), but they are restricted to their room because of the wiring and video analysis. In this dataset, the following data is available per p

1 paper · 0 benchmarks · EEG, Medical, Time series

Symbrain

1 paper · 0 benchmarks · Images, MRI, Medical

Radio-Frequency Ultrasound volume dataset for pre-clinical liver tumors

A total of 227 cross-sectional images (20 x 54 mm, with a resolution of 289 x 648 pixels) of hind-leg xenograft tumors from 29 mice were obtained by moving the array, mounted on a manual positioning device, in 1 mm steps. The whole tumor volume was acquired using a diagnostic ultrasound system with a 10 MHz linear transducer and 50 MHz sampling.

1 paper · 0 benchmarks · 3D, Biomedical, Images, Medical
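The ultrasound figures above imply an anisotropic voxel grid, which matters when stacking the cross sections into a volume. A minimal sketch, assuming the 20 mm extent is sampled by the 289-pixel dimension and the 54 mm extent by the 648-pixel dimension (the description does not state the pairing), and assuming spacing = extent / pixel count; `voxel_spacing_mm` is a hypothetical helper.

```python
from typing import Tuple

FOV_MM = (20.0, 54.0)     # cross-sectional extent quoted in the description
MATRIX_PX = (289, 648)    # image resolution quoted in the description
SLICE_STEP_MM = 1.0       # step between successive cross sections


def voxel_spacing_mm(fov_mm: Tuple[float, float],
                     matrix_px: Tuple[int, int],
                     slice_step_mm: float) -> Tuple[float, float, float]:
    # Assumption: the i-th extent is sampled by the i-th matrix dimension,
    # and pixels tile the extent edge to edge (spacing = extent / count).
    in_plane = tuple(extent / n for extent, n in zip(fov_mm, matrix_px))
    return (in_plane[0], in_plane[1], slice_step_mm)


spacing = voxel_spacing_mm(FOV_MM, MATRIX_PX, SLICE_STEP_MM)
print([round(s, 4) for s in spacing])  # [0.0692, 0.0833, 1.0]
```

If the acquisition defines spacing as extent / (count - 1) instead, the in-plane values shift slightly; either way, the in-plane spacing is more than ten times finer than the 1 mm slice step.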

The ULS23 Challenge Public Training Dataset

The ULS23 training dataset contains 38,693 diverse lesions from chest-abdomen-pelvis CT examinations. For the challenge, we introduced two novel 3D annotated datasets targeting lesions in the pancreas and bones, which are traditionally challenging to segment. Additionally, we aggregated 10 publicly available datasets that include a lesion segmentation component into a single, easily accessible data repository.

1 paper · 0 benchmarks · 3D, Images, Medical

MedPromptX-VQA

A new in-context visual question answering dataset encompassing interleaved image and EHR data derived from the MIMIC-IV and MIMIC-CXR-JPG databases.

1 paper · 0 benchmarks · Images, Medical, Tabular

VietMed-Sum

In doctor-patient conversations, identifying medically relevant information is crucial, which creates a need for conversation summarization. This work makes three contributions. First, we propose the first deployable real-time speech summarization system for real-world industry applications, which generates a local summary after every N speech utterances within a conversation and a global summary after the conversation ends. The system could improve user experience from a business standpoint while also reducing computational cost from a technical perspective. Second, we present VietMed-Sum, which, to our knowledge, is the first speech summarization dataset for medical conversations. Third, we are the first to use an LLM and human annotators collaboratively to create gold-standard and synthetic summaries for medical conversation summarization.

1 paper · 0 benchmarks · Audio, Medical, Texts
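The local/global scheme in the VietMed-Sum description can be expressed as a small streaming loop. This is a sketch under assumptions, not the authors' system: `summarize` stands in for whatever model produces a summary, leftover utterances (fewer than N at the end) are flushed as a final local summary, and the global summary is computed over the full transcript; the description specifies neither choice. `stream_summaries` and `join_summarizer` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Summarizer = Callable[[List[str]], str]


def stream_summaries(utterances: Iterable[str], n: int,
                     summarize: Summarizer) -> Iterator[Tuple[str, str]]:
    """Yield ("local", ...) every n utterances, then ("global", ...) at the end."""
    if n < 1:
        raise ValueError("n must be >= 1")
    transcript: List[str] = []
    window: List[str] = []
    for utterance in utterances:
        transcript.append(utterance)
        window.append(utterance)
        if len(window) == n:
            yield ("local", summarize(window))
            window = []
    # Assumption: a partial final window still gets a local summary.
    if window:
        yield ("local", summarize(window))
    # Assumption: the global summary covers the full transcript.
    yield ("global", summarize(transcript))


def join_summarizer(utterances: List[str]) -> str:
    # Toy stand-in for a summarization model: joins the utterances.
    return " | ".join(utterances)


events = list(stream_summaries(["u1", "u2", "u3", "u4", "u5"], 2, join_summarizer))
for kind, text in events:
    print(kind, text)
```

Because each local summary is yielded as soon as its block of N utterances arrives, a caller can display it while the conversation is still in progress.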

MedTrinity-25M

1 paper · 0 benchmarks · Biomedical, Images, MRI, Medical, Texts

RadCases

This HuggingFace (HF) dataset contains the raw case labels for input patient "one-liner" case summaries according to the ACR Appropriateness Criteria. Because many of the data sources used to construct the RadCases dataset require credentialed access, we cannot publicly release the input patient case summaries. Instead, the "cases" in this publicly available dataset are the cryptographically secure SHA-512 hashes of the original, human-readable cases. The hashes therefore cannot be used to reconstruct the original RadCases dataset, but they can be used as lookup keys to determine the ground-truth label for each case.

1 paper · 0 benchmarks · Medical, Texts
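The hash-as-lookup-key scheme described for RadCases works as follows: someone with credentialed access to the original case text hashes it and uses the digest to find the label. A minimal sketch using Python's standard `hashlib`; the UTF-8 encoding, the hex-digest format, and the absence of any text normalization are assumptions, since the exact preprocessing used to build the released hashes is not stated here. Any difference in whitespace or punctuation produces a different hash and a failed lookup. The table and label values are hypothetical placeholders.

```python
import hashlib
from typing import Dict, Optional


def case_key(case_text: str) -> str:
    """SHA-512 hex digest of a case summary (UTF-8 encoding assumed)."""
    return hashlib.sha512(case_text.encode("utf-8")).hexdigest()


def lookup_label(case_text: str, hashed_labels: Dict[str, str]) -> Optional[str]:
    """Return the ground-truth label for a case, or None if it is absent."""
    return hashed_labels.get(case_key(case_text))


# Hypothetical table; a real one would be loaded from the released dataset.
hashed_labels = {case_key("hypothetical one-liner case summary"): "hypothetical-label"}

print(lookup_label("hypothetical one-liner case summary", hashed_labels))  # hypothetical-label
print(lookup_label("a different summary", hashed_labels))                  # None
```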
Page 17 of 20