395 machine learning datasets
Dataset Card for the ACR Appropriateness Criteria Corpus. This dataset contains chunked guidelines and narratives from the ACR Appropriateness Criteria, a set of clinical guidelines from the American College of Radiology (ACR) that help clinicians order appropriate diagnostic imaging studies for patients. The corpus is formatted similarly to the corpora introduced in MedRAG by Xiong et al. (2024) and can therefore be used in the same way for medical Retrieval-Augmented Generation (RAG).
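As a rough illustration of how a chunked corpus like this can feed the retrieval step of a RAG pipeline, here is a minimal Okapi BM25 ranker in plain Python. It is a sketch, not MedRAG's retriever: the chunk fields (`id`, `title`, `content`) are an assumed layout, and the three toy chunks are invented placeholder text, not ACR guideline content.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over chunk dicts; k1 and b are common defaults, not tuned values."""

    def __init__(self, chunks, k1=1.5, b=0.75):
        self.chunks = chunks
        # Assumed chunk layout: {"id": ..., "title": ..., "content": ...}.
        self.docs = [tokenize(c["title"] + " " + c["content"]) for c in chunks]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_terms, i):
        tf, dl = self.tf[i], len(self.docs[i])
        s = 0.0
        for t in query_terms:
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[t] * f * (self.k1 + 1) / norm
        return s

    def search(self, query, k=3):
        q = tokenize(query)
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(q, i), reverse=True)
        return [self.chunks[i]["id"] for i in ranked[:k]]


# Hypothetical toy chunks, not taken from the ACR corpus.
chunks = [
    {"id": "c1", "title": "Acute chest pain", "content": "CT angiography chest is usually appropriate."},
    {"id": "c2", "title": "Headache", "content": "MRI head without contrast may be appropriate."},
    {"id": "c3", "title": "Low back pain", "content": "Imaging is usually not appropriate in the first six weeks."},
]
retriever = BM25(chunks)
print(retriever.search("imaging for headache", k=1))  # ['c2']
```

The top-ranked chunks would then be inserted into the LLM prompt as context; a dense retriever could replace BM25 without changing the corpus format.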
MediConfusion is a challenging medical Visual Question Answering (VQA) benchmark dataset that probes the failure modes of medical Multimodal Large Language Models (MLLMs) from a vision perspective. We reveal that state-of-the-art models are easily confused by image pairs that are otherwise visually dissimilar and clearly distinct for medical experts. Our benchmark consists of 176 confusing pairs. A confusing pair is a set of two images that share the same question and answer options, but whose correct answers differ. We evaluate models on their ability to answer both questions in a confusing pair correctly, which we call set accuracy. This metric indicates how well models can tell the two images apart: a model that selects the same answer option for both images in every pair will receive 0% set accuracy. We also report confusion, a metric that describes the proportion of confusing pairs where the model ha
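Set accuracy as described above can be sketched as follows. The `((pred, gold), (pred, gold))` tuple layout is a hypothetical input format chosen for illustration, not the benchmark's official evaluation code.

```python
def set_accuracy(pairs):
    """Fraction of confusing pairs in which BOTH images are answered correctly.

    Each pair is ((pred_a, gold_a), (pred_b, gold_b)); this layout is assumed.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    both_correct = sum(1 for (pa, ga), (pb, gb) in pairs if pa == ga and pb == gb)
    return both_correct / len(pairs)


# Gold answers differ within a pair, so answering both images identically
# can get at most one of the two right, and never counts toward set accuracy.
pairs = [
    (("A", "A"), ("B", "B")),  # both correct
    (("A", "A"), ("A", "B")),  # same answer twice: only one correct
    (("B", "A"), ("A", "B")),  # both wrong
]
print(set_accuracy(pairs))  # 1 of 3 pairs fully correct
```

Scoring per pair rather than per image is what penalizes a model that cannot distinguish the two images.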
SCARED-C was introduced to assess the robustness of endoscopic depth prediction models. It is part of the EndoDepth benchmark, which evaluates monocular depth prediction models specifically in endoscopic scenarios. The dataset features 16 types of image corruption, each at five severity levels, covering challenges typical of endoscopic imaging such as lens distortion, resolution changes, specular reflection, and color shifts. Ground truth is taken from the original SCARED test set.
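With 16 corruption types at five severities, each test frame yields 16 × 5 = 80 corrupted variants, all sharing the original SCARED ground-truth depth. The sketch below shows the general pattern for one type, a resolution degradation; the severity-to-factor mapping is a hypothetical choice for illustration, not the parameters EndoDepth actually uses.

```python
import numpy as np

# Hypothetical severity -> downsampling factor mapping (NOT the EndoDepth values).
SEVERITY_TO_FACTOR = {1: 2, 2: 3, 3: 4, 4: 6, 5: 8}


def resolution_corruption(image: np.ndarray, severity: int) -> np.ndarray:
    """Subsample by an integer factor, then nearest-neighbour upsample back."""
    if severity not in SEVERITY_TO_FACTOR:
        raise ValueError("severity must be in 1..5")
    f = SEVERITY_TO_FACTOR[severity]
    h, w = image.shape[:2]
    low = image[::f, ::f]  # keep every f-th pixel along height and width
    up = np.repeat(np.repeat(low, f, axis=0), f, axis=1)
    return up[:h, :w]  # crop back to the original size; depth labels are unchanged


rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
for s in range(1, 6):
    out = resolution_corruption(frame, s)
    print(s, out.shape, out.dtype)
```

Because the corruption only alters pixels and keeps the frame size, the same ground-truth depth map can score predictions at every severity.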
The RaTE-NER dataset is a large-scale radiological named entity recognition (NER) dataset comprising 13,235 manually annotated sentences from 1,816 reports in the MIMIC-IV database, spanning 9 imaging modalities and 23 anatomical regions.
Our dataset, BSMDD, was collected from various open social media platforms, then translated and annotated by native Bengali speakers with expertise in both the language and mental health. It contains 21,910 cleaned samples: 10,961 labeled Depressed and 10,949 labeled Non-Depressed. The dataset is publicly accessible and provides a resource for further research on depression detection in Bengali social media content. Because annotation was carried out by professionals, the labels have high validity, which makes BSMDD particularly useful for advancing mental health research through social media analysis. The dataset is also published on Mendeley.
PASSION derm is a pioneering initiative dedicated to closing the diversity gap in dermatology datasets. It provides a unique dataset of skin condition images from Sub-Saharan Africa, focusing on richly pigmented skin. The dataset is designed to emulate teledermatology settings and includes images of common pediatric skin conditions, such as eczema, fungal infections, scabies, and impetigo, captured at varying quality and resolution. PASSION derm aims to improve access to dermatologic care in regions with limited healthcare resources.
The datasets used and analysed in the glucose clamp study are available in this Excel file. They include pseudonymised information on the participants, somatometric data, biomarkers of lipid metabolism and parameters of insulin-glucose homeostasis, i.e. concentrations of insulin, glucose and C-peptide, as well as data from glucose-clamp experiments, HOMA, SPINA Carb parameters (SPINA-GBeta and SPINA-GR), Matsuda index, insulinogenic index, disposition index and McAuley index.
The datasets used and analysed in the glucose clamp study are available in this DIF file. They include pseudonymised information on the participants, somatometric data, biomarkers of lipid metabolism and parameters of insulin-glucose homeostasis, i.e. concentrations of insulin, glucose and C-peptide, as well as data from glucose-clamp experiments, HOMA, SPINA Carb parameters (SPINA-GBeta and SPINA-GR), Matsuda index, insulinogenic index, disposition index and McAuley index.
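Several of the indices listed above have simple closed-form definitions. The sketch below implements the standard published formulas for HOMA-IR (original 1985 approximation), the Matsuda index and the McAuley index; the study may have used other variants or units (e.g. the HOMA2 computer model), so treat the function names, units and inputs here as assumptions rather than the study's own code. SPINA Carb parameters are not covered.

```python
import math
from statistics import mean


def homa_ir(fasting_insulin_uU_ml: float, fasting_glucose_mmol_l: float) -> float:
    """HOMA-IR, original approximation: insulin [uU/mL] x glucose [mmol/L] / 22.5."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5


def matsuda_index(glucose_mg_dl: list[float], insulin_uU_ml: list[float]) -> float:
    """Matsuda index from an OGTT series; element 0 must be the fasting sample.

    Uses the arithmetic mean over all sampled time points as the OGTT mean.
    """
    g0, i0 = glucose_mg_dl[0], insulin_uU_ml[0]
    return 10000 / math.sqrt(g0 * i0 * mean(glucose_mg_dl) * mean(insulin_uU_ml))


def mcauley_index(fasting_insulin_mU_l: float, triglycerides_mmol_l: float) -> float:
    """McAuley index: exp(2.63 - 0.28 ln(insulin [mU/L]) - 0.31 ln(TG [mmol/L]))."""
    return math.exp(2.63
                    - 0.28 * math.log(fasting_insulin_mU_l)
                    - 0.31 * math.log(triglycerides_mmol_l))


# Hypothetical example values, not taken from the study data.
print(round(homa_ir(10.0, 5.0), 2))
print(matsuda_index([100, 100, 100, 100, 100], [10, 10, 10, 10, 10]))
print(round(mcauley_index(10.0, 1.2), 3))
```

Note that uU/mL and mU/L are the same unit, so the same fasting insulin value can feed both HOMA-IR and the McAuley index.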
This dataset supports the research detailed in the pre-print "Virtual Imaging Trials Improved the Transparency and Reliability of AI Systems in COVID-19 Imaging." The study employs both clinical and simulated CT data to evaluate AI models for COVID-19 diagnosis. By leveraging the Virtual Imaging Trials (VIT) framework, the research addresses reproducibility and generalizability issues prevalent in medical imaging AI models.
High-resolution early gastric cancer (EGC) detection and analysis.
Patient Data: The dataset includes images from patients diagnosed with gastric cancer, distinguishing early gastric cancer (EGC) from non-pathogenic gastric cancer (NGC). The study used data from 341 patients, 124 classified as EGC and 217 as NGC.
Image Types: High-resolution endoscopy images.
Data Volume: 1,120 images for EGC detection and 2,150 images for NGC.
This dataset comprises 77,175 Reddit posts from 115 subreddit forums, annotated for the presence of 15 topics related to eating disorders and dieting. It includes labels and scores for all 77,175 posts, produced by five large language models (LLMs): GPT-4o, Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct, Mistral-7B-Instruct-v0.3 and Vicuna-7b-v1.5, as well as by an ensemble of the four open-source LLMs. The dataset also includes a subset of 1,080 human-annotated posts for evaluation.
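The description does not say how the four open-source models are combined. One common option, shown here purely as an assumption, is to average each topic's score across the models and apply a threshold. The score scale, the 0.5 threshold and the topic names in the example are all hypothetical.

```python
from statistics import mean

OPEN_SOURCE_MODELS = [
    "Llama-3.1-8B-Instruct",
    "Qwen2.5-7B-Instruct",
    "Mistral-7B-Instruct-v0.3",
    "Vicuna-7b-v1.5",
]


def ensemble_labels(scores: dict[str, dict[str, float]], threshold: float = 0.5) -> dict[str, int]:
    """Mean-score ensemble (assumed method): average per topic, then threshold.

    `scores` maps model name -> {topic: score in [0, 1]}.
    """
    topics = scores[OPEN_SOURCE_MODELS[0]].keys()
    return {
        t: int(mean(scores[m][t] for m in OPEN_SOURCE_MODELS) >= threshold)
        for t in topics
    }


# Hypothetical scores for one post on two hypothetical topics.
post_scores = {
    "Llama-3.1-8B-Instruct": {"dieting": 0.9, "binge eating": 0.2},
    "Qwen2.5-7B-Instruct": {"dieting": 0.8, "binge eating": 0.1},
    "Mistral-7B-Instruct-v0.3": {"dieting": 0.6, "binge eating": 0.6},
    "Vicuna-7b-v1.5": {"dieting": 0.4, "binge eating": 0.3},
}
print(ensemble_labels(post_scores))
```

Averaging scores lets one confident model be outvoted by the others; majority voting over per-model binary labels would be an equally plausible alternative.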
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases.
ENSeg Dataset. This dataset is an enhanced subset of the ENS dataset, which comprises image samples extracted from the enteric nervous system (ENS) of adult male Wistar rats (Rattus norvegicus, var. albinus), specifically from the jejunum, the second segment of the small intestine.
The Liver-US dataset is a comprehensive collection of high-quality ultrasound images of the liver, including both normal and abnormal cases. It is designed to facilitate research in medical image classification, with a focus on liver-related conditions. It includes a diverse range of ultrasound images acquired in multiple clinical settings, providing a robust foundation for developing and validating machine learning models in medical image analysis.
PlainFact is a high-quality human-annotated dataset with fine-grained explanation (i.e., added information) annotations.
The structure for the dataset is as follows:
LLM Health Benchmarks Dataset. The Health Benchmarks Dataset is a specialized resource for evaluating large language models (LLMs) across medical specialties. It provides structured question-answer pairs designed to test how well AI models understand and generate domain-specific knowledge.
LLaVA-Rad MIMIC-CXR features more accurate section extractions from MIMIC-CXR free-text radiology reports. Traditionally, rule-based methods were used to extract sections such as the reason for exam, findings, and impression. However, these approaches often fail because report structure and clinical language are inconsistent. In this work, we leverage GPT-4 to extract these sections more reliably, adding 237,073 image-text pairs to the training split and 1,952 pairs to the validation split. This enhancement enabled the development and fine-tuning of LLaVA-Rad, a multimodal large language model (LLM) tailored for radiology applications, which achieves improved performance on report generation tasks.
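To make the brittleness of the traditional approach concrete, here is a minimal rule-based section splitter of the kind described above. It is an illustrative baseline only, not the GPT-4 extraction used for LLaVA-Rad, and the sample report text is invented.

```python
import re

# Rule: a section starts at an ALL-CAPS header followed by a colon at line start.
HEADER = re.compile(r"^\s*([A-Z][A-Z ]+):", re.MULTILINE)


def extract_sections(report: str) -> dict[str, str]:
    """Split a free-text report on 'HEADER:' lines (rule-based baseline)."""
    sections = {}
    matches = list(HEADER.finditer(report))
    # Each section runs from the end of its header to the start of the next header.
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(report)
        sections[m.group(1).strip().lower()] = report[m.end():end].strip()
    return sections


# Invented sample report for illustration.
report = """INDICATION: Cough and fever.
FINDINGS: No focal consolidation. No pleural effusion.
IMPRESSION: No acute cardiopulmonary process."""
print(extract_sections(report))

# A report that deviates from the expected header style yields nothing.
print(extract_sections("Findings - clear lungs."))  # {}
```

The second call shows the failure mode: any header that is not capitalized or not followed by a colon is silently missed, which is the kind of inconsistency an LLM-based extractor is meant to handle.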
This dataset contains pre-processed versions of datasets introduced in prior works, as well as new data pertinent to the paper.