PanCancer Multimodal
HoneyBee
Images, Medical, Tabular, Texts. License: CC-BY-NC-ND-4.0. Introduced 2024-05-13.
Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset
The Cancer Genome Atlas (TCGA) Multimodal Dataset is a collection of clinical data, pathology reports, molecular data, and slide images for cancer patients. It aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated with models such as GatorTron, SeNMo, and UNI.
- Curated by: Lab Rasool
- Language(s) (NLP): English
Uses
Each modality is published as a separate configuration of the Lab-Rasool/TCGA repository and can be loaded with the Hugging Face datasets library:
from datasets import load_dataset
clinical_dataset = load_dataset("Lab-Rasool/TCGA", "clinical", split="train")
pathology_report_dataset = load_dataset("Lab-Rasool/TCGA", "pathology_report", split="train")
wsi_dataset = load_dataset("Lab-Rasool/TCGA", "wsi", split="train")
molecular_dataset = load_dataset("Lab-Rasool/TCGA", "molecular", split="train")
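The four calls above differ only in the configuration name, so they can be folded into one small helper. The following is a minimal sketch, not part of the dataset card: `load_modalities`, `REPO_ID`, and `MODALITIES` are hypothetical names introduced for illustration, and the configuration names are taken from the snippet above. The `loader` argument is an assumption added so the helper can be exercised without downloading anything.

```python
from typing import Any, Callable, Dict, Iterable, Optional

# Repository and configuration names as they appear in the snippet above.
REPO_ID = "Lab-Rasool/TCGA"
MODALITIES = ("clinical", "pathology_report", "wsi", "molecular")


def load_modalities(
    names: Iterable[str] = MODALITIES,
    loader: Optional[Callable[..., Any]] = None,
    split: str = "train",
) -> Dict[str, Any]:
    """Load the requested configurations and return them keyed by name.

    `loader` defaults to `datasets.load_dataset`; it is imported lazily so
    that a stand-in loader can be passed in without the library installed.
    """
    if loader is None:
        from datasets import load_dataset  # requires the `datasets` package

        loader = load_dataset
    # One call per configuration, mirroring the individual calls above.
    return {name: loader(REPO_ID, name, split=split) for name in names}
```

Calling `load_modalities(["clinical", "molecular"])` would fetch only those two configurations, which may be useful when the slide-image embeddings are not needed.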