TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/From Classical Machine Learning to Emerging Foundation Mod...

From Classical Machine Learning to Emerging Foundation Models: Review on Multimodal Data Integration for Cancer Research

Amgad Muneer, Muhammad Waqas, Maliazurina B Saad, Eman Showkatian, Rukhmini Bandyopadhyay, Hui Xu, Wentao Li, Joe Y Chang, Zhongxing Liao, Cara Haymaker, Luisa Solis Soto, Carol C Wu, Natalie I Vokes, Xiuning Le, Lauren A Byers, Don L Gibbons, John V Heymach, Jianjun Zhang, Jia Wu

2025-07-11Data Integration
PaperPDFCode(official)

Abstract

Cancer research is increasingly driven by the integration of diverse data modalities, spanning from genomics and proteomics to imaging and clinical factors. However, extracting actionable insights from these vast and heterogeneous datasets remains a key challenge. The rise of foundation models (FMs) -- large deep-learning models pretrained on extensive amounts of data serving as a backbone for a wide range of downstream tasks -- offers new avenues for discovering biomarkers, improving diagnosis, and personalizing treatment. This paper presents a comprehensive review of widely adopted integration strategies of multimodal data to assist advance the computational approaches for data-driven discoveries in oncology. We examine emerging trends in machine learning (ML) and deep learning (DL), including methodological frameworks, validation protocols, and open-source resources targeting cancer subtype classification, biomarker discovery, treatment guidance, and outcome prediction. This study also comprehensively covers the shift from traditional ML to FMs for multimodal integration. We present a holistic view of recent FMs advancements and challenges faced during the integration of multi-omics with advanced imaging data. We identify the state-of-the-art FMs, publicly available multi-modal repositories, and advanced tools and methods for data integration. We argue that current state-of-the-art integrative methods provide the essential groundwork for developing the next generation of large-scale, pre-trained models poised to further revolutionize oncology. To the best of our knowledge, this is the first review to systematically map the transition from conventional ML to advanced FM for multimodal data integration in oncology, while also framing these developments as foundational for the forthcoming era of large-scale AI models in cancer research.

Related Papers

Empowering Digital Agriculture: A Privacy-Preserving Framework for Data Sharing and Collaborative Research2025-06-25Intelligent Operation and Maintenance and Prediction Model Optimization for Improving Wind Power Generation Efficiency2025-06-19Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs2025-06-17Brain Imaging Foundation Models, Are We There Yet? A Systematic Review of Foundation Models for Brain Imaging and Biomedical Research2025-06-16Leveraging MIMIC Datasets for Better Digital Health: A Review on Open Problems, Progress Highlights, and Future Promises2025-06-15Enhancing Bagging Ensemble Regression with Data Integration for Time Series-Based Diabetes Prediction2025-06-11scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data2025-06-10The Cell Ontology in the age of single-cell omics2025-06-10