Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology

Wenhao Tang, Rong Qin, Heng Fang, Fengtao Zhou, Hao Chen, Xiang Li, Ming-Ming Cheng

2025-06-03 | Multiple Instance Learning

Abstract

Pre-trained encoders for offline feature extraction followed by multiple instance learning (MIL) aggregators have become the dominant paradigm in computational pathology (CPath), benefiting cancer diagnosis and prognosis. However, performance is limited because the encoder is not fine-tuned for downstream tasks and is optimized separately from the MIL aggregator. Slide-level supervised end-to-end (E2E) learning is an intuitive solution to this issue, but it faces challenges such as high computational demands and suboptimal results. These limitations motivate us to revisit E2E learning. We argue that prior work neglects inherent E2E optimization challenges, which leads to a performance gap relative to traditional two-stage methods. In this paper, we are the first to elucidate the optimization challenge caused by sparse-attention MIL, and we propose a novel MIL aggregator called ABMILX. It mitigates this problem through global correlation-based attention refinement and multi-head mechanisms. Combined with an efficient multi-scale random patch sampling strategy, an E2E-trained ResNet with ABMILX surpasses state-of-the-art (SOTA) foundation models under the two-stage paradigm across multiple challenging benchmarks, while remaining computationally efficient (<10 RTX 3090 hours). We show the potential of E2E learning in CPath and call for greater research focus in this area. The code is available at https://github.com/DearCaat/E2E-WSI-ABMILX.
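The abstract names ABMILX's ingredients (global correlation-based attention refinement and a multi-head mechanism) but not their exact form. The sketch below is a minimal illustration under that limitation, written in NumPy. It combines standard ABMIL attention pooling (score w^T tanh(V^T h_i), softmax over instances) with per-head attention. It then adds a hypothetical refinement step that mixes each instance's attention logit with a correlation-weighted average of all logits.

The refinement rule, the mixing weight `alpha`, and the function name `abmil_multihead_refined` are illustrative assumptions. They are not the authors' ABMILX formulation; the linked repository contains the actual implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def abmil_multihead_refined(h, V, w, alpha=0.5):
    """Hypothetical multi-head ABMIL pooling with a correlation-based
    refinement of the attention logits (an assumption, not ABMILX itself).

    h:     (N, D) features of the N sampled patches of one slide
    V:     (H, D // H, K) per-head projection of the attention MLP
    w:     (H, K) per-head attention vectors
    alpha: assumed mixing weight of the refinement, in [0, 1]
    Returns the (D,) slide embedding and the (H, N) attention maps.
    """
    n, d = h.shape
    n_heads = w.shape[0]
    assert d % n_heads == 0, "feature dim must be divisible by the head count"
    hd = d // n_heads
    # Split the feature dimension into heads: (H, N, D // H).
    heads = h.reshape(n, n_heads, hd).transpose(1, 0, 2)
    # ABMIL-style logits per head, s_i = w^T tanh(V^T h_i): (H, N).
    s = np.einsum("hnk,hk->hn", np.tanh(np.einsum("hnd,hdk->hnk", heads, V)), w)
    # Global instance-to-instance correlation per head, row-normalized: (H, N, N).
    corr = softmax(np.einsum("hnd,hmd->hnm", heads, heads) / np.sqrt(hd), axis=-1)
    # Assumed refinement: convex mix of each logit with the correlation-weighted
    # average of all logits. Because corr is row-stochastic, the refined logits
    # stay within the original [min, max] range of s.
    s = (1.0 - alpha) * s + alpha * np.einsum("hnm,hm->hn", corr, s)
    a = softmax(s, axis=-1)  # (H, N); each head's weights sum to 1
    # Attention-weighted sum per head, then concatenate the heads back to (D,).
    z = np.einsum("hn,hnd->hd", a, heads).reshape(d)
    return z, a


# Usage: pool 500 sampled patch features (256-d) with 4 heads.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 256))
V_demo = rng.normal(size=(4, 64, 32)) * 0.1
w_demo = rng.normal(size=(4, 32))
slide_emb, attn = abmil_multihead_refined(feats, V_demo, w_demo)
```

With `alpha=0` and a single head, the function reduces to plain ABMIL pooling. Raising `alpha` pulls each patch's logit toward those of correlated patches. In this sketch, that is the mechanism assumed to counteract attention collapsing onto a few patches.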
