Skip to content

Whole-Slide

Catalog entries using this tag (links open the entry card on its page):

Entries

mSTAR

AI Imaging Pathology Foundation Model Multimodal Gene Expression Whole-Slide HKUST
PUBMED_LINK
41387679
FULL NAME
mSTAR — Multimodal Self-TAught Pretraining (WSI + Reports + Gene Expression)
DESCRIPTION
mSTAR (Multimodal Self-TAught PRetraining) is a pathology foundation model from HKUST/SJTU that integrates three modalities: pathology slides (WSIs), expert pathology reports, and gene expression (RNA-Seq) data. Curates the largest multimodal dataset of 26,169 slide-level modality pairs across 32 cancer types from 10,275 TCGA patients (>116M patch images). Uses a two-stage paradigm: (1) slide-level contrastive learning across WSI-report-gene modalities, (2) self-taught training that propagates multimodal knowledge from slide aggregator (teacher) to patch extractor (student). Evaluated on 97 tasks across 15 application types, outperforming UNI, CONCH, CHIEF, and GigaPath. Key finding: multimodal integration yields greater improvements than simply expanding vision-only datasets (53x data efficiency vs Virchow). Published in Nat Commun, Dec 2025.
URL
https://github.com/Innse/mSTAR
TITLE
A multimodal knowledge-enhanced whole-slide pathology foundation model.
Main citation
Xu Y, Wang Y, Zhou F, Ma J, Yang S, Lin H, Wang X, Wang J, Liang L, Han A, Jin C, Cheng KT, Chen H. (2025) A multimodal knowledge-enhanced whole-slide pathology foundation model. Nature Communications, 16:11406. doi:10.1038/s41467-025-66220-x. PMID 41387679
ABSTRACT
Computational pathology has advanced through foundation models, yet faces challenges in multimodal integration and capturing whole-slide context. We present mSTAR, the pathology foundation model that incorporates three modalities: pathology slides, expert-created reports, and gene expression data, within a unified framework. Our dataset includes 26,169 slide-level modality pairs across 32 cancer types, comprising over 116 million patch images. This approach injects multimodal whole-slide context into patch representations, expanding modeling from single to multiple modalities and from patch-level to slide-level analysis. Across 97 tasks, mSTAR outperforms previous SOTA models, particularly in molecular prediction, revealing that multimodal integration yields greater improvements than simply expanding vision-only datasets.
DOI
10.1038/s41467-025-66220-x

Prov-GigaPath

AI Imaging Pathology Foundation Model Whole-Slide Microsoft Real-World Data
PUBMED_LINK
38778098
FULL NAME
Prov-GigaPath — Whole-Slide Foundation Model for Digital Pathology
DESCRIPTION
Prov-GigaPath by Microsoft Research, Providence, and UW is a whole-slide pathology foundation model pretrained on 1.3 billion 256x256 image tiles from 171,189 whole slides across 28 cancer centers (>30,000 patients, 31 tissue types). Uses a novel GigaPath vision transformer with dilated self-attention (LongNet) for gigapixel-level context. Achieves SOTA on 25/26 benchmark tasks including cancer subtyping, mutation prediction, and TMB classification. The first large-scale whole-slide foundation model trained on real-world clinical data.
URL
https://github.com/prov-gigapath/prov-gigapath
TITLE
A whole-slide foundation model for digital pathology from real-world data.
Main citation
Xu H, Usuyama N, Bagal V, Bredell M, Chamby A, Chen Z, Ding J, Fuhlbrück T, Géro Z, Gonzalez J, Gu Y, Xu Y, Wei MH, Wang W, Ma S, Wei F, Yang J, Li C, Gao J, Rosemon J, Bower T, Lee S, Weerasinghe R, Wright B, Robicsek A, Piening B, Bifulco C, Wang S, Poon H. (2024) A whole-slide foundation model for digital pathology from real-world data. Nature, 630(8015):181-188. doi:10.1038/s41586-024-07441-w. PMID 38778098
ABSTRACT
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing important slide-level context. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer for pretraining gigapixel pathology slides using dilated self-attention. Prov-GigaPath attains state-of-the-art performance on 25 out of 26 benchmark tasks.
DOI
10.1038/s41586-024-07441-w

TITAN

AI Imaging Pathology Foundation Model Vision-Language Whole-Slide Mahmood Lab
PUBMED_LINK
41193692
FULL NAME
TITAN — Transformer-based pathology Image and Text Alignment Network
DESCRIPTION
TITAN (Transformer-based pathology Image and Text Alignment Network) is a multimodal whole-slide foundation model from Mahmood Lab (Harvard/BWH). Pretrained on 335,645 WSIs via visual self-supervised learning and vision-language alignment with 423K synthetic captions from PathChat + 183K pathology reports. Without any fine-tuning, TITAN produces general-purpose slide representations for zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation. Outperforms both ROI and slide foundation models across diverse clinical tasks.
URL
https://github.com/mahmoodlab/TITAN
TITLE
A multimodal whole-slide foundation model for pathology.
Main citation
Ding T, Wagner SJ, Song AH, Chen RJ, Lu MY, Zhang A, Vaidya AJ, Jaume G, Shaban M, Kim A, Williamson DFK, Oldenburg L, Chen B, Alajaji A, Noor G, Sang Y, Peng T, Le LP, Mahmood F. (2025) A multimodal whole-slide foundation model for pathology. Nature Medicine, 31:3749-3761. doi:10.1038/s41591-025-03982-3. PMID 41193692
ABSTRACT
The field of computational pathology has been transformed with recent advances in foundation models that encode histopathology region-of-interests into versatile feature representations. However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data. We propose TITAN, a multimodal whole-slide foundation model pretrained using 335,645 whole-slide images via visual self-supervised learning and vision-language alignment with pathology reports and 423,122 synthetic captions. Without any fine-tuning, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis.
DOI
10.1038/s41591-025-03982-3