AI Imaging

Curation of Imaging — listings under the AI tab.

Pathology & Medical Imaging Foundation Models

Pathology foundation models have grown rapidly since 2024. These models for computational pathology learn from up to millions of whole-slide images (WSIs), along three main lines:

  • Self-supervised pretraining on slide patches without manual labels (UNI, Chen et al. Nat Med 2024; Virchow, Vorontsov et al. Nat Med 2024; Prov-GigaPath, Xu et al. PMID 38778098, Nature 2024).
  • Vision-language and multimodal alignment connecting histology images with text descriptions for zero-shot tasks (CONCH, Lu et al. Nat Med 2024; TITAN, Ding et al. Nat Med 2025; mSTAR, Xu et al. Nat Commun 2025).
  • Knowledge-enhanced architectures incorporating biomedical ontologies and multimodal clinical data (KEEP, Zhou et al. PMID 41720085, Cancer Cell 2026; PathOrchestra, Yan et al. npj Digit Med 2025).

Trend: from single-modal patch-level models → multimodal whole-slide understanding with clinical context.

Summary Table

NAME           | CATEGORY                   | MAIN CITATION                     | YEAR
ENLIGHT-DeepPT | Cross-modal Prediction     | Hoang DT et al., Nat Cancer, 2024 | 2024
CONCH          | General Feature Extraction | Lu MY et al., Nat Med, 2024       | 2024
KEEP           | General Feature Extraction | Zhou X et al., Cancer Cell, 2026  | 2026
UNI            | General Feature Extraction | Chen RJ et al., Nat Med, 2024     | 2024
CHIEF          | WSI Model                  | Wang X et al., Nature, 2024       | 2024
PathOrchestra  | WSI Model                  | Yan F et al., npj Digit Med, 2025 | 2025
Prov-GigaPath  | WSI Model                  | Xu H et al., Nature, 2024         | 2024
TITAN          | WSI Model                  | Ding T et al., Nat Med, 2025      | 2025
Virchow        | WSI Model                  | Vorontsov E et al., Nat Med, 2024 | 2024
mSTAR          | WSI Model                  | Xu Y et al., Nat Commun, 2025     | 2025

Cross-modal Prediction

ENLIGHT-DeepPT

AI Imaging Histopathology Transcriptomics Treatment Response Precision Oncology
PUBMED_LINK
38961276
FULL NAME
ENLIGHT-DeepPT — Deep-Learning Framework for Cancer Treatment Response from Histopathology Images
DESCRIPTION
ENLIGHT-DeepPT (Deep Phenotyping of Tumors) is a deep-learning framework (ResNet50 + MLP) that predicts genome-wide tumor mRNA expression from routine H&E histopathology images across 16 TCGA cancer types. The imputed transcriptomics then drive treatment response prediction, achieving an odds ratio of 2.28 across 5 independent treatment cohorts. The framework directly links medical imaging (histopathology) with genomics/transcriptomics via AI, enabling precision oncology from standard pathology slides.
TITLE
A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics.
Main citation
Hoang DT, Shulman ED, Shuaib M, Nguyen JD, Maqbool HH, Nguyen Q, Iyer P, Liu S, Ruppin E, Stone EA. (2024) A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics. Nature Cancer, 5(9):1305-1317. doi:10.1038/s43018-024-00793-2. PMID 38961276
ABSTRACT
Predicting cancer treatment response from routinely collected clinical material is a central challenge in precision oncology. Here we present ENLIGHT-DeepPT, a deep-learning framework that predicts genome-wide tumor mRNA expression from routine H&E histopathology images. Using a two-stage approach (image-to-transcriptomics via ResNet50 + MLP, then transcriptomics-to-treatment response), ENLIGHT-DeepPT achieves an odds ratio of 2.28 across 5 independent treatment cohorts spanning multiple cancer types and drug classes.
DOI
10.1038/s43018-024-00793-2
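
The odds ratio reported for ENLIGHT-DeepPT compares the odds of response among patients predicted to respond with the odds among the remaining patients. A minimal sketch of that calculation from a 2x2 table is shown here; the helper name `odds_ratio` and the counts are hypothetical and are not taken from the study.

```python
def odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Odds ratio of a 2x2 prediction-versus-outcome table.

    tp: predicted responder, responded
    fp: predicted responder, did not respond
    fn: predicted non-responder, responded
    tn: predicted non-responder, did not respond
    """
    if fp == 0 or fn == 0:
        raise ValueError("odds ratio is undefined when fp or fn is zero")
    return (tp * tn) / (fp * fn)


# Illustrative counts only (not data from the study).
example = odds_ratio(tp=30, fp=20, fn=15, tn=35)
print(example)  # 3.5
```

An odds ratio above 1 means predicted responders have higher odds of responding than predicted non-responders.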

General Feature Extraction

CONCH

AI Imaging Pathology Foundation Model Vision-Language Histopathology Mahmood Lab Zero-Shot
PUBMED_LINK
38504017
FULL NAME
CONCH — Contrastive learning from Captions for Histopathology (Vision-Language Foundation Model)
DESCRIPTION
CONCH (CONtrastive learning from Captions for Histopathology) is a vision-language foundation model from Mahmood Lab (Harvard/BWH). Pretrained on 1.17M histopathology image-text pairs from diverse sources (PubMed, educational resources, textbooks). Evaluated across 14 clinically relevant tasks including zero-shot cancer classification, text-to-image retrieval, image-to-text retrieval, caption generation, and tissue segmentation. Outperforms standard models including CLIP and PLIP. CONCH also works on non-H&E stains (IHC, special stains), demonstrating broad applicability. Available as an open-source model for academic use.
URL
https://github.com/mahmoodlab/CONCH
TITLE
A visual-language foundation model for computational pathology.
Main citation
Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, Parwani AV, Zhang A, Mahmood F. (2024) A visual-language foundation model for computational pathology. Nature Medicine, 30(3):863-874. doi:10.1038/s41591-024-02856-4. PMID 38504017
ABSTRACT
We introduce CONCH, a visual-language foundation model developed using diverse sources of histopathology images and text. Trained on 1.17 million pathology image-text pairs, CONCH achieves state-of-the-art performance across 14 clinically relevant tasks, including zero-shot cancer classification, text-to-image and image-to-text retrieval, caption generation, and tissue segmentation. CONCH outperforms standard models like CLIP and PLIP, and generalizes to non-H&E stains including immunohistochemistry and special stains, demonstrating its versatility as a foundation model for computational pathology.
DOI
10.1038/s41591-024-02856-4
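
Zero-shot classification with a vision-language model is commonly done by embedding one text prompt per class and picking the class whose prompt embedding is closest to the image embedding. The sketch below illustrates that idea with toy vectors; it does not call the CONCH code, and the function names, prompts, and embedding values are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, prompt_embeddings):
    """Return the label whose prompt embedding is most similar to the image."""
    scores = {
        label: cosine(image_embedding, emb)
        for label, emb in prompt_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Toy 3-dimensional embeddings standing in for encoder outputs (hypothetical).
prompts = {
    "invasive ductal carcinoma": [0.9, 0.1, 0.0],
    "invasive lobular carcinoma": [0.1, 0.9, 0.1],
}
image = [0.8, 0.2, 0.1]

label, scores = zero_shot_classify(image, prompts)
print(label)  # invasive ductal carcinoma
```

No task-specific training is involved: changing the set of classes only requires new text prompts.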

KEEP

AI Imaging Pathology Foundation Model Vision-Language Knowledge Graph Rare Cancer Cancer Cell
PUBMED_LINK
41720085
FULL NAME
KEEP — Knowledge-Enhanced Pathology Vision-Language Foundation Model
DESCRIPTION
KEEP (KnowledgE-Enhanced Pathology) is a vision-language foundation model from Shanghai AI Lab / SJTU that systematically integrates disease knowledge into pretraining for cancer diagnosis. Uses a comprehensive disease knowledge graph with 11,454 diseases and 139,143 attributes from DO and UMLS to reorganize millions of pathology image-text pairs into 143,000 semantically structured groups aligned with disease ontology hierarchies. Across 18 public benchmarks (14,000+ WSIs) and 4 institutional rare cancer datasets (926 cases), KEEP consistently outperforms existing foundation models (CHIEF, CONCH, UNI), with substantial gains for rare subtypes (+8.5 pts balanced accuracy vs CONCH on 30 rare brain cancers). Published in Cancer Cell, Feb 2026.
URL
https://github.com/MAGIC-AI4Med/KEEP
TITLE
Knowledge-enhanced pretraining for vision-language pathology foundation model on cancer diagnosis.
Main citation
Zhou X, Sun L, He D, Guan W, Wang G, Wang R, Wang L, Yuan X, Sun X, Zhang Y, Sun K, Wang Y, Xie W. (2026) Knowledge-enhanced pretraining for vision-language pathology foundation model on cancer diagnosis. Cancer Cell, 44(4):777-791. doi:10.1016/j.ccell.2026.01.019. PMID 41720085
ABSTRACT
Vision-language foundation models have shown great promise in computational pathology but remain primarily data-driven, lacking explicit integration of medical knowledge. We introduce KEEP, a foundation model that systematically incorporates disease knowledge into pretraining for cancer diagnosis. KEEP leverages a comprehensive disease knowledge graph encompassing 11,454 diseases and 139,143 attributes to reorganize millions of pathology image-text pairs into 143,000 semantically structured groups aligned with disease ontology hierarchies. Across 18 public benchmarks (over 14,000 WSIs) and 4 institutional rare cancer datasets (926 cases), KEEP consistently outperformed existing foundation models, showing substantial gains for rare subtypes.
DOI
10.1016/j.ccell.2026.01.019
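
The KEEP results for rare subtypes are stated in balanced accuracy, which averages recall over classes so that a rare class weighs as much as a common one. A minimal sketch with made-up labels (the helper and the data are illustrative, not from the paper):

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy data: 8 cases of a common subtype, 2 of a rare one.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]

print(balanced_accuracy(y_true, y_pred))  # 0.75, while plain accuracy is 0.9
```

Missing one of two rare cases barely moves plain accuracy but lowers balanced accuracy by 25 points, which is why the metric suits rare-cancer benchmarks.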

UNI

AI Imaging Pathology Foundation Model Self-Supervised Computational Pathology
PUBMED_LINK
38504018
FULL NAME
UNI — General-Purpose Foundation Model for Computational Pathology
DESCRIPTION
UNI is a general-purpose self-supervised foundation model for computational pathology from Mahmood Lab (Harvard/BWH), pretrained on >100 million images from >100,000 H&E-stained WSIs (>77 TB) across 20 tissue types. Evaluated on 34 representative CPath tasks — outperforming prior models across cancer classification, organ transplant assessment, and rare disease analysis. Demonstrates resolution-agnostic classification, few-shot slide classification, and generalization to 108 cancer types in the OncoTree system. 1,300+ citations.
URL
https://github.com/mahmoodlab/UNI
TITLE
Towards a general-purpose foundation model for computational pathology.
Main citation
Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Chen B, Zhang A, Shao D, Song AH, Shaban M, Williams M, Oldenburg L, Weishaupt LL, Wang JJ, Vaidya A, Le LP, Gerber G, Sahai S, Williams W, Mahmood F. (2024) Towards a general-purpose foundation model for computational pathology. Nature Medicine, 30(3):850-862. doi:10.1038/s41591-024-02857-3. PMID 38504018
ABSTRACT
Quantitative evaluation of tissue images is crucial for computational pathology tasks. The high resolution of WSIs and the variability of morphological features present significant challenges. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs across 20 major tissue types. The model was evaluated on 34 representative CPath tasks. UNI outperforms previous state-of-the-art models and demonstrates new capabilities including resolution-agnostic tissue classification, few-shot slide classification, and disease subtyping generalization to 108 cancer types.
DOI
10.1038/s41591-024-02857-3

WSI Model

CHIEF

AI Imaging Pathology Foundation Model Weakly Supervised Cancer Diagnosis Histopathology
PUBMED_LINK
39232164
FULL NAME
CHIEF — Clinical Histopathology Imaging Evaluation Foundation Model
DESCRIPTION
CHIEF (Clinical Histopathology Imaging Evaluation Foundation) is a general-purpose weakly supervised machine learning framework from Harvard Medical School. Trained on 60,530 WSIs spanning 19 anatomical sites (44 TB of data), CHIEF leverages two complementary pretraining methods: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. Validated on 19,491 WSIs from 32 independent slide sets across 24 hospitals internationally. Outperforms state-of-the-art (SOTA) deep learning methods by up to 36.1%, demonstrating strong generalization across diverse populations and slide preparation methods.
URL
https://github.com/hms-dbmi/CHIEF
TITLE
A pathology foundation model for cancer diagnosis and prognosis prediction.
Main citation
Wang X, Zhao J, Marostica E, Yuan W, Jin J, et al. (2024) A pathology foundation model for cancer diagnosis and prognosis prediction. Nature, 634(8035):970-978. doi:10.1038/s41586-024-07894-z. PMID 39232164
ABSTRACT
Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard AI methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task, often with limited generalizability. To address this challenge, we devised CHIEF, a general-purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. Developed using 60,530 whole-slide images spanning 19 anatomical sites, CHIEF outperformed SOTA deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations.
DOI
10.1038/s41586-024-07894-z

PathOrchestra

AI Imaging Pathology Foundation Model Self-Supervised Clinical-Grade Structured Report
PUBMED_LINK
41258399
FULL NAME
PathOrchestra — Comprehensive Pathology Foundation Model with 100+ Clinical-Grade Tasks
DESCRIPTION
PathOrchestra is a versatile pathology foundation model from Shanghai AI Lab and multiple Chinese institutions, trained via self-supervised learning on 287,424 H&E-stained WSIs from 21 tissue types across 3 independent clinical centers. Evaluated on the largest known clinical task benchmark (112 tasks: 61 private + 51 public) spanning digital slide preprocessing, pan-cancer classification (17 cancer types), lesion identification, multi-cancer subtype classification (36 tasks), biomarker assessment (36 tasks), gene expression prediction, and structured report generation. Achieves over 0.950 accuracy in 47 tasks. First model to generate structured pathology reports for colorectal cancer and lymphoma. Apache 2.0 open-source license.
URL
https://github.com/yanfang-research/PathOrchestra
TITLE
PathOrchestra: a comprehensive foundation model for computational pathology with over 100 diverse clinical-grade tasks.
Main citation
Yan F, et al. (2025) PathOrchestra: a comprehensive foundation model for computational pathology with over 100 diverse clinical-grade tasks. npj Digital Medicine, 8(1):695. doi:10.1038/s41746-025-02027-w. PMID 41258399
ABSTRACT
The complexity and variability of high-resolution pathological images present significant challenges in computational pathology. We present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on 287,424 slides from 21 tissue types across three centers. Evaluated on 112 tasks from 61 private and 51 public datasets, covering digital slide preprocessing, pan-cancer classification, lesion identification, multi-cancer subtype classification, biomarker assessment, gene expression prediction, and structured report generation. Across 27,755 WSIs and 9,415,729 ROI images, it achieved over 0.950 accuracy in 47 tasks. It is the first to generate structured reports for colorectal cancer and lymphoma.
DOI
10.1038/s41746-025-02027-w

Prov-GigaPath

AI Imaging Pathology Foundation Model Whole-Slide Microsoft Real-World Data
PUBMED_LINK
38778098
FULL NAME
Prov-GigaPath — Whole-Slide Foundation Model for Digital Pathology
DESCRIPTION
Prov-GigaPath by Microsoft Research, Providence, and UW is a whole-slide pathology foundation model pretrained on 1.3 billion 256x256 image tiles from 171,189 whole slides across 28 cancer centers (>30,000 patients, 31 tissue types). Uses a novel GigaPath vision transformer with dilated self-attention (LongNet) for gigapixel-level context. Achieves SOTA on 25/26 benchmark tasks including cancer subtyping, mutation prediction, and TMB classification. The first large-scale whole-slide foundation model trained on real-world clinical data.
URL
https://github.com/prov-gigapath/prov-gigapath
TITLE
A whole-slide foundation model for digital pathology from real-world data.
Main citation
Xu H, Usuyama N, Bagal V, Bredell M, Chamby A, Chen Z, Ding J, Fuhlbrück T, Géro Z, Gonzalez J, Gu Y, Xu Y, Wei MH, Wang W, Ma S, Wei F, Yang J, Li C, Gao J, Rosemon J, Bower T, Lee S, Weerasinghe R, Wright B, Robicsek A, Piening B, Bifulco C, Wang S, Poon H. (2024) A whole-slide foundation model for digital pathology from real-world data. Nature, 630(8015):181-188. doi:10.1038/s41586-024-07441-w. PMID 38778098
ABSTRACT
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing important slide-level context. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer for pretraining gigapixel pathology slides using dilated self-attention. Prov-GigaPath attains state-of-the-art performance on 25 out of 26 benchmark tasks.
DOI
10.1038/s41586-024-07441-w
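
The tile counts behind Prov-GigaPath can be checked with simple arithmetic. The sketch below counts the 256x256 tiles needed to cover a slide and derives the average tiles per slide from the pretraining figures quoted in this entry; the slide dimensions are hypothetical, and real pipelines usually discard background tiles.

```python
import math


def tile_count(width_px: int, height_px: int, tile_px: int = 256) -> int:
    """Non-overlapping tile_px x tile_px tiles needed to cover a slide."""
    return math.ceil(width_px / tile_px) * math.ceil(height_px / tile_px)


# Hypothetical gigapixel slide of 60,000 x 40,000 pixels.
print(tile_count(60_000, 40_000))  # 36895

# Average tiles per slide implied by 1.3 billion tiles over 171,189 slides.
print(round(1_300_000_000 / 171_189))  # 7594
```

Sequences of this length are the reason the entry highlights dilated self-attention for slide-level context.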

TITAN

AI Imaging Pathology Foundation Model Vision-Language Whole-Slide Mahmood Lab
PUBMED_LINK
41193692
FULL NAME
TITAN — Transformer-based pathology Image and Text Alignment Network
DESCRIPTION
TITAN (Transformer-based pathology Image and Text Alignment Network) is a multimodal whole-slide foundation model from Mahmood Lab (Harvard/BWH). Pretrained on 335,645 WSIs via visual self-supervised learning and vision-language alignment with 423K synthetic captions from PathChat + 183K pathology reports. Without any fine-tuning, TITAN produces general-purpose slide representations for zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation. Outperforms both ROI and slide foundation models across diverse clinical tasks.
URL
https://github.com/mahmoodlab/TITAN
TITLE
A multimodal whole-slide foundation model for pathology.
Main citation
Ding T, Wagner SJ, Song AH, Chen RJ, Lu MY, Zhang A, Vaidya AJ, Jaume G, Shaban M, Kim A, Williamson DFK, Oldenburg L, Chen B, Alajaji A, Noor G, Sang Y, Peng T, Le LP, Mahmood F. (2025) A multimodal whole-slide foundation model for pathology. Nature Medicine, 31:3749-3761. doi:10.1038/s41591-025-03982-3. PMID 41193692
ABSTRACT
The field of computational pathology has been transformed with recent advances in foundation models that encode histopathology region-of-interests into versatile feature representations. However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data. We propose TITAN, a multimodal whole-slide foundation model pretrained using 335,645 whole-slide images via visual self-supervised learning and vision-language alignment with pathology reports and 423,122 synthetic captions. Without any fine-tuning, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis.
DOI
10.1038/s41591-025-03982-3

Virchow

AI Imaging Pathology Foundation Model Paige Microsoft Rare Cancer Self-Supervised
PUBMED_LINK
39080966
FULL NAME
Virchow — Million-Scale Digital Pathology Foundation Model (Paige/Microsoft)
DESCRIPTION
Virchow is the first million-slide foundation model for computational pathology, developed by Paige in collaboration with Microsoft. A 632M-parameter ViT-H model trained using DINOv2 on 1.5 million H&E-stained WSIs from MSKCC (17 tissue types). Demonstrates clinical-grade pan-cancer detection with 0.95 AUC across nine common and seven rare cancers. With less training data, the pan-cancer detector built on Virchow achieves similar performance to tissue-specific clinical-grade models in production, outperforming them on rare cancer variants. Serves as the foundation for Paige's Virchow2 (3M WSIs, multimodal) and Virchow2G (1.8B parameters) models.
URL
https://huggingface.co/paige-ai/Virchow
TITLE
A foundation model for clinical-grade computational pathology and rare cancers detection.
Main citation
Vorontsov E, Bozkurt A, Casson A, Shaikovski G, Zelechowski M, Severson K, Zimmermann E, Hall J, Tenenholtz N, Fusi N, Yang E, Mathieu P, van Eck A, Lee D, Viret J, Robert E, Wang YK, Kunz JD, Lee MCH, Bernhard JH, Godrich RA, Oakley G, Millar E, Hanna M, Wen H, Retamero JA, Moye WA, Yousfi R, Kanan C, Klimstra DS, Rothrock B, Liu S, Fuchs TJ. (2024) A foundation model for clinical-grade computational pathology and rare cancers detection. Nature Medicine, 30(10):2924-2935. doi:10.1038/s41591-024-03141-0. PMID 39080966
ABSTRACT
The analysis of histopathology images with artificial intelligence aims to enable clinical decision support systems and precision medicine. We present Virchow, the largest foundation model for computational pathology to date. In addition to the evaluation of biomarker prediction and cell identification, we demonstrate that a large foundation model enables pan-cancer detection, achieving 0.95 specimen-level AUC across nine common and seven rare cancers. With less training data, the pan-cancer detector built on Virchow achieved similar performance to tissue-specific clinical-grade models in production and outperformed them on some rare variants of cancer.
DOI
10.1038/s41591-024-03141-0
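
The 0.95 specimen-level AUC reported for Virchow can be read as the probability that a randomly chosen cancer specimen receives a higher score than a randomly chosen benign one. A minimal sketch of that pairwise definition, using made-up labels and scores:

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy specimen-level scores (1 = cancer, 0 = benign).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly
```

The quadratic loop is fine for illustration; rank-based implementations are preferable for large cohorts.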

mSTAR

AI Imaging Pathology Foundation Model Multimodal Gene Expression Whole-Slide HKUST
PUBMED_LINK
41387679
FULL NAME
mSTAR — Multimodal Self-TAught Pretraining (WSI + Reports + Gene Expression)
DESCRIPTION
mSTAR (Multimodal Self-TAught PRetraining) is a pathology foundation model from HKUST/SJTU that integrates three modalities: pathology slides (WSIs), expert pathology reports, and gene expression (RNA-Seq) data. Curates the largest multimodal dataset of 26,169 slide-level modality pairs across 32 cancer types from 10,275 TCGA patients (>116M patch images). Uses a two-stage paradigm: (1) slide-level contrastive learning across WSI-report-gene modalities, (2) self-taught training that propagates multimodal knowledge from slide aggregator (teacher) to patch extractor (student). Evaluated on 97 tasks across 15 application types, outperforming UNI, CONCH, CHIEF, and GigaPath. Key finding: multimodal integration yields greater improvements than simply expanding vision-only datasets (53x data efficiency vs Virchow). Published in Nat Commun, Dec 2025.
URL
https://github.com/Innse/mSTAR
TITLE
A multimodal knowledge-enhanced whole-slide pathology foundation model.
Main citation
Xu Y, Wang Y, Zhou F, Ma J, Yang S, Lin H, Wang X, Wang J, Liang L, Han A, Jin C, Cheng KT, Chen H. (2025) A multimodal knowledge-enhanced whole-slide pathology foundation model. Nature Communications, 16:11406. doi:10.1038/s41467-025-66220-x. PMID 41387679
ABSTRACT
Computational pathology has advanced through foundation models, yet faces challenges in multimodal integration and capturing whole-slide context. We present mSTAR, the pathology foundation model that incorporates three modalities: pathology slides, expert-created reports, and gene expression data, within a unified framework. Our dataset includes 26,169 slide-level modality pairs across 32 cancer types, comprising over 116 million patch images. This approach injects multimodal whole-slide context into patch representations, expanding modeling from single to multiple modalities and from patch-level to slide-level analysis. Across 97 tasks, mSTAR outperforms previous SOTA models, particularly in molecular prediction, revealing that multimodal integration yields greater improvements than simply expanding vision-only datasets.
DOI
10.1038/s41467-025-66220-x
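
Slide-level contrastive learning, as in the first mSTAR stage, pulls embeddings of the same case from different modalities together and pushes mismatched pairs apart. The entry does not state the exact loss, so the sketch below assumes a CLIP-style symmetric InfoNCE objective between two modalities; the function names, temperature, and toy embeddings are illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """One-directional InfoNCE: anchors[i] should match positives[i]."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in positives]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(anchors)


def symmetric_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Average of the a-to-b and b-to-a InfoNCE losses."""
    return 0.5 * (info_nce(emb_a, emb_b, temperature) + info_nce(emb_b, emb_a, temperature))


# Toy slide and report embeddings for two cases (hypothetical values).
slide_emb = [[1.0, 0.0], [0.0, 1.0]]
report_aligned = [[0.9, 0.1], [0.1, 0.9]]
report_shuffled = [[0.1, 0.9], [0.9, 0.1]]

aligned = symmetric_contrastive_loss(slide_emb, report_aligned)
shuffled = symmetric_contrastive_loss(slide_emb, report_shuffled)
print(aligned < shuffled)  # True
```

With three modalities, one plausible extension is to sum this loss over modality pairs, although the entry does not say how the pairs are combined.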