AI Imaging

A curated list of imaging resources, listed under the AI tab.

Pathology & Medical Imaging Foundation Models

Foundation models for computational pathology have grown rapidly since 2024. They learn from large collections of whole-slide images (WSIs), up to millions of slides, using three broad approaches:

Self-supervised pretraining on slide patches without manual labels (UNI, Chen et al. Nat Med 2024; Virchow, Vorontsov et al. Nat Med 2024; Prov-GigaPath, Xu et al. PMID 38778098, Nature 2024).
Vision-language alignment connecting histology images with text descriptions for zero-shot tasks (CONCH, Lu et al. Nat Med 2024; TITAN, Ding et al. Nat Med 2025; mSTAR, Xu et al. Nat Commun 2025).
Knowledge-enhanced architectures incorporating biomedical ontologies and multimodal clinical data (KEEP, Zhou et al. PMID 41720085, Cancer Cell 2026; PathOrchestra, Yan et al. npj Digit Med 2025).

Trend: from single-modal patch-level models → multimodal whole-slide understanding with clinical context.

Summary Table

Name	Category	Main citation	Year
ENLIGHT-DeepPT	Cross-modal Prediction	Hoang DT et al., Nat Cancer, 2024	2024
CONCH	General Feature Extraction	Lu MY et al., Nat Med, 2024	2024
KEEP	General Feature Extraction	Zhou X et al., Cancer Cell, 2026	2026
UNI	General Feature Extraction	Chen RJ et al., Nat Med, 2024	2024
CHIEF	WSI Model	Wang X et al., Nature, 2024	2024
PathOrchestra	WSI Model	Yan F et al., npj Digit Med, 2025	2025
Prov-GigaPath	WSI Model	Xu H et al., Nature, 2024	2024
TITAN	WSI Model	Ding T et al., Nat Med, 2025	2025
Virchow	WSI Model	Vorontsov E et al., Nat Med, 2024	2024
mSTAR	WSI Model	Xu Y et al., Nat Commun, 2025	2025

Cross-modal Prediction

ENLIGHT-DeepPT

AI Imaging Histopathology Transcriptomics Treatment Response Precision Oncology

PUBMED_LINK

38961276

FULL NAME

ENLIGHT-DeepPT — Deep-Learning Framework for Cancer Treatment Response from Histopathology Images

DESCRIPTION

ENLIGHT-DeepPT (Deep Phenotyping of Tumors) is a deep-learning framework (ResNet50 + MLP) that predicts genome-wide tumor mRNA expression from routine H&E histopathology images across 16 TCGA cancer types. The imputed transcriptomics then drive treatment response prediction, achieving an odds ratio of 2.28 across 5 independent treatment cohorts. The framework directly links medical imaging (histopathology) with genomics/transcriptomics via AI, enabling precision oncology from standard pathology slides.
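
For readers unfamiliar with the metric, an odds ratio compares the odds of response among patients predicted to respond with the odds among patients predicted not to respond. The sketch below shows the arithmetic only. The function name `odds_ratio` and all counts are hypothetical and are not taken from the paper.

```python
def odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Odds ratio of a 2x2 prediction-vs-outcome table.

    tp: predicted responders who responded
    fp: predicted responders who did not respond
    fn: predicted non-responders who responded
    tn: predicted non-responders who did not respond
    """
    if fp == 0 or fn == 0:
        raise ValueError("odds ratio is undefined when fp or fn is zero")
    # (tp / fp) is the odds of response in the predicted-responder group,
    # (fn / tn) is the odds of response in the predicted-non-responder group.
    return (tp * tn) / (fp * fn)


# Hypothetical counts, chosen only to illustrate the calculation.
example = odds_ratio(tp=30, fp=20, fn=20, tn=30)
print(f"odds ratio = {example:.2f}")  # odds ratio = 2.25
```

An odds ratio above 1 means predicted responders respond more often than predicted non-responders; a value of 1 means the prediction carries no information.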

TITLE

A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics.

Main citation

Hoang DT, Shulman ED, Shuaib M, Nguyen JD, Maqbool HH, Nguyen Q, Iyer P, Liu S, Ruppin E, Stone EA. (2024) A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics. Nature Cancer, 5(9):1305-1317. doi:10.1038/s43018-024-00793-2. PMID 38961276

ABSTRACT

Predicting cancer treatment response from routinely collected clinical material is a central challenge in precision oncology. Here we present ENLIGHT-DeepPT, a deep-learning framework that predicts genome-wide tumor mRNA expression from routine H&E histopathology images. Using a two-stage approach (image-to-transcriptomics via ResNet50 + MLP, then transcriptomics-to-treatment response), ENLIGHT-DeepPT achieves an odds ratio of 2.28 across 5 independent treatment cohorts spanning multiple cancer types and drug classes.

DOI

10.1038/s43018-024-00793-2

General Feature Extraction

CONCH

AI Imaging Pathology Foundation Model Vision-Language Histopathology Mahmood Lab Zero-Shot

PUBMED_LINK

38504017

FULL NAME

CONCH — Contrastive learning from Captions for Histopathology (Vision-Language Foundation Model)

DESCRIPTION

CONCH (CONtrastive learning from Captions for Histopathology) is a vision-language foundation model from Mahmood Lab (Harvard/BWH). Pretrained on 1.17M histopathology image-text pairs from diverse sources (PubMed, educational resources, textbooks). Evaluated across 14 clinically relevant tasks including zero-shot cancer classification, text-to-image retrieval, image-to-text retrieval, caption generation, and tissue segmentation. Outperforms standard models including CLIP and PLIP. CONCH also works on non-H&E stains (IHC, special stains), demonstrating broad applicability. Available as an open-source model for academic use.
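
Zero-shot classification with a vision-language model generally works by embedding the image and one text prompt per class in a shared space, then picking the class whose prompt is most similar to the image. The sketch below illustrates that idea with toy vectors. It does not call the CONCH API; the function names, the 3-dimensional embeddings, and the class prompts are all hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, prompt_embeddings):
    """Return the best-matching label and the per-label similarity scores."""
    scores = {
        label: cosine(image_embedding, emb)
        for label, emb in prompt_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings (hypothetical); a real model would produce these
# from its image encoder and text encoder.
prompts = {
    "invasive ductal carcinoma": [0.9, 0.1, 0.0],
    "invasive lobular carcinoma": [0.1, 0.9, 0.0],
    "normal breast tissue": [0.0, 0.1, 0.9],
}
image = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(image, prompts)
print(label)
```

No task-specific training is involved: adding a class only requires adding a prompt.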

URL

https://github.com/mahmoodlab/CONCH

TITLE

A visual-language foundation model for computational pathology.

Main citation

Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, Parwani AV, Zhang A, Mahmood F. (2024) A visual-language foundation model for computational pathology. Nature Medicine, 30(3):863-874. doi:10.1038/s41591-024-02856-4. PMID 38504017

ABSTRACT

We introduce CONCH, a visual-language foundation model developed using diverse sources of histopathology images and text. Trained on 1.17 million pathology image-text pairs, CONCH achieves state-of-the-art performance across 14 clinically relevant tasks, including zero-shot cancer classification, text-to-image and image-to-text retrieval, caption generation, and tissue segmentation. CONCH outperforms standard models like CLIP and PLIP, and generalizes to non-H&E stains including immunohistochemistry and special stains, demonstrating its versatility as a foundation model for computational pathology.

DOI

10.1038/s41591-024-02856-4

KEEP

AI Imaging Pathology Foundation Model Vision-Language Knowledge Graph Rare Cancer Cancer Cell

PUBMED_LINK

41720085

FULL NAME

KEEP — Knowledge-Enhanced Pathology Vision-Language Foundation Model

DESCRIPTION

KEEP (KnowledgE-Enhanced Pathology) is a vision-language foundation model from Shanghai AI Lab / SJTU that systematically integrates disease knowledge into pretraining for cancer diagnosis. Uses a comprehensive disease knowledge graph with 11,454 diseases and 139,143 attributes from DO and UMLS to reorganize millions of pathology image-text pairs into 143,000 semantically structured groups aligned with disease ontology hierarchies. Across 18 public benchmarks (14,000+ WSIs) and 4 institutional rare cancer datasets (926 cases), KEEP consistently outperforms existing foundation models (CHIEF, CONCH, UNI), with substantial gains for rare subtypes (+8.5 pts balanced accuracy vs CONCH on 30 rare brain cancers). Published in Cancer Cell, Feb 2026.
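
Balanced accuracy, the metric quoted for the rare brain cancer comparison, is the mean of per-class recall, so rare classes count as much as common ones. The sketch below uses hypothetical labels to show how it differs from plain accuracy; the function name and data are illustrative only and are not from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical example: 8 common-subtype cases, 2 rare-subtype cases.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]  # one rare case is missed

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy          = {plain:.2f}")                              # 0.90
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.2f}")  # 0.75
```

Missing one of two rare cases barely moves plain accuracy but halves the rare-class recall, which is why balanced accuracy is the more informative number for rare subtypes.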

URL

https://github.com/MAGIC-AI4Med/KEEP

TITLE

Knowledge-enhanced pretraining for vision-language pathology foundation model on cancer diagnosis.

Main citation

Zhou X, Sun L, He D, Guan W, Wang G, Wang R, Wang L, Yuan X, Sun X, Zhang Y, Sun K, Wang Y, Xie W. (2026) Knowledge-enhanced pretraining for vision-language pathology foundation model on cancer diagnosis. Cancer Cell, 44(4):777-791. doi:10.1016/j.ccell.2026.01.019. PMID 41720085

ABSTRACT

Vision-language foundation models have shown great promise in computational pathology but remain primarily data-driven, lacking explicit integration of medical knowledge. We introduce KEEP, a foundation model that systematically incorporates disease knowledge into pretraining for cancer diagnosis. KEEP leverages a comprehensive disease knowledge graph encompassing 11,454 diseases and 139,143 attributes to reorganize millions of pathology image-text pairs into 143,000 semantically structured groups aligned with disease ontology hierarchies. Across 18 public benchmarks (over 14,000 WSIs) and 4 institutional rare cancer datasets (926 cases), KEEP consistently outperformed existing foundation models, showing substantial gains for rare subtypes.

DOI

10.1016/j.ccell.2026.01.019

UNI

AI Imaging Pathology Foundation Model Self-Supervised Computational Pathology

PUBMED_LINK

38504018

FULL NAME

UNI — General-Purpose Foundation Model for Computational Pathology

DESCRIPTION

UNI is a general-purpose self-supervised foundation model for computational pathology from Mahmood Lab (Harvard/BWH), pretrained on >100 million images from >100,000 H&E-stained WSIs (>77 TB) across 20 tissue types. Evaluated on 34 representative CPath tasks — outperforming prior models across cancer classification, organ transplant assessment, and rare disease analysis. Demonstrates resolution-agnostic classification, few-shot slide classification, and generalization to 108 cancer types in the OncoTree system. 1,300+ citations.

URL

https://github.com/mahmoodlab/UNI

TITLE

Towards a general-purpose foundation model for computational pathology.

Main citation

Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Chen B, Zhang A, Shao D, Song AH, Shaban M, Williams M, Oldenburg L, Weishaupt LL, Wang JJ, Vaidya A, Le LP, Gerber G, Sahai S, Williams W, Mahmood F. (2024) Towards a general-purpose foundation model for computational pathology. Nature Medicine, 30(3):850-862. doi:10.1038/s41591-024-02857-3. PMID 38504018

ABSTRACT

Quantitative evaluation of tissue images is crucial for computational pathology tasks. The high resolution of WSIs and the variability of morphological features present significant challenges. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs across 20 major tissue types. The model was evaluated on 34 representative CPath tasks. UNI outperforms previous state-of-the-art models and demonstrates new capabilities including resolution-agnostic tissue classification, few-shot slide classification, and disease subtyping generalization to 108 cancer types.

DOI

10.1038/s41591-024-02857-3

WSI Model

CHIEF

AI Imaging Pathology Foundation Model Weakly Supervised Cancer Diagnosis Histopathology

PUBMED_LINK

39232164

FULL NAME

CHIEF — Clinical Histopathology Imaging Evaluation Foundation Model

DESCRIPTION

CHIEF (Clinical Histopathology Imaging Evaluation Foundation) is a general-purpose weakly supervised machine learning framework from Harvard Medical School. Trained on 60,530 WSIs spanning 19 anatomical sites (44 TB of data), CHIEF leverages two complementary pretraining methods: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. Validated on 19,491 WSIs from 32 independent slide sets across 24 hospitals internationally. Outperforms SOTA deep learning methods by up to 36.1%, demonstrating strong generalization across diverse populations and slide preparation methods.

URL

https://github.com/hms-dbmi/CHIEF

TITLE

A pathology foundation model for cancer diagnosis and prognosis prediction.

Main citation

Wang X, Zhao J, Marostica E, Yuan W, Jin J, et al. (2024) A pathology foundation model for cancer diagnosis and prognosis prediction. Nature, 634(8035):970-978. doi:10.1038/s41586-024-07894-z. PMID 39232164

ABSTRACT

Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard AI methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task, often with limited generalizability. To address this challenge, we devised CHIEF, a general-purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. Developed using 60,530 whole-slide images spanning 19 anatomical sites, CHIEF outperformed SOTA deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations.

DOI

10.1038/s41586-024-07894-z

PathOrchestra

AI Imaging Pathology Foundation Model Self-Supervised Clinical-Grade Structured Report

PUBMED_LINK

41258399

FULL NAME

PathOrchestra — Comprehensive Pathology Foundation Model with 100+ Clinical-Grade Tasks

DESCRIPTION

PathOrchestra is a versatile pathology foundation model from Shanghai AI Lab and multiple Chinese institutions, trained via self-supervised learning on 287,424 H&E-stained WSIs from 21 tissue types across 3 independent clinical centers. Evaluated on the largest known clinical task benchmark (112 tasks: 61 private + 51 public) spanning digital slide preprocessing, pan-cancer classification (17 cancer types), lesion identification, multi-cancer subtype classification (36 tasks), biomarker assessment (36 tasks), gene expression prediction, and structured report generation. Achieves over 0.950 accuracy in 47 tasks. First model to generate structured pathology reports for colorectal cancer and lymphoma. Apache 2.0 open-source license.

URL

https://github.com/yanfang-research/PathOrchestra

TITLE

PathOrchestra: a comprehensive foundation model for computational pathology with over 100 diverse clinical-grade tasks.

Main citation

Yan F, et al. (2025) PathOrchestra: a comprehensive foundation model for computational pathology with over 100 diverse clinical-grade tasks. npj Digital Medicine, 8(1):695. doi:10.1038/s41746-025-02027-w. PMID 41258399

ABSTRACT

The complexity and variability of high-resolution pathological images present significant challenges in computational pathology. We present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on 287,424 slides from 21 tissue types across three centers. Evaluated on 112 tasks from 61 private and 51 public datasets, covering digital slide preprocessing, pan-cancer classification, lesion identification, multi-cancer subtype classification, biomarker assessment, gene expression prediction, and structured report generation. Across 27,755 WSIs and 9,415,729 ROI images, it achieved over 0.950 accuracy in 47 tasks. It is the first to generate structured reports for colorectal cancer and lymphoma.

DOI

10.1038/s41746-025-02027-w

Prov-GigaPath

AI Imaging Pathology Foundation Model Whole-Slide Microsoft Real-World Data

PUBMED_LINK

38778098

FULL NAME

Prov-GigaPath — Whole-Slide Foundation Model for Digital Pathology

DESCRIPTION

Prov-GigaPath by Microsoft Research, Providence, and UW is a whole-slide pathology foundation model pretrained on 1.3 billion 256x256 image tiles from 171,189 whole slides across 28 cancer centers (>30,000 patients, 31 tissue types). Uses a novel GigaPath vision transformer with dilated self-attention (LongNet) for gigapixel-level context. Achieves SOTA on 25/26 benchmark tasks including cancer subtyping, mutation prediction, and TMB classification. The first large-scale whole-slide foundation model trained on real-world clinical data.
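
The scale figures above can be put in perspective with some simple arithmetic. The sketch below computes the average number of tiles per slide implied by the reported totals, and the size of a full 256x256 tile grid for one slide. The slide dimensions are hypothetical, the helper name `tiles_in_grid` is illustrative, and the grid count includes background tiles, which a real pipeline would presumably discard.

```python
TILE_SIZE = 256  # tile edge length in pixels, as reported above


def tiles_in_grid(width_px: int, height_px: int, tile: int = TILE_SIZE) -> int:
    """Number of tiles needed to cover a slide, rounding partial tiles up."""
    cols = -(-width_px // tile)   # integer ceiling division
    rows = -(-height_px // tile)
    return cols * rows


# Reported totals ("1.3 billion" is a rounded figure, so the average is approximate).
total_tiles = 1_300_000_000
total_slides = 171_189
avg_tiles_per_slide = total_tiles / total_slides
print(f"average tiles per slide ~ {avg_tiles_per_slide:,.0f}")  # ~ 7,594

# Hypothetical slide of 100,000 x 80,000 pixels.
print(tiles_in_grid(100_000, 80_000))  # 122383
```

Sequences of this length are the reason long-context attention is needed at the slide level.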

URL

https://github.com/prov-gigapath/prov-gigapath

TITLE

A whole-slide foundation model for digital pathology from real-world data.

Main citation

Xu H, Usuyama N, Bagal V, Bredell M, Chamby A, Chen Z, Ding J, Fuhlbrück T, Géro Z, Gonzalez J, Gu Y, Xu Y, Wei MH, Wang W, Ma S, Wei F, Yang J, Li C, Gao J, Rosemon J, Bower T, Lee S, Weerasinghe R, Wright B, Robicsek A, Piening B, Bifulco C, Wang S, Poon H. (2024) A whole-slide foundation model for digital pathology from real-world data. Nature, 630(8015):181-188. doi:10.1038/s41586-024-07441-w. PMID 38778098

ABSTRACT

Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing important slide-level context. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer for pretraining gigapixel pathology slides using dilated self-attention. Prov-GigaPath attains state-of-the-art performance on 25 out of 26 benchmark tasks.

DOI

10.1038/s41586-024-07441-w

TITAN

AI Imaging Pathology Foundation Model Vision-Language Whole-Slide Mahmood Lab

PUBMED_LINK

41193692

FULL NAME

TITAN — Transformer-based pathology Image and Text Alignment Network

DESCRIPTION

TITAN (Transformer-based pathology Image and Text Alignment Network) is a multimodal whole-slide foundation model from Mahmood Lab (Harvard/BWH). Pretrained on 335,645 WSIs via visual self-supervised learning and vision-language alignment with 423K synthetic captions from PathChat + 183K pathology reports. Without any fine-tuning, TITAN produces general-purpose slide representations for zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation. Outperforms both ROI and slide foundation models across diverse clinical tasks.

URL

https://github.com/mahmoodlab/TITAN

TITLE

A multimodal whole-slide foundation model for pathology.

Main citation

Ding T, Wagner SJ, Song AH, Chen RJ, Lu MY, Zhang A, Vaidya AJ, Jaume G, Shaban M, Kim A, Williamson DFK, Oldenburg L, Chen B, Alajaji A, Noor G, Sang Y, Peng T, Le LP, Mahmood F. (2025) A multimodal whole-slide foundation model for pathology. Nature Medicine, 31:3749-3761. doi:10.1038/s41591-025-03982-3. PMID 41193692

ABSTRACT

The field of computational pathology has been transformed with recent advances in foundation models that encode histopathology region-of-interests into versatile feature representations. However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data. We propose TITAN, a multimodal whole-slide foundation model pretrained using 335,645 whole-slide images via visual self-supervised learning and vision-language alignment with pathology reports and 423,122 synthetic captions. Without any fine-tuning, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis.

DOI

10.1038/s41591-025-03982-3

Virchow

AI Imaging Pathology Foundation Model Paige Microsoft Rare Cancer Self-Supervised

PUBMED_LINK

39080966

FULL NAME

Virchow — Million-Scale Digital Pathology Foundation Model (Paige/Microsoft)

DESCRIPTION

Virchow is the first million-slide foundation model for computational pathology, developed by Paige in collaboration with Microsoft. A 632M-parameter ViT-H model trained using DINOv2 on 1.5 million H&E-stained WSIs from MSKCC (17 tissue types). Demonstrates clinical-grade pan-cancer detection with 0.95 AUC across nine common and seven rare cancers. With less training data, the pan-cancer detector built on Virchow achieves similar performance to tissue-specific clinical-grade models in production, outperforming them on rare cancer variants. Serves as the foundation for Paige's Virchow2 (3M WSIs, multimodal) and Virchow2G (1.8B parameters) models.
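
The AUC quoted above can be read as the probability that a randomly chosen cancer specimen receives a higher score than a randomly chosen benign one. The sketch below computes it directly from that definition by comparing every positive-negative pair. The function name, labels, and scores are hypothetical and unrelated to the Virchow evaluation data.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical specimen-level scores: 1 = cancer, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(f"AUC = {roc_auc(labels, scores):.3f}")  # AUC = 0.889
```

The pairwise form is quadratic in the number of specimens; it is shown here for clarity rather than efficiency.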

URL

https://huggingface.co/paige-ai/Virchow

TITLE

A foundation model for clinical-grade computational pathology and rare cancers detection.

Main citation

Vorontsov E, Bozkurt A, Casson A, Shaikovski G, Zelechowski M, Severson K, Zimmermann E, Hall J, Tenenholtz N, Fusi N, Yang E, Mathieu P, van Eck A, Lee D, Viret J, Robert E, Wang YK, Kunz JD, Lee MCH, Bernhard JH, Godrich RA, Oakley G, Millar E, Hanna M, Wen H, Retamero JA, Moye WA, Yousfi R, Kanan C, Klimstra DS, Rothrock B, Liu S, Fuchs TJ. (2024) A foundation model for clinical-grade computational pathology and rare cancers detection. Nature Medicine, 30(10):2924-2935. doi:10.1038/s41591-024-03141-0. PMID 39080966

ABSTRACT

The analysis of histopathology images with artificial intelligence aims to enable clinical decision support systems and precision medicine. We present Virchow, the largest foundation model for computational pathology to date. In addition to the evaluation of biomarker prediction and cell identification, we demonstrate that a large foundation model enables pan-cancer detection, achieving 0.95 specimen-level AUC across nine common and seven rare cancers. With less training data, the pan-cancer detector built on Virchow achieved similar performance to tissue-specific clinical-grade models in production and outperformed them on some rare variants of cancer.

DOI

10.1038/s41591-024-03141-0

mSTAR

AI Imaging Pathology Foundation Model Multimodal Gene Expression Whole-Slide HKUST

PUBMED_LINK

41387679

FULL NAME

mSTAR — Multimodal Self-TAught Pretraining (WSI + Reports + Gene Expression)

DESCRIPTION

mSTAR (Multimodal Self-TAught PRetraining) is a pathology foundation model from HKUST/SJTU that integrates three modalities: pathology slides (WSIs), expert pathology reports, and gene expression (RNA-Seq) data. Curates the largest multimodal dataset of 26,169 slide-level modality pairs across 32 cancer types from 10,275 TCGA patients (>116M patch images). Uses a two-stage paradigm: (1) slide-level contrastive learning across WSI-report-gene modalities, (2) self-taught training that propagates multimodal knowledge from slide aggregator (teacher) to patch extractor (student). Evaluated on 97 tasks across 15 application types, outperforming UNI, CONCH, CHIEF, and GigaPath. Key finding: multimodal integration yields greater improvements than simply expanding vision-only datasets (53x data efficiency vs Virchow). Published in Nat Commun, Dec 2025.
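
Slide-level contrastive learning pulls together embeddings of the same case from different modalities and pushes apart embeddings of different cases. The sketch below shows one common formulation, a symmetric InfoNCE loss between two modalities. This is an assumption about the general form of such a loss, not the loss confirmed for mSTAR; the function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[target]


def symmetric_info_nce(a_embs, b_embs, temperature=0.07):
    """Average of the a->b and b->a contrastive losses over paired embeddings.

    Row i of a_embs and row i of b_embs are assumed to describe the same case.
    """
    a = [_normalize(v) for v in a_embs]
    b = [_normalize(v) for v in b_embs]
    n = len(a)
    sim = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    loss_ab = sum(_cross_entropy(sim[i], i) for i in range(n)) / n
    loss_ba = sum(
        _cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_ab + loss_ba)


# Toy slide and report embeddings for two cases (hypothetical).
slides = [[1.0, 0.0], [0.0, 1.0]]
reports_aligned = [[1.0, 0.0], [0.0, 1.0]]
reports_swapped = [[0.0, 1.0], [1.0, 0.0]]

print(symmetric_info_nce(slides, reports_aligned))  # close to zero
print(symmetric_info_nce(slides, reports_swapped))  # much larger
```

With three modalities, a loss of this kind could be applied to each pair of modalities and summed, though the source does not specify how the pairs are combined.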

URL

https://github.com/Innse/mSTAR

TITLE

A multimodal knowledge-enhanced whole-slide pathology foundation model.

Main citation

Xu Y, Wang Y, Zhou F, Ma J, Yang S, Lin H, Wang X, Wang J, Liang L, Han A, Jin C, Cheng KT, Chen H. (2025) A multimodal knowledge-enhanced whole-slide pathology foundation model. Nature Communications, 16:11406. doi:10.1038/s41467-025-66220-x. PMID 41387679

ABSTRACT

Computational pathology has advanced through foundation models, yet faces challenges in multimodal integration and capturing whole-slide context. We present mSTAR, the pathology foundation model that incorporates three modalities: pathology slides, expert-created reports, and gene expression data, within a unified framework. Our dataset includes 26,169 slide-level modality pairs across 32 cancer types, comprising over 116 million patch images. This approach injects multimodal whole-slide context into patch representations, expanding modeling from single to multiple modalities and from patch-level to slide-level analysis. Across 97 tasks, mSTAR outperforms previous SOTA models, particularly in molecular prediction, revealing that multimodal integration yields greater improvements than simply expanding vision-only datasets.

DOI

10.1038/s41467-025-66220-x
