Foundation Model
Catalog entries using this tag:
- CHIEF — AI
- CONCH — AI
- CONCH — AI
- DNA Foundation Benchmark — AI
- KEEP — AI
- mSTAR — AI
- PathOrchestra — AI
- Prov-GigaPath — AI
- scGPT — AI
- TITAN — AI
- UNI — AI
- Virchow — AI
Entries
CHIEF
FULL NAME
CHIEF — Clinical Histopathology Imaging Evaluation Foundation Model
DESCRIPTION
CHIEF (Clinical Histopathology Imaging Evaluation Foundation) is a general-purpose weakly supervised machine learning framework from Harvard Medical School. Trained on 60,530 WSIs spanning 19 anatomical sites (44 TB of data), CHIEF combines two complementary pretraining methods: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. It was validated on 19,491 WSIs from 32 independent slide sets collected at 24 hospitals internationally, where it outperformed state-of-the-art deep learning methods by up to 36.1%, indicating strong generalization across diverse populations and slide preparation methods.
TITLE
A pathology foundation model for cancer diagnosis and prognosis prediction.
Main citation
Wang X, Zhao J, Marostica E, Yuan W, Jin J, et al. (2024) A pathology foundation model for cancer diagnosis and prognosis prediction. Nature, 634(8035):970-978. doi:10.1038/s41586-024-07894-z. PMID 39232164
ABSTRACT
Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard AI methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task, often with limited generalizability. To address this challenge, we devised CHIEF, a general-purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition. Developed using 60,530 whole-slide images spanning 19 anatomical sites, CHIEF outperformed SOTA deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations.
DOI
10.1038/s41586-024-07894-z
CONCH
FULL NAME
CONCH — Contrastive learning from Captions for Histopathology (Vision-Language Foundation Model)
DESCRIPTION
CONCH (CONtrastive learning from Captions for Histopathology) is a vision-language foundation model from the Mahmood Lab (Harvard/BWH), pretrained on 1.17M histopathology image-text pairs from diverse sources (PubMed, educational resources, textbooks). It was evaluated on 14 clinically relevant tasks, including zero-shot cancer classification, text-to-image retrieval, image-to-text retrieval, caption generation, and tissue segmentation, and it outperforms baseline models including CLIP and PLIP. CONCH also works on non-H&E stains (IHC, special stains), demonstrating broad applicability. It is available as an open-source model for academic use.
TITLE
A visual-language foundation model for computational pathology.
Main citation
Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, Parwani AV, Zhang A, Mahmood F. (2024) A visual-language foundation model for computational pathology. Nature Medicine, 30(3):863-874. doi:10.1038/s41591-024-02856-4. PMID 38504017
ABSTRACT
We introduce CONCH, a visual-language foundation model developed using diverse sources of histopathology images and text. Trained on 1.17 million pathology image-text pairs, CONCH achieves state-of-the-art performance across 14 clinically relevant tasks, including zero-shot cancer classification, text-to-image and image-to-text retrieval, caption generation, and tissue segmentation. CONCH outperforms standard models like CLIP and PLIP, and generalizes to non-H&E stains including immunohistochemistry and special stains, demonstrating its versatility as a foundation model for computational pathology.
DOI
10.1038/s41591-024-02856-4
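Zero-shot classification with a vision-language model of this kind is commonly done by comparing one image embedding against text embeddings of class prompts. The block below is a minimal sketch of that general idea using toy vectors. It is not CONCH's API: the function names, the temperature value, and the embeddings are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_emb, prompt_embs, temperature=0.07):
    """Pick the class whose prompt embedding is most similar to the image embedding.

    image_emb and prompt_embs are assumed to come from the image and text
    encoders of a vision-language model; here they are plain lists of floats.
    The temperature of 0.07 is an illustrative choice, not a CONCH setting.
    """
    img = l2_normalize(image_emb)
    logits = {}
    for label, emb in prompt_embs.items():
        txt = l2_normalize(emb)
        logits[label] = sum(a * b for a, b in zip(img, txt)) / temperature

    # Numerically stable softmax over the class logits.
    max_logit = max(logits.values())
    exps = {k: math.exp(v - max_logit) for k, v in logits.items()}
    total = sum(exps.values())
    probs = {k: v / total for k, v in exps.items()}
    return max(probs, key=probs.get), probs


# Toy 3-dimensional embeddings standing in for real encoder outputs.
image_emb = [0.9, 0.1, 0.2]
prompt_embs = {
    "invasive ductal carcinoma": [1.0, 0.0, 0.1],
    "normal breast tissue": [0.0, 1.0, 0.3],
}
label, probs = zero_shot_classify(image_emb, prompt_embs)
print(label, probs)
```

No labeled training images are needed in this setup: adding a class only requires writing a new text prompt.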
CONCH
FULL NAME
CONCH — Contrastive Learning from Captions for Histopathology
DESCRIPTION
CONCH (CONtrastive learning from Captions for Histopathology) is a vision-language foundation model pretrained on 1.17 million histopathology image-caption pairs. It achieves state-of-the-art performance across 14 diverse benchmarks including histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval. As a multimodal model bridging visual pathology data with biomedical text, CONCH enables zero-shot transfer and minimal fine-tuning for diverse computational pathology tasks.
TITLE
A visual-language foundation model for computational pathology.
Main citation
Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, Parwani AV, Zhang A, Mahmood F. (2024) A visual-language foundation model for computational pathology. Nature Medicine, 30(3):863-874. doi:10.1038/s41591-024-02856-4. PMID 38504017
ABSTRACT
The accelerated adoption of digital pathology and advances in deep learning have enabled the development of robust models for various pathology tasks. However, model training is often difficult due to label scarcity. Additionally, most models in histopathology leverage only image data. We introduce CONCH, a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and over 1.17 million image-caption pairs via task-agnostic pretraining. Evaluated on 14 diverse benchmarks, CONCH achieves state-of-the-art performance on histology image classification, segmentation, captioning, and cross-modal retrieval.
DOI
10.1038/s41591-024-02856-4
DNA Foundation Benchmark (DNA FM Benchmark)
FULL NAME
Benchmarking DNA Foundation Models for Genomic and Genetic Tasks
DESCRIPTION
The first comprehensive, unbiased benchmark of five DNA foundation models (DNABERT-2, Nucleotide Transformer V2, HyenaDNA, Caduceus-Ph, GROVER) across 57 datasets spanning sequence classification, gene expression prediction, variant effect quantification, and topologically associating domain (TAD) recognition, using zero-shot embeddings. Key finding: mean pooling of token embeddings consistently outperforms other pooling strategies. Model choice should match the task: Caduceus-Ph excels at transcription factor binding site (TFBS) prediction, Nucleotide Transformer V2 (NT-v2) at pathogenic variant identification, and HyenaDNA scales to long sequences. Specialized models (Enformer, Sei) still outperform general-purpose DNA foundation models on QTL prediction.
TITLE
Benchmarking DNA foundation models for genomic and genetic tasks.
Main citation
Feng H, Wu L, Zhao B, Huff C, Zhang J, Wu J, Lin L, Wei P, Wu C. (2025) Benchmarking DNA foundation models for genomic and genetic tasks. Nature Communications, 16:10780. doi:10.1038/s41467-025-65823-8. PMID 41315262
ABSTRACT
The rapid evolution of DNA foundation models promises to revolutionize genomics, yet comprehensive evaluations are lacking. Here, we present a comprehensive, unbiased benchmark of five models (DNABERT-2, Nucleotide Transformer V2, HyenaDNA, Caduceus-Ph, and GROVER) across diverse genomic and genetic tasks including sequence classification, gene expression prediction, variant effect quantification, and TAD region recognition, using zero-shot embeddings. Our analysis reveals that mean token embedding consistently and significantly improves sequence classification performance. Model performance varies among tasks and datasets; while general purpose DNA foundation models showed competitive performance in pathogenic variant identification, they were less effective in predicting gene expression and identifying putative causal QTLs compared to specialized models.
DOI
10.1038/s41467-025-65823-8
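The mean token embedding strategy highlighted in this entry reduces a sequence of per-token vectors to one fixed-length vector by averaging. The block below is a minimal sketch in plain Python. The function name is hypothetical, and skipping padding tokens through a mask is an assumption for illustration, since the entry does not describe how the benchmark handles padding.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1.

    token_embeddings: list of equal-length float lists, one per token.
    attention_mask:   list of 0/1 flags; 0 marks padding to be ignored
                      (an assumption, not taken from the benchmark).
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in summed]


# Two real tokens followed by one padding token that must not affect the mean.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The pooled vector can then be passed to a lightweight classifier, which matches the zero-shot embedding setting the entry describes.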
KEEP
FULL NAME
KEEP — Knowledge-Enhanced Pathology Vision-Language Foundation Model
DESCRIPTION
KEEP (KnowledgE-Enhanced Pathology) is a vision-language foundation model from Shanghai AI Lab / SJTU that systematically integrates disease knowledge into pretraining for cancer diagnosis. It uses a comprehensive disease knowledge graph with 11,454 diseases and 139,143 attributes, drawn from DO and UMLS, to reorganize millions of pathology image-text pairs into 143,000 semantically structured groups aligned with disease ontology hierarchies. Across 18 public benchmarks (14,000+ WSIs) and 4 institutional rare cancer datasets (926 cases), KEEP consistently outperforms existing foundation models (CHIEF, CONCH, UNI), with substantial gains for rare subtypes (+8.5 percentage points in balanced accuracy over CONCH on 30 rare brain cancers). Published in Cancer Cell, Feb 2026.
TITLE
Knowledge-enhanced pretraining for vision-language pathology foundation model on cancer diagnosis.
Main citation
Zhou X, Sun L, He D, Guan W, Wang G, Wang R, Wang L, Yuan X, Sun X, Zhang Y, Sun K, Wang Y, Xie W. (2026) Knowledge-enhanced pretraining for vision-language pathology foundation model on cancer diagnosis. Cancer Cell, 44(4):777-791. doi:10.1016/j.ccell.2026.01.019. PMID 41720085
ABSTRACT
Vision-language foundation models have shown great promise in computational pathology but remain primarily data-driven, lacking explicit integration of medical knowledge. We introduce KEEP, a foundation model that systematically incorporates disease knowledge into pretraining for cancer diagnosis. KEEP leverages a comprehensive disease knowledge graph encompassing 11,454 diseases and 139,143 attributes to reorganize millions of pathology image-text pairs into 143,000 semantically structured groups aligned with disease ontology hierarchies. Across 18 public benchmarks (over 14,000 WSIs) and 4 institutional rare cancer datasets (926 cases), KEEP consistently outperformed existing foundation models, showing substantial gains for rare subtypes.
DOI
10.1016/j.ccell.2026.01.019
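The rare-subtype gain in this entry is reported as balanced accuracy, which is the mean of per-class recall and so gives a rare class the same weight as a common one. The block below is a small sketch of the standard definition on made-up labels. The data and function name are illustrative and are not taken from the KEEP evaluation.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy data: 8 common-subtype cases, all correct; 2 rare-subtype cases, one missed.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal = balanced_accuracy(y_true, y_pred)
print(plain, bal)  # 0.9 0.75
```

Plain accuracy hides the missed rare case (0.9), while balanced accuracy exposes it (0.75), which is why the metric suits rare-cancer benchmarks.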
mSTAR
FULL NAME
mSTAR — Multimodal Self-TAught Pretraining (WSI + Reports + Gene Expression)
DESCRIPTION
mSTAR (Multimodal Self-TAught PRetraining) is a pathology foundation model from HKUST/SJTU that integrates three modalities: pathology slides (WSIs), expert pathology reports, and gene expression (RNA-Seq) data. Its pretraining dataset, described as the largest multimodal collection, comprises 26,169 slide-level modality pairs across 32 cancer types from 10,275 TCGA patients (>116M patch images). Training follows a two-stage paradigm: (1) slide-level contrastive learning across WSI-report-gene modalities, and (2) self-taught training that propagates multimodal knowledge from the slide aggregator (teacher) to the patch extractor (student). Evaluated on 97 tasks across 15 application types, mSTAR outperforms UNI, CONCH, CHIEF, and GigaPath. Key finding: multimodal integration yields greater improvements than simply expanding vision-only datasets (53x data efficiency vs Virchow). Published in Nat Commun, Dec 2025.
TITLE
A multimodal knowledge-enhanced whole-slide pathology foundation model.
Main citation
Xu Y, Wang Y, Zhou F, Ma J, Yang S, Lin H, Wang X, Wang J, Liang L, Han A, Jin C, Cheng KT, Chen H. (2025) A multimodal knowledge-enhanced whole-slide pathology foundation model. Nature Communications, 16:11406. doi:10.1038/s41467-025-66220-x. PMID 41387679
ABSTRACT
Computational pathology has advanced through foundation models, yet faces challenges in multimodal integration and capturing whole-slide context. We present mSTAR, a pathology foundation model that incorporates three modalities (pathology slides, expert-created reports, and gene expression data) within a unified framework. Our dataset includes 26,169 slide-level modality pairs across 32 cancer types, comprising over 116 million patch images. This approach injects multimodal whole-slide context into patch representations, expanding modeling from single to multiple modalities and from patch-level to slide-level analysis. Across 97 tasks, mSTAR outperforms previous SOTA models, particularly in molecular prediction, revealing that multimodal integration yields greater improvements than simply expanding vision-only datasets.
DOI
10.1038/s41467-025-66220-x
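The slide-level contrastive stage mentioned in this entry pulls matching cross-modal pairs together and pushes mismatched pairs apart. The block below sketches a generic InfoNCE-style loss over a small similarity matrix as an illustration. The entry does not specify mSTAR's exact loss, so the function name, the temperature, the one-directional form, and the two-modality setup are all assumptions.

```python
import math


def info_nce(similarity, temperature=0.1):
    """Mean cross-entropy where row i's positive pair is column i.

    similarity[i][j] is assumed to be the cosine similarity between
    slide embedding i and, for example, report embedding j.
    Only the row-to-column direction is computed in this sketch.
    """
    n = len(similarity)
    loss = 0.0
    for i in range(n):
        logits = [s / temperature for s in similarity[i]]
        m = max(logits)
        # Stable log-sum-exp over all candidates for row i.
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_denom - logits[i]
    return loss / n


# Matching pairs on the diagonal give a low loss; swapped pairs give a high one.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(aligned)
loss_shuffled = info_nce(shuffled)
print(loss_aligned, loss_shuffled)
```

With three modalities, a loss of this form would typically be applied per modality pair, though how mSTAR combines them is not stated here.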
PathOrchestra
FULL NAME
PathOrchestra — Comprehensive Pathology Foundation Model with 100+ Clinical-Grade Tasks
DESCRIPTION
PathOrchestra is a versatile pathology foundation model from Shanghai AI Lab and multiple Chinese institutions, trained via self-supervised learning on 287,424 H&E-stained WSIs from 21 tissue types across 3 independent clinical centers. It was evaluated on the largest known clinical task benchmark (112 tasks: 61 private + 51 public) spanning digital slide preprocessing, pan-cancer classification (17 cancer types), lesion identification, multi-cancer subtype classification (36 tasks), biomarker assessment (36 tasks), gene expression prediction, and structured report generation, and it achieves over 0.950 accuracy in 47 tasks. It is the first model to generate structured pathology reports for colorectal cancer and lymphoma. Released under the Apache 2.0 open-source license.
TITLE
PathOrchestra: a comprehensive foundation model for computational pathology with over 100 diverse clinical-grade tasks.
Main citation
Yan F, et al. (2025) PathOrchestra: a comprehensive foundation model for computational pathology with over 100 diverse clinical-grade tasks. npj Digital Medicine, 8(1):695. doi:10.1038/s41746-025-02027-w. PMID 41258399
ABSTRACT
The complexity and variability of high-resolution pathological images present significant challenges in computational pathology. We present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on 287,424 slides from 21 tissue types across three centers. The model was evaluated on 112 tasks drawn from 61 private and 51 public datasets, covering digital slide preprocessing, pan-cancer classification, lesion identification, multi-cancer subtype classification, biomarker assessment, gene expression prediction, and structured report generation. Across 27,755 WSIs and 9,415,729 ROI images, it achieved over 0.950 accuracy in 47 tasks. It is the first to generate structured reports for colorectal cancer and lymphoma.
DOI
10.1038/s41746-025-02027-w
Prov-GigaPath
FULL NAME
Prov-GigaPath — Whole-Slide Foundation Model for Digital Pathology
DESCRIPTION
Prov-GigaPath, by Microsoft Research, Providence, and UW, is a whole-slide pathology foundation model pretrained on 1.3 billion 256×256 image tiles from 171,189 whole slides across 28 cancer centers (>30,000 patients, 31 tissue types). It uses a novel GigaPath vision transformer with dilated self-attention (LongNet) to capture gigapixel-level context. It achieves SOTA on 25 of 26 benchmark tasks, including cancer subtyping, mutation prediction, and TMB classification. It is the first large-scale whole-slide foundation model trained on real-world clinical data.
TITLE
A whole-slide foundation model for digital pathology from real-world data.
Main citation
Xu H, Usuyama N, Bagal V, Bredell M, Chamby A, Chen Z, Ding J, Fuhlbrück T, Géro Z, Gonzalez J, Gu Y, Xu Y, Wei MH, Wang W, Ma S, Wei F, Yang J, Li C, Gao J, Rosemon J, Bower T, Lee S, Weerasinghe R, Wright B, Robicsek A, Piening B, Bifulco C, Wang S, Poon H. (2024) A whole-slide foundation model for digital pathology from real-world data. Nature, 630(8015):181-188. doi:10.1038/s41586-024-07441-w. PMID 38778098
ABSTRACT
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing important slide-level context. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer for pretraining gigapixel pathology slides using dilated self-attention. Prov-GigaPath attains state-of-the-art performance on 25 out of 26 benchmark tasks.
DOI
10.1038/s41586-024-07441-w
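Dilated self-attention, as named in this entry, keeps attention tractable over tens of thousands of tiles by splitting the tile sequence into segments and keeping only every r-th position inside each segment. The block below is a simplified sketch of that index pattern with a single segment length and dilation rate. It is not the Prov-GigaPath or LongNet code: the function name and parameter values are illustrative, and the published method mixes several segment lengths and dilation rates.

```python
def dilated_segments(seq_len, segment_len, dilation):
    """Return, for each segment, the token positions kept after sparsifying.

    Attention is then computed only among the kept positions of a segment.
    """
    kept = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        kept.append(list(range(start, end, dilation)))
    return kept


def attention_pairs(segments):
    """Count query-key pairs when each segment attends only within itself."""
    return sum(len(seg) ** 2 for seg in segments)


# Toy sequence of 16 tiles, segments of 8 tiles, keeping every 2nd tile.
segments = dilated_segments(seq_len=16, segment_len=8, dilation=2)
dense_pairs = 16 * 16
sparse_pairs = attention_pairs(segments)
print(segments)                   # [[0, 2, 4, 6], [8, 10, 12, 14]]
print(dense_pairs, sparse_pairs)  # 256 32
```

Larger segments paired with larger dilation rates cover long-range context at similar cost, which is the intuition behind combining several such patterns.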
scGPT
FULL NAME
scGPT — Foundation Model for Single-Cell Multi-Omics Using Generative AI
DESCRIPTION
scGPT is a generative pretrained transformer foundation model for single-cell biology, pretrained on over 33 million human cells from 51 organs across 441 studies. It uses a GPT architecture adapted for gene expression data with a specialized attention mask. After fine-tuning, it outperforms traditional methods on cell type annotation, multi-batch integration, multi-omic integration, perturbation response prediction, and gene network inference. It is positioned as a foundation model for cellular biology, analogous to GPT for natural language.
TITLE
scGPT: toward building a foundation model for single-cell multi-omics using generative AI.
Main citation
Cui H, Wang C, Maan H, Pang K, Luo F, Duan N, Wang B. (2024) scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods, 21(8):1470-1480. doi:10.1038/s41592-024-02201-0. PMID 38840054
ABSTRACT
Generative pretrained models have achieved remarkable success in various domains such as language and computer vision. Using burgeoning single-cell sequencing data, we have constructed a foundation model for single-cell biology, scGPT, based on a generative pretrained transformer across a repository of over 33 million cells. Our findings illustrate that scGPT effectively distills critical biological insights concerning genes and cells. Through further adaptation of transfer learning, scGPT can be optimized to achieve superior performance across diverse downstream applications including cell type annotation, multi-batch integration, multi-omic integration, perturbation response prediction and gene network inference.
DOI
10.1038/s41592-024-02201-0
TITAN
FULL NAME
TITAN — Transformer-based pathology Image and Text Alignment Network
DESCRIPTION
TITAN (Transformer-based pathology Image and Text Alignment Network) is a multimodal whole-slide foundation model from the Mahmood Lab (Harvard/BWH). It was pretrained on 335,645 WSIs via visual self-supervised learning and vision-language alignment with 423K synthetic captions from PathChat and 183K pathology reports. Without any fine-tuning, TITAN produces general-purpose slide representations for zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation. It outperforms both ROI and slide foundation models across diverse clinical tasks.
TITLE
A multimodal whole-slide foundation model for pathology.
Main citation
Ding T, Wagner SJ, Song AH, Chen RJ, Lu MY, Zhang A, Vaidya AJ, Jaume G, Shaban M, Kim A, Williamson DFK, Oldenburg L, Chen B, Alajaji A, Noor G, Sang Y, Peng T, Le LP, Mahmood F. (2025) A multimodal whole-slide foundation model for pathology. Nature Medicine, 31:3749-3761. doi:10.1038/s41591-025-03982-3. PMID 41193692
ABSTRACT
The field of computational pathology has been transformed by recent advances in foundation models that encode histopathology regions of interest into versatile feature representations. However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data. We propose TITAN, a multimodal whole-slide foundation model pretrained using 335,645 whole-slide images via visual self-supervised learning and vision-language alignment with pathology reports and 423,122 synthetic captions. Without any fine-tuning, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis.
DOI
10.1038/s41591-025-03982-3
UNI
FULL NAME
UNI — General-Purpose Foundation Model for Computational Pathology
DESCRIPTION
UNI is a general-purpose self-supervised foundation model for computational pathology from the Mahmood Lab (Harvard/BWH), pretrained on >100 million images from >100,000 H&E-stained WSIs (>77 TB) across 20 tissue types. It was evaluated on 34 representative CPath tasks, where it outperforms prior models across cancer classification, organ transplant assessment, and rare disease analysis. It demonstrates resolution-agnostic classification, few-shot slide classification, and generalization to 108 cancer types in the OncoTree system. 1,300+ citations.
TITLE
Towards a general-purpose foundation model for computational pathology.
Main citation
Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Chen B, Zhang A, Shao D, Song AH, Shaban M, Williams M, Oldenburg L, Weishaupt LL, Wang JJ, Vaidya A, Le LP, Gerber G, Sahai S, Williams W, Mahmood F. (2024) Towards a general-purpose foundation model for computational pathology. Nature Medicine, 30(3):850-862. doi:10.1038/s41591-024-02857-3. PMID 38504018
ABSTRACT
Quantitative evaluation of tissue images is crucial for computational pathology tasks. The high resolution of WSIs and the variability of morphological features present significant challenges. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs across 20 major tissue types. The model was evaluated on 34 representative CPath tasks. UNI outperforms previous state-of-the-art models and demonstrates new capabilities including resolution-agnostic tissue classification, few-shot slide classification, and disease subtyping generalization to 108 cancer types.
DOI
10.1038/s41591-024-02857-3
Virchow
FULL NAME
Virchow — Million-Scale Digital Pathology Foundation Model (Paige/Microsoft)
DESCRIPTION
Virchow is the first million-slide foundation model for computational pathology, developed by Paige in collaboration with Microsoft. A 632M-parameter ViT-H model trained using DINOv2 on 1.5 million H&E-stained WSIs from MSKCC (17 tissue types). Demonstrates clinical-grade pan-cancer detection with 0.95 AUC across nine common and seven rare cancers. With less training data, the pan-cancer detector built on Virchow achieves similar performance to tissue-specific clinical-grade models in production, outperforming them on rare cancer variants. Serves as the foundation for Paige's Virchow2 (3M WSIs, multimodal) and Virchow2G (1.8B parameters) models.
TITLE
A foundation model for clinical-grade computational pathology and rare cancers detection.
Main citation
Vorontsov E, Bozkurt A, Casson A, Shaikovski G, Zelechowski M, Severson K, Zimmermann E, Hall J, Tenenholtz N, Fusi N, Yang E, Mathieu P, van Eck A, Lee D, Viret J, Robert E, Wang YK, Kunz JD, Lee MCH, Bernhard JH, Godrich RA, Oakley G, Millar E, Hanna M, Wen H, Retamero JA, Moye WA, Yousfi R, Kanan C, Klimstra DS, Rothrock B, Liu S, Fuchs TJ. (2024) A foundation model for clinical-grade computational pathology and rare cancers detection. Nature Medicine, 30(10):2924-2935. doi:10.1038/s41591-024-03141-0. PMID 39080966
ABSTRACT
The analysis of histopathology images with artificial intelligence aims to enable clinical decision support systems and precision medicine. We present Virchow, the largest foundation model for computational pathology to date. In addition to the evaluation of biomarker prediction and cell identification, we demonstrate that a large foundation model enables pan-cancer detection, achieving 0.95 specimen-level AUC across nine common and seven rare cancers. With less training data, the pan-cancer detector built on Virchow achieved similar performance to tissue-specific clinical-grade models in production and outperformed them on some rare variants of cancer.
DOI
10.1038/s41591-024-03141-0
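The 0.95 AUC quoted in this entry can be read as the probability that a randomly chosen cancer specimen receives a higher score than a randomly chosen benign one. The block below is a minimal sketch of that pairwise definition on made-up scores. The data and function name are illustrative and are unrelated to the Virchow evaluation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy specimen-level scores: 3 cancer specimens (1) and 2 benign specimens (0).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

In this toy case, 5 of the 6 positive-negative pairs are ranked correctly, so the AUC is 5/6.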