Topic Modeling
Catalog entries using this tag:
MixEHR-SAGE
FULL NAME
MixEHR-SAGE - Multi-modal Topic Modeling for PheWAS and GWAS
DESCRIPTION
MixEHR-SAGE is a PheCode-guided multi-modal topic model that integrates diagnoses, procedures, and medications from electronic health records (EHRs) to enhance phenotyping for GWAS. By combining expert-informed priors with probabilistic inference, it identifies more than 1,000 interpretable phenotype topics from UK Biobank data and improves both disease incidence prediction and GWAS discovery. Published in Briefings in Bioinformatics.
TITLE
PheCode-guided multi-modal topic modeling of electronic health records improves disease incidence prediction and GWAS discovery from UK Biobank.
ABSTRACT
Phenome-wide association studies rely on disease definitions derived from diagnostic codes, often failing to leverage the full richness of electronic health records (EHR). We present MixEHR-SAGE, a PheCode-guided multi-modal topic model that integrates diagnoses, procedures, and medications to enhance phenotyping from large-scale EHRs. Applied to 350,000 individuals with high-quality genetic data, MixEHR-SAGE-derived risk scores accurately predicted disease incidence and improved GWAS discovery.
DOI
10.1093/bib/bbag030