Machine Learning
Catalog entries using this tag:
Entries
Causal ML for scGenomics (Causal ML sc)
FULL NAME
Causal Machine Learning for Single-Cell Genomics
DESCRIPTION
A Perspective from Nature Genetics delineating the application of causal machine learning to single-cell genomics. Discusses causal models, challenges in inferring causative roles of genes from single-cell omics data combined with perturbation screens, and the potential for integrating causal ML with GWAS to understand disease mechanisms at single-cell resolution.
TITLE
Causal machine learning for single-cell genomics.
ABSTRACT
Advances in single-cell '-omics' allow unprecedented insights into the transcriptional profiles of individual cells and, when combined with large-scale perturbation screens, enable measuring of the effect of targeted perturbations on the whole transcriptome. In this Perspective, we delineate the application of causal machine learning to single-cell genomics and its associated challenges, presenting the causal model most commonly applied to single-cell biology.
DOI
10.1038/s41588-025-02124-2
Haas ME (ML Liver Fat GWAS)
FULL NAME
Machine Learning Enables New Insights into Genetic Contributions to Liver Fat Accumulation
DESCRIPTION
Developed a machine-learning regression model (gradient-boosted regression on raw MRI signal intensities) that estimates liver fat from UK Biobank abdominal MRI scans, with correlation coefficients of 0.97-0.99 against ground truth. The model was trained on 4,511 participants with gold-standard MRI biomarker measurements and applied to 32,192 additional individuals. A GWAS identified 8 associated variants, 5 of them novel (MTARC1, ADH1B, TRIB1, GPAM, MAST3), and a polygenic score built from these variants was strongly associated with future chronic liver disease risk (HR > 1.32 per SD, p < 9e-17).
KEYWORDS
MRI signal regression, liver fat quantification, abdominal MRI, hepatic steatosis, gradient boosting, UK Biobank
TITLE
Machine learning enables new insights into genetic contributions to liver fat accumulation.
Main citation
Haas ME, Pirruccello JP, Friedman SN, Wang M, ..., Khera AV. (2021) Machine learning enables new insights into genetic contributions to liver fat accumulation. Cell Genom, 1 (3). doi:10.1016/j.xgen.2021.100066. PMID 34957434
ABSTRACT
Excess liver fat, called hepatic steatosis, is a leading risk factor for end-stage liver disease and cardiometabolic diseases but often remains undiagnosed in clinical practice because of the need for direct imaging assessments. We developed an abdominal MRI-based machine-learning algorithm to accurately estimate liver fat from a truth dataset of 4,511 middle-aged UK Biobank participants, enabling quantification in 32,192 additional individuals. A genome-wide association study of common genetic variants and liver fat replicated three known associations and identified five newly associated variants.
DOI
10.1016/j.xgen.2021.100066
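To make the "per SD" scale of the polygenic score in the Haas ME entry concrete, here is a minimal Python sketch of an additive polygenic score. It assumes the common weighted-sum form (effect size times allele dosage, summed over variants) and a log-linear hazard model. The variant IDs, weights, and genotypes are made up for illustration and are not the study's values; 1.32 is used only because the description reports HR > 1.32 per SD.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical per-variant weights (e.g. GWAS effect sizes); not from the study.
WEIGHTS: Dict[str, float] = {"variant_a": 0.20, "variant_b": -0.10, "variant_c": 0.05}


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def standardize(scores: List[float]) -> List[float]:
    """Convert raw scores to SD units (z-scores) within the cohort."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def relative_hazard(z: float, hr_per_sd: float) -> float:
    """Hazard relative to the cohort mean, assuming hazard scales as HR ** z."""
    return hr_per_sd ** z


# Made-up genotypes for four people.
cohort = [
    {"variant_a": 0, "variant_b": 0, "variant_c": 0},
    {"variant_a": 1, "variant_b": 0, "variant_c": 2},
    {"variant_a": 2, "variant_b": 1, "variant_c": 0},
    {"variant_a": 2, "variant_b": 0, "variant_c": 2},
]
raw = [polygenic_score(person, WEIGHTS) for person in cohort]
z_scores = standardize(raw)
hazards = [relative_hazard(z, 1.32) for z in z_scores]
```

Under these assumptions, a person one SD above the cohort mean has 1.32 times the hazard of a person at the mean, and a person two SDs above has 1.32 squared times that hazard.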
MILTON
FULL NAME
MILTON - Machine Learning with Phenotype Associations for Disease Prediction
DESCRIPTION
MILTON is an ensemble machine learning framework that utilizes biomarkers and multi-omics data to predict 3,213 diseases in the UK Biobank. It predicts incident disease cases undiagnosed at time of recruitment and demonstrates utility in augmenting genetic association discovery by empowering case-control GWAS with predicted phenotypes. Published in Nature Genetics.
TITLE
Disease prediction with multi-omics and biomarkers empowers case-control genetic discoveries in the UK Biobank.
ABSTRACT
The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores, and augments genetic association discovery.
DOI
10.1038/s41588-024-01898-1
PoPS
FULL NAME
PoPS - Polygenic Priority Score for Gene Prioritization
DESCRIPTION
PoPS (Polygenic Priority Score) is a method that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. It leverages polygenic enrichments across multiple gene features to predict causal genes underlying complex traits and diseases. Published in Nature Genetics.
TITLE
Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases.
ABSTRACT
Genome-wide association studies (GWASs) are a valuable tool for understanding the biology of complex human traits and diseases, but associated variants rarely point directly to causal genes. In the present study, we introduce a new method, polygenic priority score (PoPS), that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. PoPS and the closest gene individually outperform other gene prioritization methods.
DOI
10.1038/s41588-023-01443-6
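The idea in the PoPS entry, learning which gene features track trait association and then scoring genes at a locus, can be illustrated with a small Python sketch. This is not the PoPS implementation: it assumes a plain ridge regression of gene-level association scores on centered gene features, and every gene name, feature value, and score below is hypothetical.

```python
from typing import Dict, List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ridge(features: List[List[float]], scores: List[float], lam: float) -> List[float]:
    """Ridge weights w solving (X^T X + lam * I) w = X^T y, without an intercept."""
    p = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(features, scores)) for i in range(p)]
    return solve_linear(xtx, xty)


def priority_scores(genes: Dict[str, List[float]], weights: List[float]) -> Dict[str, float]:
    """Score each gene as the weighted sum of its features."""
    return {g: sum(w * x for w, x in zip(weights, f)) for g, f in genes.items()}


# Hypothetical training genes: two centered features and a gene-level association score.
train_features = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
train_scores = [2.0, -2.0, 0.1, -0.1]
weights = fit_ridge(train_features, train_scores, lam=1.0)

# Hypothetical candidate genes at one GWAS locus.
locus = {"GENE_1": [0.9, -0.2], "GENE_2": [0.1, 1.0], "GENE_3": [-0.5, 0.3]}
scores = priority_scores(locus, weights)
top_gene = max(scores, key=scores.get)
```

In this toy data the first feature tracks the association scores much more strongly than the second, so the gene with the largest value of that feature is ranked first at the locus. A realistic analysis would fit on genes outside the locus being scored so that the candidates do not inform their own scores.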
SynSurr
FULL NAME
SynSurr - Synthetic Surrogates for GWAS of Missing Phenotypes
DESCRIPTION
SynSurr (Synthetic Surrogate analysis) is a method that makes GWAS on imputed phenotypes robust to imputation errors. Rather than replacing missing values, SynSurr jointly analyzes the observed and imputed data to provide calibrated association statistics, improving power for genome-wide association studies of partially missing phenotypes in population biobanks. Published in Nature Genetics.
TITLE
Synthetic surrogates improve power for genome-wide association studies of partially missing phenotypes in population biobanks.
ABSTRACT
Within population biobanks, incomplete measurement of certain traits limits the power for genetic discovery. Machine learning is increasingly used to impute the missing values from the available data. However, performing GWAS on imputed traits can introduce spurious associations. Here we introduce SynSurr analysis, which makes GWAS on imputed phenotypes robust to imputation errors by jointly analyzing observed and imputed data.
DOI
10.1038/s41588-024-01793-9