Polygenic_risk_scores

Summary Table

NAME CATEGORY CITATION YEAR
Benchmark-Wang Benchmark Wang C, Zhang J, Veldsman WP, Zhou X, ...&, Zhang L. (2022) A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants Brief. Bioinform., () . doi:10.1093/bib/bbac552. PMID 36585786 2022
Ellis CA Bias Ellis CA, Oliver KL, Harris RV, Ottman R, ...&, Bahlo M. (2024) Inflation of polygenic risk scores caused by sample overlap and relatedness: Examples of a major risk of bias Am. J. Hum. Genet., 0 (0) . doi:10.1016/j.ajhg.2024.07.014. PMID 39168121 2024
PRS credible intervals Bias Ding Y, Hou K, Burch KS, Lapinska S, ...&, Pasaniuc B. (2022) Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification Nat. Genet., 54 (1) 30-39. doi:10.1038/s41588-021-00961-5. PMID 34931067 2022
BridgePRS Cross-population Hoggart CJ, Choi SW, García-González J, Souaiaia T, ...&, O'Reilly PF. (2023) BridgePRS leverages shared genetic effects across ancestries to increase polygenic risk score portability Nat. Genet., 56 (1) 180-186. doi:10.1038/s41588-023-01583-9. PMID 38123642 2023
CT-SLEB Cross-population Zhang H, Zhan J, Jin J, Zhang J, ...&, Chatterjee N. (2023) A new method for multiancestry polygenic prediction improves performance across diverse populations Nat. Genet., () . doi:10.1038/s41588-023-01501-z. PMID 37749244 2023
PROSPER Cross-population Zhang J, Zhan J, Jin J, Ma C, ...&, Chatterjee N. (2024) An ensemble penalized regression method for multi-ancestry polygenic risk prediction Nat. Commun., 15 (1) 3238. doi:10.1038/s41467-024-47357-7. PMID 38622117 2024
PRS-CSx Cross-population Ruan Y, Lin YF, Feng YC, Chen CY, ...&, Ge T. (2022) Improving polygenic prediction in ancestrally diverse populations Nat. Genet., 54 (5) 573-580. doi:10.1038/s41588-022-01054-7. PMID 35513724 2022
PRS-FH Cross-population Hujoel MLA, Loh PR, Neale BM, Price AL. (2022) Incorporating family history of disease improves polygenic risk scores in diverse populations Cell Genom., 2 (7) 100152. doi:10.1016/j.xgen.2022.100152. PMID 35935918 2022
SDPRX Cross-population Zhou G, Chen T, Zhao H. (2023) SDPRX: A statistical method for cross-population prediction of complex traits Am. J. Hum. Genet., 110 (1) 13-22. doi:10.1016/j.ajhg.2022.11.007. PMID 36460009 2023
TL-PRS Cross-population Zhao Z, Fritsche LG, Smith JA, Mukherjee B, ...&, Lee S. (2022) The construction of cross-population polygenic risk scores using transfer learning Am. J. Hum. Genet., 109 (11) 1998-2008. doi:10.1016/j.ajhg.2022.09.010. PMID 36240765 2022
shaPRS Cross-population Kelemen M, Vigorito E, Fachal L, Anderson CA, ...&, Wallace C. (2024) shaPRS: Leveraging shared genetic effects across traits or ancestries improves accuracy of polygenic scores Am. J. Hum. Genet., () . doi:10.1016/j.ajhg.2024.04.009. PMID 38703768 2024
DDx-PRS Cross-trait Peyrot, W. J., Panagiotaropoulou, G., Olde Loohuis, L. M., Adams, M., Awasthi, S., Ge, T., ... & Price, A. L. (2024). Distinguishing different psychiatric disorders using DDx-PRS. medRxiv, 2024-02. NA
MiXeR Cross-trait Frei O, Holland D, Smeland OB, Shadrin AA, ...&, Dale AM. (2019) Bivariate causal mixture model quantifies polygenic overlap between complex traits beyond genetic correlation Nat. Commun., 10 (1) 2417. doi:10.1038/s41467-019-10310-0. PMID 31160569 2019
Multi-PGS Cross-trait Albiñana C, Zhu Z, Schork AJ, Ingason A, ...&, Vilhjálmsson BJ. (2023) Multi-PGS enhances polygenic prediction by combining 937 polygenic scores Nat. Commun., 14 (1) 4702. doi:10.1038/s41467-023-40330-w. PMID 37543680 2023
PUMA-CUBS Cross-trait Zhao, Zijie, et al. "Optimizing and benchmarking polygenic risk scores with GWAS summary statistics." bioRxiv (2022). NA
wMT-SBLUP Cross-trait Maier RM, Zhu Z, Lee SH, Trzaskowski M, ...&, Robinson MR. (2018) Improving genetic prediction by leveraging genetic correlations among human diseases and traits Nat. Commun., 9 (1) 989. doi:10.1038/s41467-017-02769-6. PMID 29515099 2018
PRSet Pathway Choi SW, García-González J, Ruan Y, Wu HM, ...&, O'Reilly PF. (2023) PRSet: Pathway-based polygenic risk score analyses and software PLoS Genet., 19 (2) e1010624. doi:10.1371/journal.pgen.1010624. PMID 36749789 2023
PLINK2 Pipeline Chang CC, Chow CC, Tellier LC, Vattikuti S, ...&, Lee JJ. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets Gigascience, 4 (1) 7. doi:10.1186/s13742-015-0047-8. PMID 25722852 2015
PRSice-2 Pipeline Choi SW, O'Reilly PF. (2019) PRSice-2: Polygenic Risk Score software for biobank-scale data Gigascience, 8 (7) 1-6. doi:10.1093/gigascience/giz082. PMID 31307061 2019
pgsc_calc Pipeline Lambert, Wingfield et al. (2024) The Polygenic Score Catalog: new functionality and tools to enable FAIR research. medRxiv. doi:10.1101/2024.05.29.24307783. NA
Cancer PRSweb Platform Fritsche LG, Patil S, Beesley LJ, VandeHaar P, ...&, Mukherjee B. (2020) Cancer PRSweb: An online repository with polygenic risk scores for major cancer traits and their evaluation in two independent biobanks Am. J. Hum. Genet., 107 (5) 815-836. doi:10.1016/j.ajhg.2020.08.025. PMID 32991828 2020
ExPRSweb Platform Ma Y, Patil S, Zhou X, Mukherjee B, ...&, Fritsche LG. (2022) ExPRSweb: An online repository with polygenic risk scores for common health-related exposures Am. J. Hum. Genet., 109 (10) 1742-1760. doi:10.1016/j.ajhg.2022.09.001. PMID 36152628 2022
PGSCatalog Platform Lambert SA, Gil L, Jupp S, Ritchie SC, ...&, Inouye M. (2021) The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation Nat. Genet., 53 (4) 420-425. doi:10.1038/s41588-021-00783-5. PMID 33692568 2021
PGSFusion Platform Yang, S., Ye, X., Ji, X., Li, Z., Tian, M., Huang, P., & Cao, C. (2024). PGSFusion streamlines polygenic score construction and epidemiological applications in biobank-scale cohorts. bioRxiv, 2024-08. NA
PRS atlas Platform Richardson TG, Harrison S, Hemani G, Davey Smith G. (2019) An atlas of polygenic risk score associations to highlight putative causal relationships across the human phenome Elife, 8 () . doi:10.7554/eLife.43657. PMID 30835202 2019
metabolites PRS atlas Platform Fang S, Holmes MV, Gaunt TR, Davey Smith G, ...&, Richardson TG. (2022) Constructing an atlas of associations between polygenic scores from across the human phenome and circulating metabolic biomarkers Elife, 11 () e73951. doi:10.7554/eLife.73951. PMID 36219204 2022
BayesR Polygenicity Moser G, Lee SH, Hayes BJ, Goddard ME, ...&, Visscher PM. (2015) Simultaneous discovery, estimation and prediction analysis of complex traits using a bayesian mixture model PLoS Genet., 11 (4) e1004969. doi:10.1371/journal.pgen.1004969. PMID 25849665 2015
BayesS Polygenicity Zeng J, de Vlaming R, Wu Y, Robinson MR, ...&, Yang J. (2018) Signatures of negative selection in the genetic architecture of human complex traits Nat. Genet., 50 (5) 746-753. doi:10.1038/s41588-018-0101-4. PMID 29662166 2018
SBayesRC Polygenicity Zheng Z, Liu S, Sidorenko J, Wang Y, ...&, Zeng J. (2024) Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries Nat. Genet., 56 (5) 767-777. doi:10.1038/s41588-024-01704-y. PMID 38689000 2024
SBayesR Polygenicity Lloyd-Jones LR, Zeng J, Sidorenko J, Yengo L, ...&, Visscher PM. (2019) Improved polygenic prediction by Bayesian multiple regression on summary statistics Nat. Commun., 10 (1) 5086. doi:10.1038/s41467-019-12653-0. PMID 31704910 2019
SBayesS Polygenicity Zeng J, Xue A, Jiang L, Lloyd-Jones LR, ...&, Yang J. (2021) Widespread signatures of natural selection across human complex traits and functional genomic categories Nat. Commun., 12 (1) 1164. doi:10.1038/s41467-021-21446-3. PMID 33608517 2021
Review-Kachuri Review Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, ...&, Ge T. (2023) Principles and methods for transferring polygenic risk scores across global populations Nat. Rev. Genet., 25 (1) 8-25. doi:10.1038/s41576-023-00637-2. PMID 37620596 2023
Review-Peter Review Visscher PM, Yengo L, Cox NJ, Wray NR. (2021) Discovery and implications of polygenicity of common diseases Science, 373 (6562) 1468-1473. doi:10.1126/science.abi8206. PMID 34554790 2021
Review-Wang Review Wang Y, Tsuo K, Kanai M, Neale BM, ...&, Martin AR. (2022) Challenges and opportunities for developing more generalizable polygenic risk scores Annu. Rev. Biomed. Data Sci., 5 (1) 293-320. doi:10.1146/annurev-biodatasci-111721-074830. PMID 35576555 2022
ALL-Sum Single-trait Chen T, Zhang H, Mazumder R, Lin X. (2024) Fast and scalable ensemble learning method for versatile polygenic risk prediction Proc. Natl. Acad. Sci. U. S. A., 121 (33) e2403210121. doi:10.1073/pnas.2403210121. PMID 39110727 2024
CalPred Single-trait Hou K, Xu Z, Ding Y, Mandla R, ...&, Pasaniuc B. (2024) Calibrated prediction intervals for polygenic scores across diverse contexts Nat. Genet., 56 (7) 1386-1396. doi:10.1038/s41588-024-01792-w. PMID 38886587 2024
DBSLMM Single-trait Yang S, Zhou X. (2020) Accurate and scalable construction of polygenic scores in large biobank data sets Am. J. Hum. Genet., 106 (5) 679-693. doi:10.1016/j.ajhg.2020.03.013. PMID 32330416 2020
GMRM Single-trait Orliac EJ, Trejo Banos D, Ojavee SE, Läll K, ...&, Robinson MR. (2022) Improving GWAS discovery and genomic prediction accuracy in biobank data Proc. Natl. Acad. Sci. U. S. A., 119 (31) e2121279119. doi:10.1073/pnas.2121279119. PMID 35905320 2022
GRPa-PRS Single-trait Li X, Fernandes BS, Liu A, Chen J, ...&, Dai Y. (2024) GRPa-PRS: A risk stratification method to identify genetically-regulated pathways in polygenic diseases medRxiv, () 2023.06.19.23291621. doi:10.1101/2023.06.19.23291621. PMID 37425929 2024
GenoBoost Single-trait Ohta R, Tanigawa Y, Suzuki Y, Kellis M, ...&, Morishita S. (2024) A polygenic score method boosted by non-additive models Nat. Commun., 15 (1) 4433. doi:10.1038/s41467-024-48654-x. PMID 38811555 2024
LDpred-funct Single-trait Márquez-Luna C, Gazal S, Loh PR, Kim SS, ...&, Price AL. (2021) Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets Nat. Commun., 12 (1) 6052. doi:10.1038/s41467-021-25171-9. PMID 34663819 2021
LDpred2-auto Single-trait Privé F, Albiñana C, Arbel J, Pasaniuc B, ...&, Vilhjálmsson BJ. (2023) Inferring disease architecture and predictive ability with LDpred2-auto Am. J. Hum. Genet., 110 (12) 2042-2055. doi:10.1016/j.ajhg.2023.10.010. PMID 37944514 2023
LDpred2 Single-trait Privé F, Arbel J, Vilhjálmsson BJ. (2021) LDpred2: better, faster, stronger Bioinformatics, 36 (22-23) 5424-5431. doi:10.1093/bioinformatics/btaa1029. PMID 33326037 2021
LDpred Single-trait Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, ...&, Price AL. (2015) Modeling linkage disequilibrium increases accuracy of polygenic risk scores Am. J. Hum. Genet., 97 (4) 576-592. doi:10.1016/j.ajhg.2015.09.001. PMID 26430803 2015
MegaPRS Single-trait Zhang Q, Privé F, Vilhjálmsson B, Speed D. (2021) Improved genetic prediction of complex traits from individual-level data or summary statistics Nat. Commun., 12 (1) 4192. doi:10.1038/s41467-021-24485-y. PMID 34234142 2021
MiXeR Single-trait Holland D, Frei O, Desikan R, Fan CC, ...&, Dale AM. (2020) Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model PLoS Genet., 16 (5) e1008612. doi:10.1371/journal.pgen.1008612. PMID 32427991 2020
MultiBLUP Single-trait Speed D, Balding DJ. (2014) MultiBLUP: improved SNP-based prediction for complex traits Genome Res., 24 (9) 1550-1557. doi:10.1101/gr.169375.113. PMID 24963154 2014
PRS-CS Single-trait Ge T, Chen CY, Ni Y, Feng YA, ...&, Smoller JW. (2019) Polygenic prediction via Bayesian regression and continuous shrinkage priors Nat. Commun., 10 (1) 1776. doi:10.1038/s41467-019-09718-5. PMID 30992449 2019
PRSMix_AOI Single-trait Misra, A. et al. Instability of high polygenic risk classification and mitigation by integrative scoring. bioRxiv 2024.07.24.24310897 (2024) doi:10.1101/2024.07.24.24310897. NA
PRS_to_Abs Single-trait Pain O, Gillett AC, Austin JC, Folkersen L, ...&, Lewis CM. (2022) A tool for translating polygenic scores onto the absolute scale using summary statistics Eur. J. Hum. Genet., 30 (3) 339-348. doi:10.1038/s41431-021-01028-z. PMID 34983942 2022
PRStuning Single-trait Jiang, W., Chen, L., Girgenti, M. J., & Zhao, H. (2023). Tuning Parameters for Polygenic Risk Score Methods Using GWAS Summary Statistics from Training Data. Research Square. NA
SDPR Single-trait Zhou G, Zhao H. (2021) A fast and robust Bayesian nonparametric method for prediction of complex traits using summary statistics PLoS Genet., 17 (7) e1009697. doi:10.1371/journal.pgen.1009697. PMID 34310601 2021
VIPRS Single-trait Zabad S, Gravel S, Li Y. (2023) Fast and accurate Bayesian polygenic risk modeling with variational inference Am. J. Hum. Genet., 110 (5) 741-761. doi:10.1016/j.ajhg.2023.03.009. PMID 37030289 2023
lassosum2 Single-trait Privé F, Arbel J, Aschard H, Vilhjálmsson BJ. (2022) Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores HGG Adv, 3 (4) 100136. doi:10.1016/j.xhgg.2022.100136. PMID 36105883 2022
lassosum Single-trait Mak TSH, Porsch RM, Choi SW, Zhou X, ...&, Sham PC. (2017) Polygenic scores via penalized regression on summary statistics Genet. Epidemiol., 41 (6) 469-480. doi:10.1002/gepi.22050. PMID 28480976 2017
meta-PRS Single-trait Albiñana C, Grove J, McGrath JJ, Agerbo E, ...&, Vilhjálmsson BJ. (2021) Leveraging both individual-level genetic data and GWAS summary statistics increases polygenic prediction Am. J. Hum. Genet., 108 (6) 1001-1011. doi:10.1016/j.ajhg.2021.04.014. PMID 33964208 2021
rtPRS-CS Single-trait Tubbs, J. D., Chen, Y., Duan, R., Huang, H. & Ge, T. Real-time dynamic polygenic prediction for streaming data. bioRxiv 2024.07.12.24310357 (2024) doi:10.1101/2024.07.12.24310357. NA
PRS-RS Standards Wand H, Lambert SA, Tamburro C, Iacocca MA, ...&, Wojcik GL. (2021) Improving reporting standards for polygenic scores in risk prediction studies Nature, 591 (7849) 211-219. doi:10.1038/s41586-021-03243-6. PMID 33692554 2021
Tutorial-Choi Tutorial Choi SW, Mak TS, O'Reilly PF. (2020) Tutorial: a guide to performing polygenic risk score analyses Nat. Protoc., 15 (9) 2759-2772. doi:10.1038/s41596-020-0353-1. PMID 32709988 2020

Benchmark

Benchmark-Wang

  • NAME : Benchmark-Wang
  • TITLE : A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants
  • DOI : 10.1093/bib/bbac552
  • ABSTRACT : Quantifying an individual's risk for common diseases is an important goal of precision health. The polygenic risk score (PRS), which aggregates multiple risk alleles of candidate diseases, has emerged as a standard approach for identifying high-risk individuals. Although several studies have been performed to benchmark the PRS calculation tools and assess their potential to guide future clinical applications, some issues remain to be further investigated, such as lacking (i) various simulated data with different genetic effects; (ii) evaluation of machine learning models and (iii) evaluation on multiple ancestries studies. In this study, we systematically validated and compared 13 statistical methods, 5 machine learning models and 2 ensemble models using simulated data with additive and genetic interaction models, 22 common diseases with internal training sets, 4 common diseases with external summary statistics and 3 common diseases for trans-ancestry studies in UK Biobank. The statistical methods were better in simulated data from additive models and machine learning models have edges for data that include genetic interactions. Ensemble models are generally the best choice by integrating various statistical methods. LDpred2 outperformed the other standalone tools, whereas PRS-CS, lassosum and DBSLMM showed comparable performance. We also identified that disease heritability strongly affected the predictive performance of all methods. Both the number and effect sizes of risk SNPs are important; and sample size strongly influences the performance of all methods. For the trans-ancestry studies, we found that the performance of most methods became worse when training and testing sets were from different populations.
  • CITATION : Wang C, Zhang J, Veldsman WP, Zhou X, ...&, Zhang L. (2022) A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants Brief. Bioinform., () . doi:10.1093/bib/bbac552. PMID 36585786
  • JOURNAL_INFO : Briefings in bioinformatics ; Brief. Bioinform. ; 2022 ; ; ;
  • PUBMED_LINK : 36585786
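
The abstract above describes a PRS as an aggregate of risk alleles. A minimal sketch of that idea, assuming the simplest additive form (a weighted sum of allele dosages), is given here. The function name, the toy genotype matrix and the effect sizes are all hypothetical and are not taken from the benchmark.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Return one additive score per individual.

    dosages: (n_individuals, n_variants) array of allele counts (0, 1 or 2).
    weights: (n_variants,) array of per-allele effect sizes.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("one weight per variant is required")
    return dosages @ weights


# Toy data (made up): 3 individuals, 3 variants.
genotypes = np.array([[0, 1, 2],
                      [2, 0, 1],
                      [1, 1, 0]])
effect_sizes = np.array([0.10, -0.05, 0.20])
scores = polygenic_score(genotypes, effect_sizes)
```

The tools compared in the benchmark differ mainly in how the weights are estimated; the final scoring step is typically a weighted sum of this kind.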

Bias

Ellis CA

  • NAME : Ellis CA
  • TITLE : Inflation of polygenic risk scores caused by sample overlap and relatedness: Examples of a major risk of bias
  • DOI : 10.1016/j.ajhg.2024.07.014
  • ABSTRACT : Polygenic risk scores (PRSs) are an important tool for understanding the role of common genetic variants in human disease. Standard best practices recommend that PRSs be analyzed in cohorts that are independent of the genome-wide association study (GWAS) used to derive the scores without sample overlap or relatedness between the two cohorts. However, identifying sample overlap and relatedness can be challenging in an era of GWASs performed by large biobanks and international research consortia. Although most genomics researchers are aware of best practices and theoretical concerns about sample overlap and relatedness between GWAS and PRS cohorts, the prevailing assumption is that the risk of bias is small for very large GWASs. Here, we present two real-world examples demonstrating that sample overlap and relatedness is not a minor or theoretical concern but an important potential source of bias in PRS studies. Using a recently developed statistical adjustment tool, we found that excluding overlapping and related samples was equal to or more powerful than adjusting for overlap bias. Our goal is to make genomics researchers aware of the magnitude of risk of bias from sample overlap and relatedness and to highlight the need for mitigation tools, including independent validation cohorts in PRS studies, continued development of statistical adjustment methods, and tools for researchers to test their cohorts for overlap and relatedness with GWAS cohorts without sharing individual-level data.
  • CITATION : Ellis CA, Oliver KL, Harris RV, Ottman R, ...&, Bahlo M. (2024) Inflation of polygenic risk scores caused by sample overlap and relatedness: Examples of a major risk of bias Am. J. Hum. Genet., 0 (0) . doi:10.1016/j.ajhg.2024.07.014. PMID 39168121
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2024 ; 0 ; 0 ;
  • PUBMED_LINK : 39168121

PRS credible intervals

  • NAME : PRS credible intervals
  • SHORT NAME : PRS credible intervals
  • FULL NAME : PRS credible intervals
  • URL : https://privefl.github.io/bigsnpr/articles/prs_uncertainty.html
  • KEYWORDS : uncertainty
  • TITLE : Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification
  • DOI : 10.1038/s41588-021-00961-5
  • ABSTRACT : Although the cohort-level accuracy of polygenic risk scores (PRSs), estimates of genetic value at the individual level, has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual's PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated 'white British'), we observe large variances in individual PRS estimates which impact interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
  • COPYRIGHT : https://www.springernature.com/gp/researchers/text-and-data-mining
  • CITATION : Ding Y, Hou K, Burch KS, Lapinska S, ...&, Pasaniuc B. (2022) Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification Nat. Genet., 54 (1) 30-39. doi:10.1038/s41588-021-00961-5. PMID 34931067
  • JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2022 ; 54 ; 1 ; 30-39
  • PUBMED_LINK : 34931067
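
The entry above describes credible intervals obtained by posterior sampling, and the share of top-decile individuals whose interval lies entirely in the top decile. The sketch below illustrates both calculations under stated assumptions: the posterior draws of effect sizes are simulated here rather than produced by a real Bayesian PRS method, and the function names are hypothetical.

```python
import numpy as np


def prs_credible_intervals(dosages, beta_samples, level=0.95):
    """Per-individual posterior mean and equal-tailed credible interval.

    dosages: (n_individuals, n_snps) allele counts.
    beta_samples: (n_draws, n_snps) posterior draws of SNP effects.
    """
    prs_draws = beta_samples @ dosages.T  # (n_draws, n_individuals)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(prs_draws, alpha, axis=0)
    upper = np.quantile(prs_draws, 1.0 - alpha, axis=0)
    return prs_draws.mean(axis=0), lower, upper


def fraction_confidently_top(point, lower, top_fraction=0.10):
    """Among individuals whose point estimate is in the top fraction,
    the share whose lower interval bound also clears the cutoff."""
    cutoff = np.quantile(point, 1.0 - top_fraction)
    in_top = point >= cutoff
    return float(np.mean(lower[in_top] >= cutoff))


# Simulated stand-in for real posterior output (all values made up).
rng = np.random.default_rng(0)
n_individuals, n_snps, n_draws = 500, 200, 300
dosages = rng.binomial(2, 0.3, size=(n_individuals, n_snps)).astype(float)
beta_mean = rng.normal(0.0, 0.05, size=n_snps)
beta_samples = beta_mean + rng.normal(0.0, 0.05, size=(n_draws, n_snps))

point, lower, upper = prs_credible_intervals(dosages, beta_samples)
frac = fraction_confidently_top(point, lower)
```

With real data, `beta_samples` would come from the sampler of a Bayesian PRS method; the interval and containment calculations would be unchanged.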

Cross-population

BridgePRS

  • NAME : BridgePRS
  • SHORT NAME : BridgePRS
  • FULL NAME : BridgePRS
  • DESCRIPTION : BridgePRS is a Bayesian-ridge (Bridge) approach that "bridges" the PRS between two populations of different ancestry. It was developed to tackle the "PRS Portability Problem": PRS are less accurate in underrepresented populations because of biased sampling in GWAS data collection.
  • URL : https://www.bridgeprs.net/
  • TITLE : BridgePRS leverages shared genetic effects across ancestries to increase polygenic risk score portability
  • DOI : 10.1038/s41588-023-01583-9
  • ABSTRACT : Here we present BridgePRS, a novel Bayesian polygenic risk score (PRS) method that leverages shared genetic effects across ancestries to increase PRS portability. We evaluate BridgePRS via simulations and real UK Biobank data across 19 traits in individuals of African, South Asian and East Asian ancestry, using both UK Biobank and Biobank Japan genome-wide association study summary statistics; out-of-cohort validation is performed in the Mount Sinai (New York) BioMe biobank. BridgePRS is compared with the leading alternative, PRS-CSx, and two other PRS methods. Simulations suggest that the performance of BridgePRS relative to PRS-CSx increases as uncertainty increases: with lower trait heritability, higher polygenicity and greater between-population genetic diversity; and when causal variants are not present in the data. In real data, BridgePRS has a 61% larger average R² than PRS-CSx in out-of-cohort prediction of African ancestry samples in BioMe (P = 6 × 10⁻⁵). BridgePRS is a computationally efficient, user-friendly and powerful approach for PRS analyses in non-European ancestries.
  • CITATION : Hoggart CJ, Choi SW, García-González J, Souaiaia T, ...&, O'Reilly PF. (2023) BridgePRS leverages shared genetic effects across ancestries to increase polygenic risk score portability Nat. Genet., 56 (1) 180-186. doi:10.1038/s41588-023-01583-9. PMID 38123642
  • JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2023 ; 56 ; 1 ; 180-186
  • PUBMED_LINK : 38123642

CT-SLEB

  • NAME : CT-SLEB
  • SHORT NAME : CT-SLEB
  • DESCRIPTION : CT-SLEB is a method designed to generate multi-ancestry PRSs that incorporate existing large GWAS from EUR populations and smaller GWAS from non-EUR populations. The method has three key steps: 1. Clumping and Thresholding for selecting SNPs to be included in a PRS for the target population; 2. Empirical-Bayes method for estimating the coefficients of the SNPs; 3. Super-learning model to combine a series of PRSs generated under different SNP selection thresholds.
  • URL : https://github.com/andrewhaoyu/CTSLEB
  • TITLE : A new method for multiancestry polygenic prediction improves performance across diverse populations
  • DOI : 10.1038/s41588-023-01501-z
  • ABSTRACT : Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raise concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank, involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across 13 complex traits. Results demonstrated that CT-SLEB significantly improves PRS performance in non-European populations compared with simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offered insights into sample size requirements and SNP density effects on multiancestry risk prediction.
  • CITATION : Zhang H, Zhan J, Jin J, Zhang J, ...&, Chatterjee N. (2023) A new method for multiancestry polygenic prediction improves performance across diverse populations Nat. Genet., () . doi:10.1038/s41588-023-01501-z. PMID 37749244
  • JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2023 ; ; ;
  • PUBMED_LINK : 37749244
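
The description above lists three steps: SNP selection by clumping and thresholding, empirical-Bayes coefficient estimation, and a super-learning model that combines PRSs built under different thresholds. The sketch below is a heavily simplified illustration of the first and last ideas only. It applies p-value thresholds over a small grid (no LD clumping, no empirical-Bayes step) and uses ordinary least squares as a stand-in for the super learner. It is not the CTSLEB package API; every name and all data are hypothetical.

```python
import numpy as np


def thresholded_scores(dosages, betas, pvalues, thresholds):
    """One PRS column per threshold, keeping SNPs with p <= threshold."""
    columns = []
    for t in thresholds:
        keep = pvalues <= t
        columns.append(dosages[:, keep] @ betas[keep])
    return np.column_stack(columns)


def fit_linear_combiner(score_matrix, phenotype):
    """Least-squares weights (intercept first) for combining PRS columns."""
    design = np.column_stack([np.ones(len(phenotype)), score_matrix])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef


def combine_scores(score_matrix, coef):
    """Apply fitted weights to a matrix of candidate PRSs."""
    return coef[0] + score_matrix @ coef[1:]


# Simulated tuning data (all values made up).
rng = np.random.default_rng(1)
n, m = 400, 50
dosages = rng.binomial(2, 0.3, size=(n, m)).astype(float)
true_beta = rng.normal(0.0, 0.1, size=m)
phenotype = dosages @ true_beta + rng.normal(0.0, 1.0, size=n)
betas = true_beta + rng.normal(0.0, 0.02, size=m)  # noisy GWAS estimates
pvalues = np.exp(-50.0 * np.abs(betas))            # made-up p-values

scores = thresholded_scores(dosages, betas, pvalues, [1e-4, 1e-2, 1.0])
coef = fit_linear_combiner(scores, phenotype)
combined = combine_scores(scores, coef)
```

Fitting and evaluating on the same individuals, as done here for brevity, overstates accuracy; a realistic workflow would fit the combination weights on one sample and report performance on a separate one.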

PROSPER

  • NAME : PROSPER
  • SHORT NAME : PROSPER
  • FULL NAME : Polygenic Risk scOres based on enSemble of PEnalized Regression models
  • DESCRIPTION : PROSPER is a multi-ancestry PRS method that applies penalized regression followed by ensemble learning. The software is a command-line tool written in R. A large-scale benchmarking study suggests that PROSPER could be the leading method for reducing the disparity in PRS performance across ancestry groups.
  • URL : https://github.com/Jingning-Zhang/PROSPER
  • TITLE : An ensemble penalized regression method for multi-ancestry polygenic risk prediction
  • DOI : 10.1038/s41467-024-47357-7
  • ABSTRACT : Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
  • COPYRIGHT : https://creativecommons.org/licenses/by/4.0
  • CITATION : Zhang J, Zhan J, Jin J, Ma C, ...&, Chatterjee N. (2024) An ensemble penalized regression method for multi-ancestry polygenic risk prediction Nat. Commun., 15 (1) 3238. doi:10.1038/s41467-024-47357-7. PMID 38622117
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2024 ; 15 ; 1 ; 3238
  • PUBMED_LINK : 38622117
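
The abstract above mentions a combination of L1 (lasso) and L2 (ridge) penalties. As a generic illustration of how the two penalties act together, and not PROSPER's actual objective or code, the sketch below gives the closed-form single-coefficient update for the textbook elastic-net problem: minimize 0.5*(b - z)^2 + lam1*|b| + 0.5*lam2*b^2. The function names and input values are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward zero by lam and set small values exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def elastic_net_update(z, lam1, lam2):
    """Minimizer of 0.5*(b - z)**2 + lam1*|b| + 0.5*lam2*b**2.

    The L1 term produces sparsity (exact zeros); the L2 term shrinks the
    surviving coefficients proportionally.
    """
    return soft_threshold(z, lam1) / (1.0 + lam2)


# Made-up unpenalized estimates for three SNPs.
z = np.array([1.0, -0.2, 0.5])
updated = elastic_net_update(z, lam1=0.3, lam2=1.0)
```

In this toy call the middle coefficient is set to zero by the lasso term, while the other two are first shrunk by `lam1` and then halved by the ridge term.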

PRS-CSx

  • NAME : PRS-CSx
  • SHORT NAME : PRS-CSx
  • FULL NAME : PRS-CSx
  • DESCRIPTION : PRS-CSx is a Python-based command-line tool that integrates GWAS summary statistics and external LD reference panels from multiple populations to improve cross-population polygenic prediction. Posterior SNP effect sizes are inferred under coupled continuous shrinkage (CS) priors across populations.
  • URL : https://github.com/getian107/PRScsx
  • KEYWORDS : continuous shrinkage (CS) prior, cross-population
  • TITLE : Improving polygenic prediction in ancestrally diverse populations
  • DOI : 10.1038/s41588-022-01054-7
  • ABSTRACT : Polygenic risk scores (PRS) have attenuated cross-population predictive performance. As existing genome-wide association studies (GWAS) have been conducted predominantly in individuals of European descent, the limited transferability of PRS reduces their clinical value in non-European populations, and may exacerbate healthcare disparities. Recent efforts to level ancestry imbalance in genomic research have expanded the scale of non-European GWAS, although most remain underpowered. Here, we present a new PRS construction method, PRS-CSx, which improves cross-population polygenic prediction by integrating GWAS summary statistics from multiple populations. PRS-CSx couples genetic effects across populations via a shared continuous shrinkage (CS) prior, enabling more accurate effect size estimation by sharing information between summary statistics and leveraging linkage disequilibrium diversity across discovery samples, while inheriting computational efficiency and robustness from PRS-CS. We show that PRS-CSx outperforms alternative methods across traits with a wide range of genetic architectures, cross-population genetic overlaps and discovery GWAS sample sizes in simulations, and improves the prediction of quantitative traits and schizophrenia risk in non-European populations.
  • CITATION : Ruan Y, Lin YF, Feng YC, Chen CY, ...&, Ge T. (2022) Improving polygenic prediction in ancestrally diverse populations Nat. Genet., 54 (5) 573-580. doi:10.1038/s41588-022-01054-7. PMID 35513724
  • JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2022 ; 54 ; 5 ; 573-580
  • PUBMED_LINK : 35513724

PRS-FH

  • NAME : PRS-FH
  • SHORT NAME : PRS-FH
  • FULL NAME : family history
  • URL : https://alkesgroup.broadinstitute.org/UKBB/PRSFH/PRSFH/
  • KEYWORDS : family history
  • TITLE : Incorporating family history of disease improves polygenic risk scores in diverse populations
  • DOI : 10.1016/j.xgen.2022.100152
  • ABSTRACT : Polygenic risk scores (PRSs) derived from genotype data and family history (FH) of disease provide valuable information for predicting disease risk, but PRSs perform poorly when applied to diverse populations. Here, we explore methods for combining both types of information (PRS-FH) in UK Biobank data. PRSs were trained using all British individuals (n = 409,000), and target samples consisted of unrelated non-British Europeans (n = 42,000), South Asians (n = 7,000), or Africans (n = 7,000). We evaluated PRS, FH, and PRS-FH using liability-scale R², primarily focusing on 3 well-powered diseases (type 2 diabetes, hypertension, and depression). PRS attained average prediction R²s of 5.8%, 4.0%, and 0.53% in non-British Europeans, South Asians, and Africans, confirming poor cross-population transferability. In contrast, PRS-FH attained average prediction R²s of 13%, 12%, and 10%, respectively, representing a large improvement in Europeans and an extremely large improvement in Africans. In conclusion, including family history improves the accuracy of polygenic risk scores, particularly in diverse populations.
  • COPYRIGHT : http://creativecommons.org/licenses/by-nc-nd/4.0/
  • CITATION : Hujoel MLA, Loh PR, Neale BM, Price AL. (2022) Incorporating family history of disease improves polygenic risk scores in diverse populations Cell Genom., 2 (7) 100152. doi:10.1016/j.xgen.2022.100152. PMID 35935918
  • JOURNAL_INFO : Cell genomics ; Cell Genom. ; 2022 ; 2 ; 7 ; 100152
  • PUBMED_LINK : 35935918

SDPRX

  • NAME : SDPRX
  • SHORT NAME : SDPRX
  • FULL NAME : SDPRX
  • DESCRIPTION : SDPRX is a statistical method for cross-population prediction of complex traits. It integrates GWAS summary statistics and LD matrices from two populations (EUR and non-EUR) to compute polygenic risk scores.
  • URL : https://github.com/eldronzhou/SDPRX
  • TITLE : SDPRX: A statistical method for cross-population prediction of complex traits
  • DOI : 10.1016/j.ajhg.2022.11.007
  • ABSTRACT : Polygenic risk score (PRS) has demonstrated its great utility in biomedical research through identifying high-risk individuals for different diseases from their genotypes. However, the broader application of PRS to the general population is hindered by the limited transferability of PRS developed in Europeans to non-European populations. To improve PRS prediction accuracy in non-European populations, we develop a statistical method called SDPRX that can effectively integrate genome wide association study summary statistics from different populations. SDPRX automatically adjusts for linkage disequilibrium differences between populations and characterizes the joint distribution of the effect sizes of a variant in two populations to be both null, population specific, or shared with correlation. Through simulations and applications to real traits, we show that SDPRX improves the prediction performance over existing methods in non-European populations.
  • COPYRIGHT : http://www.elsevier.com/open-access/userlicense/1.0/
  • CITATION : Zhou G, Chen T, Zhao H. (2023) SDPRX: A statistical method for cross-population prediction of complex traits Am. J. Hum. Genet., 110 (1) 13-22. doi:10.1016/j.ajhg.2022.11.007. PMID 36460009
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2023 ; 110 ; 1 ; 13-22
  • PUBMED_LINK : 36460009

TL-PRS

  • NAME : TL-PRS
  • SHORT NAME : TL-PRS
  • FULL NAME : transfer learning PRS
  • DESCRIPTION : This R package helps users construct multi-ethnic polygenic risk scores (PRS) using transfer learning. It can help predict PRS for a minor ancestry using summary statistics from existing resources, such as UK Biobank.
  • URL : https://github.com/ZhangchenZhao/TLPRS
  • TITLE : The construction of cross-population polygenic risk scores using transfer learning
  • DOI : 10.1016/j.ajhg.2022.09.010
  • ABSTRACT : As most existing genome-wide association studies (GWASs) were conducted in European-ancestry cohorts, and as the existing polygenic risk score (PRS) models have limited transferability across ancestry groups, PRS research on non-European-ancestry groups needs to make efficient use of available data until we attain large sample sizes across all ancestry groups. Here we propose a PRS method using transfer learning techniques. Our approach, TL-PRS, uses gradient descent to fine-tune the baseline PRS model from an ancestry group with large sample GWASs to the dataset of target ancestry. In our application of constructing PRS for seven quantitative and two dichotomous traits for 10,285 individuals of South Asian ancestry and 8,168 individuals of African ancestry in UK Biobank, TL-PRS using PRS-CS as a baseline method obtained 25% average relative improvement for South Asian samples and 29% for African samples compared to the standard PRS-CS method in terms of predicted R2. Our approach increases the transferability of PRSs across ancestries and thereby helps reduce existing inequities in genetics research.
  • COPYRIGHT : http://creativecommons.org/licenses/by-nc-nd/4.0/
  • CITATION : Zhao Z, Fritsche LG, Smith JA, Mukherjee B, ...&, Lee S. (2022) The construction of cross-population polygenic risk scores using transfer learning Am. J. Hum. Genet., 109 (11) 1998-2008. doi:10.1016/j.ajhg.2022.09.010. PMID 36240765
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2022 ; 109 ; 11 ; 1998-2008
  • PUBMED_LINK : 36240765
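
The fine-tuning idea in the abstract (start from a baseline PRS and adjust it by gradient descent on target-ancestry data) can be sketched as follows. This is a minimal illustration on individual-level toy data with a squared-error loss, which is an assumption; the TL-PRS package itself may use different inputs, a different loss, and its own stopping rule. The names `fine_tune` and `mse` and all numbers are hypothetical.

```python
def predict(weights, row):
    """Linear score for one individual: sum of dosage times weight."""
    return sum(w * g for w, g in zip(weights, row))


def mse(weights, genotypes, phenotypes):
    """Mean squared error of the linear score on a dataset."""
    n = len(genotypes)
    return sum((predict(weights, row) - y) ** 2
               for row, y in zip(genotypes, phenotypes)) / n


def fine_tune(baseline, genotypes, phenotypes, learning_rate=0.01, n_steps=50):
    """Plain gradient descent on MSE, initialised at the baseline weights."""
    w = list(baseline)
    n = len(genotypes)
    for _ in range(n_steps):
        residuals = [predict(w, row) - y
                     for row, y in zip(genotypes, phenotypes)]
        # Gradient of (1/n) * sum(residual^2) with respect to each weight.
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, genotypes))
                for j in range(len(w))]
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: 4 target-ancestry individuals, 2 variants.
X = [[0, 1], [1, 2], [2, 0], [1, 1]]
y = [0.6, 1.4, 0.4, 0.8]          # generated from target weights [0.2, 0.6]
baseline = [0.5, 0.5]             # weights learned in the large source GWAS
tuned = fine_tune(baseline, X, y)
```

Keeping the number of steps small acts as a form of early stopping, so the tuned weights stay close to the baseline when the target sample is small.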

shaPRS

  • NAME : shaPRS
  • SHORT NAME : shaPRS
  • FULL NAME : shaPRS
  • DESCRIPTION : Leveraging shared genetic effects across traits and ancestries improves accuracy of polygenic scores
  • URL : https://github.com/mkelcb/shaprs
  • KEYWORDS : cross-ancestry, genetic correlation
  • TITLE : shaPRS: Leveraging shared genetic effects across traits or ancestries improves accuracy of polygenic scores
  • DOI : 10.1016/j.ajhg.2024.04.009
  • ABSTRACT : We present shaPRS, a method that leverages widespread pleiotropy between traits or shared genetic effects across ancestries, to improve the accuracy of polygenic scores. The method uses genome-wide summary statistics from two diseases or ancestries to improve the genetic effect estimate and standard error at SNPs where there is homogeneity of effect between the two datasets. When there is significant evidence of heterogeneity, the genetic effect from the disease or population closest to the target population is maintained. We show via simulation and a series of real-world examples that shaPRS substantially enhances the accuracy of polygenic risk scores (PRSs) for complex diseases and greatly improves PRS performance across ancestries. shaPRS is a PRS pre-processing method that is agnostic to the actual PRS generation method, and as a result, it can be integrated into existing PRS generation pipelines and continue to be applied as more performant PRS methods are developed over time.
  • CITATION : Kelemen M, Vigorito E, Fachal L, Anderson CA, ...&, Wallace C. (2024) shaPRS: Leveraging shared genetic effects across traits or ancestries improves accuracy of polygenic scores Am. J. Hum. Genet., () . doi:10.1016/j.ajhg.2024.04.009. PMID 38703768
  • JOURNAL_INFO : American journal of human genetics ; Am. J. Hum. Genet. ; 2024 ; ; ;
  • PUBMED_LINK : 38703768
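
The per-SNP logic described in the abstract (pool the two estimates where effects look homogeneous, keep the estimate closest to the target where they do not) can be sketched as below. This is a simplified illustration, not the shaPRS implementation: it assumes two independent samples, uses Cochran's Q test with a hard significance cutoff `alpha`, and pools with a fixed-effect inverse-variance meta-analysis. The published method may blend the two estimates more smoothly. The function name and example numbers are hypothetical.

```python
import math


def blend_effects(beta_a, se_a, beta_b, se_b, alpha=0.05):
    """Return (beta, se) for one SNP; study A is the one closest to the target.

    Cochran's Q for two studies reduces to (b_a - b_b)^2 / (se_a^2 + se_b^2),
    which is chi-square distributed with 1 degree of freedom under homogeneity.
    """
    q = (beta_a - beta_b) ** 2 / (se_a ** 2 + se_b ** 2)
    p_het = math.erfc(math.sqrt(q / 2.0))  # upper tail of chi-square, 1 df
    if p_het < alpha:
        # Significant heterogeneity: keep the target-proximal estimate.
        return beta_a, se_a
    # Homogeneous effects: fixed-effect inverse-variance-weighted estimate.
    w_a, w_b = 1.0 / se_a ** 2, 1.0 / se_b ** 2
    beta = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    se = math.sqrt(1.0 / (w_a + w_b))
    return beta, se


similar = blend_effects(0.10, 0.02, 0.12, 0.02)     # pooled estimate
discordant = blend_effects(0.10, 0.02, -0.10, 0.02)  # study A kept as-is
```

In the homogeneous case the pooled standard error is smaller than either input, which is the gain the abstract refers to.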

Cross-trait

DDx-PRS

  • NAME : DDx-PRS
  • SHORT NAME : DDx-PRS
  • FULL NAME : Differential Diagnosis-Polygenic Risk Score
  • DESCRIPTION : The DDxPRS R function provides a tool for distinguishing different disorders based on polygenic prediction.
  • URL : https://github.com/wouterpeyrot/DDxPRS
  • CITATION : Peyrot, W. J., Panagiotaropoulou, G., Olde Loohuis, L. M., Adams, M., Awasthi, S., Ge, T., ... & Price, A. L. (2024). Distinguishing different psychiatric disorders using DDx-PRS. medRxiv, 2024-02.

MiXeR

  • NAME : MiXeR
  • SHORT NAME : MiXeR
  • FULL NAME : MiXeR (cross-trait analysis)
  • DESCRIPTION : Causal Mixture Model for GWAS summary statistics
  • URL : https://github.com/precimed/mixer
  • TITLE : Bivariate causal mixture model quantifies polygenic overlap between complex traits beyond genetic correlation
  • DOI : 10.1038/s41467-019-10310-0
  • ABSTRACT : Accumulating evidence from genome wide association studies (GWAS) suggests an abundance of shared genetic influences among complex human traits and disorders, such as mental disorders. Here we introduce a statistical tool, MiXeR, which quantifies polygenic overlap irrespective of genetic correlation, using GWAS summary statistics. MiXeR results are presented as a Venn diagram of unique and shared polygenic components across traits. At 90% of SNP-heritability explained for each phenotype, MiXeR estimates that 8.3 K variants causally influence schizophrenia and 6.4 K influence bipolar disorder. Among these variants, 6.2 K are shared between the disorders, which have a high genetic correlation. Further, MiXeR uncovers polygenic overlap between schizophrenia and educational attainment. Despite a genetic correlation close to zero, the phenotypes share 8.3 K causal variants, while 2.5 K additional variants influence only educational attainment. By considering the polygenicity, discoverability and heritability of complex phenotypes, MiXeR analysis may improve our understanding of cross-trait genetic architectures.
  • CITATION : Frei O, Holland D, Smeland OB, Shadrin AA, ...&, Dale AM. (2019) Bivariate causal mixture model quantifies polygenic overlap between complex traits beyond genetic correlation Nat. Commun., 10 (1) 2417. doi:10.1038/s41467-019-10310-0. PMID 31160569
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2019 ; 10 ; 1 ; 2417
  • PUBMED_LINK : 31160569
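
The Venn-diagram summary in the abstract is simple arithmetic on three counts. As a worked example using the rounded schizophrenia and bipolar disorder figures quoted above (the helper name `venn_components` is hypothetical, and this is not part of the MiXeR software):

```python
def venn_components(n_trait1, n_trait2, n_shared):
    """Split two causal-variant counts into trait-specific and shared parts."""
    return {
        "trait1_only": n_trait1 - n_shared,
        "shared": n_shared,
        "trait2_only": n_trait2 - n_shared,
    }


# Rounded counts from the abstract: 8.3 K (schizophrenia), 6.4 K (bipolar
# disorder), of which 6.2 K are shared.
scz_bip = venn_components(8300, 6400, 6200)
```

With these inputs, roughly 2.1 K variants are specific to schizophrenia and 0.2 K to bipolar disorder, so most of the bipolar component falls inside the shared region.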

Multi-PGS

  • NAME : Multi-PGS
  • SHORT NAME : Multi-PGS
  • FULL NAME : Multi-PGS
  • DESCRIPTION : A framework to generate enriched PGS from a wealth of publicly available genome-wide association studies, combining thousands of studies focused on many different phenotypes into a multi-PGS.
  • URL : https://github.com/ClaraAlbi/paper_multiPGS
  • TITLE : Multi-PGS enhances polygenic prediction by combining 937 polygenic scores
  • DOI : 10.1038/s41467-023-40330-w
  • ABSTRACT : The predictive performance of polygenic scores (PGS) is largely dependent on the number of samples available to train the PGS. Increasing the sample size for a specific phenotype is expensive and takes time, but this sample size can be effectively increased by using genetically correlated phenotypes. We propose a framework to generate multi-PGS from thousands of publicly available genome-wide association studies (GWAS) with no need to individually select the most relevant ones. In this study, the multi-PGS framework increases prediction accuracy over single PGS for all included psychiatric disorders and other available outcomes, with prediction R2 increases of up to 9-fold for attention-deficit/hyperactivity disorder compared to a single PGS. We also generate multi-PGS for phenotypes without an existing GWAS and for case-case predictions. We benchmark the multi-PGS framework against other methods and highlight its potential application to new emerging biobanks.
  • COPYRIGHT : https://creativecommons.org/licenses/by/4.0
  • CITATION : Albiñana C, Zhu Z, Schork AJ, Ingason A, ...&, Vilhjálmsson BJ. (2023) Multi-PGS enhances polygenic prediction by combining 937 polygenic scores Nat. Commun., 14 (1) 4702. doi:10.1038/s41467-023-40330-w. PMID 37543680
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2023 ; 14 ; 1 ; 4702
  • PUBMED_LINK : 37543680

PUMA-CUBS

  • NAME : PUMA-CUBS
  • SHORT NAME : PUMA-CUBS
  • FULL NAME : PUMA-CUBS
  • DESCRIPTION : PUMA-CUBS is an ensemble learning strategy that combines multiple PRS models into an ensemble score without requiring external data for model fitting.
  • URL : https://github.com/qlu-lab/PUMAS
  • CITATION : Zhao, Zijie, et al. "Optimizing and benchmarking polygenic risk scores with GWAS summary statistics." bioRxiv (2022).

wMT-SBLUP

  • NAME : wMT-SBLUP
  • SHORT NAME : wMT-SBLUP
  • FULL NAME : weighted approximate multi-trait summary statistic BLUP
  • URL : https://github.com/uqrmaie1/smtpred
  • TITLE : Improving genetic prediction by leveraging genetic correlations among human diseases and traits
  • DOI : 10.1038/s41467-017-02769-6
  • ABSTRACT : Genomic prediction has the potential to contribute to precision medicine. However, to date, the utility of such predictors is limited due to low accuracy for most traits. Here theory and simulation study are used to demonstrate that widespread pleiotropy among phenotypes can be utilised to improve genomic risk prediction. We show how a genetic predictor can be created as a weighted index that combines published genome-wide association study (GWAS) summary statistics across many different traits. We apply this framework to predict risk of schizophrenia and bipolar disorder in the Psychiatric Genomics consortium data, finding substantial heterogeneity in prediction accuracy increases across cohorts. For six additional phenotypes in the UK Biobank data, we find increases in prediction accuracy ranging from 0.7% for height to 47% for type 2 diabetes, when using a multi-trait predictor that combines published summary statistics from multiple traits, as compared to a predictor based only on one trait.
  • CITATION : Maier RM, Zhu Z, Lee SH, Trzaskowski M, ...&, Robinson MR. (2018) Improving genetic prediction by leveraging genetic correlations among human diseases and traits Nat. Commun., 9 (1) 989. doi:10.1038/s41467-017-02769-6. PMID 29515099
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2018 ; 9 ; 1 ; 989
  • PUBMED_LINK : 29515099

Pathway

PRSet

  • NAME : PRSet
  • SHORT NAME : PRSet
  • FULL NAME : PRSet
  • DESCRIPTION : A new feature of PRSice is the ability to perform set-based/pathway-based analysis. This new feature is called PRSet.
  • URL : https://www.prsice.info/quick_start_prset/
  • KEYWORDS : pathway-based
  • TITLE : PRSet: Pathway-based polygenic risk score analyses and software
  • DOI : 10.1371/journal.pgen.1010624
  • ABSTRACT : Polygenic risk scores (PRSs) have been among the leading advances in biomedicine in recent years. As a proxy of genetic liability, PRSs are utilised across multiple fields and applications. While numerous statistical and machine learning methods have been developed to optimise their predictive accuracy, these typically distil genetic liability to a single number based on aggregation of an individual's genome-wide risk alleles. This results in a key loss of information about an individual's genetic profile, which could be critical given the functional sub-structure of the genome and the heterogeneity of complex disease. In this manuscript, we introduce a 'pathway polygenic' paradigm of disease risk, in which multiple genetic liabilities underlie complex diseases, rather than a single genome-wide liability. We describe a method and accompanying software, PRSet, for computing and analysing pathway-based PRSs, in which polygenic scores are calculated across genomic pathways for each individual. We evaluate the potential of pathway PRSs in two distinct ways, creating two major sections: (1) In the first section, we benchmark PRSet as a pathway enrichment tool, evaluating its capacity to capture GWAS signal in pathways. We find that for target sample sizes of >10,000 individuals, pathway PRSs have similar power for evaluating pathway enrichment as leading methods MAGMA and LD score regression, with the distinct advantage of providing individual-level estimates of genetic liability for each pathway, opening up a range of pathway-based PRS applications. (2) In the second section, we evaluate the performance of pathway PRSs for disease stratification. We show that using a supervised disease stratification approach, pathway PRSs (computed by PRSet) outperform two standard genome-wide PRSs (computed by C+T and lassosum) for classifying disease subtypes in 20 of 21 scenarios tested. As the definition and functional annotation of pathways becomes increasingly refined, we expect pathway PRSs to offer key insights into the heterogeneity of complex disease and treatment response, to generate biologically tractable therapeutic targets from polygenic signal, and, ultimately, to provide a powerful path to precision medicine.
  • CITATION : Choi SW, García-González J, Ruan Y, Wu HM, ...&, O'Reilly PF. (2023) PRSet: Pathway-based polygenic risk score analyses and software PLoS Genet., 19 (2) e1010624. doi:10.1371/journal.pgen.1010624. PMID 36749789
  • JOURNAL_INFO : PLoS genetics ; PLoS Genet. ; 2023 ; 19 ; 2 ; e1010624
  • PUBMED_LINK : 36749789
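
The core idea of a pathway PRS, one score per pathway for each individual instead of a single genome-wide number, can be sketched as below. This is an illustration of the concept only, not PRSet's implementation: the variant-to-pathway mapping is assumed to be given, LD handling is omitted, and all names and numbers are hypothetical.

```python
def pathway_scores(dosages, weights, pathways):
    """Compute one polygenic score per pathway for a single individual.

    dosages:  dict variant_id -> effect-allele dosage (0..2)
    weights:  dict variant_id -> per-allele effect size
    pathways: dict pathway_name -> iterable of variant_ids in that pathway
    """
    scores = {}
    for name, variants in pathways.items():
        scores[name] = sum(dosages[v] * weights[v]
                           for v in variants
                           if v in dosages and v in weights)
    return scores


# Hypothetical toy inputs.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0}
person = {"rs1": 2, "rs2": 1, "rs3": 0}
pathways = {"pathway_A": ["rs1", "rs2"], "pathway_B": ["rs3", "rs9"]}
per_pathway = pathway_scores(person, weights, pathways)
```

The resulting vector of pathway scores can then be used as features for stratification, which is the second use case the abstract evaluates.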

Pipeline

PLINK2

  • NAME : PLINK2
  • SHORT NAME : PLINK2
  • FULL NAME : PLINK2
  • DESCRIPTION : The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility.
  • URL : https://www.cog-genomics.org/plink/2.0/
  • USE : calculate PRS using genotype data.
  • TITLE : Second-generation PLINK: rising to the challenge of larger and richer datasets
  • DOI : 10.1186/s13742-015-0047-8
  • ABSTRACT : BACKGROUND: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. FINDINGS: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(√n)-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0). CONCLUSIONS: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
  • CITATION : Chang CC, Chow CC, Tellier LC, Vattikuti S, ...&, Lee JJ. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets Gigascience, 4 (1) 7. doi:10.1186/s13742-015-0047-8. PMID 25722852
  • JOURNAL_INFO : GigaScience ; Gigascience ; 2015 ; 4 ; 1 ; 7
  • PUBMED_LINK : 25722852
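
The USE field above refers to linear scoring: each individual's effect-allele dosages are multiplied by per-variant weights and summed. PLINK 2 provides this through its `--score` command; the snippet below is only a plain-Python sketch of the arithmetic, with hypothetical variant IDs and weights, and it does not reproduce PLINK's options or output format.

```python
def polygenic_score(dosages, weights):
    """Sum dosage times weight over variants present in both inputs.

    dosages: dict variant_id -> effect-allele dosage (0..2) for one individual
    weights: dict variant_id -> per-allele effect size (for example a GWAS beta)
    """
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)


# Hypothetical toy data: rs3 is missing from the genotypes, rs4 has no weight.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0}
person = {"rs1": 2, "rs2": 1, "rs4": 0}
score = polygenic_score(person, weights)  # 2 * 0.5 + 1 * (-0.25)
```

Variants missing from either input are skipped here; a real pipeline would also need to check that the effect allele in the weight file matches the allele being counted in the genotype data.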

PRSice-2

  • NAME : PRSice-2
  • SHORT NAME : PRSice-2
  • FULL NAME : PRSice-2
  • DESCRIPTION : PRSice (pronounced 'precise') is a Polygenic Risk Score software for calculating, applying, evaluating and plotting the results of polygenic risk scores (PRS) analyses.
  • URL : https://www.prsice.info/
  • TITLE : PRSice-2: Polygenic Risk Score software for biobank-scale data
  • DOI : 10.1093/gigascience/giz082
  • ABSTRACT : BACKGROUND: Polygenic risk score (PRS) analyses have become an integral part of biomedical research, exploited to gain insights into shared aetiology among traits, to control for genomic profile in experimental studies, and to strengthen causal inference, among a range of applications. Substantial efforts are now devoted to biobank projects to collect large genetic and phenotypic data, providing unprecedented opportunity for genetic discovery and applications. To process the large-scale data provided by such biobank resources, highly efficient and scalable methods and software are required. RESULTS: Here we introduce PRSice-2, an efficient and scalable software program for automating and simplifying PRS analyses on large-scale data. PRSice-2 handles both genotyped and imputed data, provides empirical association P-values free from inflation due to overfitting, supports different inheritance models, and can evaluate multiple continuous and binary target traits simultaneously. We demonstrate that PRSice-2 is dramatically faster and more memory-efficient than PRSice-1 and alternative PRS software, LDpred and lassosum, while having comparable predictive power. CONCLUSION: PRSice-2's combination of efficiency and power will be increasingly important as data sizes grow and as the applications of PRS become more sophisticated, e.g., when incorporated into high-dimensional or gene set-based analyses. PRSice-2 is written in C++, with an R script for plotting, and is freely available for download from http://PRSice.info.
  • COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
  • CITATION : Choi SW, O'Reilly PF. (2019) PRSice-2: Polygenic Risk Score software for biobank-scale data Gigascience, 8 (7) 1-6. doi:10.1093/gigascience/giz082. PMID 31307061
  • JOURNAL_INFO : GigaScience ; Gigascience ; 2019 ; 8 ; 7 ; 1-6
  • PUBMED_LINK : 31307061

pgsc_calc

  • NAME : pgsc_calc
  • SHORT NAME : pgsc_calc
  • FULL NAME : The Polygenic Score Catalog Calculator
  • DESCRIPTION : pgsc_calc is a bioinformatics best-practice analysis pipeline for calculating polygenic [risk] scores on samples with imputed genotypes using existing scoring files from the Polygenic Score (PGS) Catalog and/or user-defined PGS/PRS.
  • URL : https://github.com/PGScatalog/pgsc_calc
  • KEYWORDS : PRS calculation pipeline
  • CITATION : Lambert, Wingfield et al. (2024) The Polygenic Score Catalog: new functionality and tools to enable FAIR research. medRxiv. doi:10.1101/2024.05.29.24307783.

Platform

Cancer PRSweb

  • NAME : Cancer PRSweb
  • SHORT NAME : Cancer PRSweb
  • FULL NAME : Cancer PRSweb
  • DESCRIPTION : Our framework condenses these summary statistics into PRS using linkage disequilibrium pruning and p-value thresholding (fixed or data-adaptively optimized thresholds) or penalized, genome-wide effect size weighting. We evaluate them in the cancer-enriched cohort of the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and in the population-based UK Biobank Study (UKB). For each PRS construct, measures on performance, calibration, and discrimination are provided. Beyond the cancer PRS evaluation in MGI and UKB, the PRSweb platform features construct downloads, risk evaluation in the top percentiles, and phenome-wide PRS association studies (PRS-PheWAS) for a subset of PRS that are predictive for the primary cancer.
  • URL : https://prsweb.sph.umich.edu:8443/
  • KEYWORDS : Cancer PRS
  • TITLE : Cancer PRSweb: An online repository with polygenic risk scores for major cancer traits and their evaluation in two independent biobanks
  • DOI : 10.1016/j.ajhg.2020.08.025
  • ABSTRACT : To facilitate scientific collaboration on polygenic risk scores (PRSs) research, we created an extensive PRS online repository for 35 common cancer traits integrating freely available genome-wide association studies (GWASs) summary statistics from three sources: published GWASs, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWASs. Our framework condenses these summary statistics into PRSs using various approaches such as linkage disequilibrium pruning/p value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRSs in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRSs. We expect this integrated platform to accelerate PRS-related cancer research.
  • COPYRIGHT : http://creativecommons.org/licenses/by-nc-nd/4.0/
  • CITATION : Fritsche LG, Patil S, Beesley LJ, VandeHaar P, ...&, Mukherjee B. (2020) Cancer PRSweb: An online repository with polygenic risk scores for major cancer traits and their evaluation in two independent biobanks Am. J. Hum. Genet., 107 (5) 815-836. doi:10.1016/j.ajhg.2020.08.025. PMID 32991828
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2020 ; 107 ; 5 ; 815-836
  • PUBMED_LINK : 32991828
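
The "linkage disequilibrium pruning and p-value thresholding" approach mentioned above can be sketched as a greedy selection: variants are visited in order of increasing p-value, and a variant is kept if it passes the threshold and is not in strong LD with any variant already kept. This is a clumping-style simplification chosen for brevity, not the PRSweb pipeline; the thresholds, the `ld_r2` lookup structure, and the function name are assumptions.

```python
def prune_and_threshold(p_values, ld_r2, p_threshold=5e-8, r2_cutoff=0.1):
    """Select approximately independent variants below a p-value threshold.

    p_values: dict variant_id -> association p-value
    ld_r2:    dict frozenset({id_1, id_2}) -> pairwise r^2 (missing pairs = 0)
    """
    kept = []
    for snp in sorted(p_values, key=p_values.get):
        if p_values[snp] >= p_threshold:
            break  # remaining variants are all less significant
        if all(ld_r2.get(frozenset((snp, other)), 0.0) < r2_cutoff
               for other in kept):
            kept.append(snp)
    return kept


# Hypothetical toy data: rs2 is in high LD with the stronger signal rs1.
p_values = {"rs1": 1e-10, "rs2": 1e-9, "rs3": 1e-8, "rs4": 0.2}
ld_r2 = {frozenset(("rs1", "rs2")): 0.8}
selected = prune_and_threshold(p_values, ld_r2)
```

A data-adaptive variant, as the description mentions, would repeat this over a grid of `p_threshold` values and keep the one that predicts best in a tuning sample.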

ExPRSweb

  • NAME : ExPRSweb
  • SHORT NAME : ExPRSweb
  • FULL NAME : exposure polygenic risk scores (ExPRSs)
  • DESCRIPTION : Integrating published and freely available genome-wide association studies (GWAS) summary statistics from multiple sources (published GWAS, the NHGRI-EBI GWAS Catalog, FinnGen- or UKB-based GWAS), we created an online repository for exposure polygenic risk scores (ExPRS) for health-related exposure traits. Our framework condenses these summary statistics into ExPRS using linkage disequilibrium pruning and p-value thresholding (P&T) or penalized, genome-wide effect size weighting. We evaluate them in the cohort of the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and in the population-based UK Biobank Study (UKB). For each ExPRS construct, measures on performance, accuracy, and discrimination are provided. Beyond the ExPRS evaluation in MGI and UKB, the ExPRSweb platform features construct downloads, evaluation in the top percentiles, and phenome-wide ExPRS association studies (ExPRS-PheWAS) for a subset of ExPRS that are predictive for the corresponding exposure.
  • URL : https://exprsweb.sph.umich.edu:8443/
  • KEYWORDS : exposure PRS
  • TITLE : ExPRSweb: An online repository with polygenic risk scores for common health-related exposures
  • DOI : 10.1016/j.ajhg.2022.09.001
  • ABSTRACT : Complex traits are influenced by genetic risk factors, lifestyle, and environmental variables, so-called exposures. Some exposures, e.g., smoking or lipid levels, have common genetic modifiers identified in genome-wide association studies. Because measurements are often unfeasible, exposure polygenic risk scores (ExPRSs) offer an alternative to study the influence of exposures on various phenotypes. Here, we collected publicly available summary statistics for 28 exposures and applied four common PRS methods to generate ExPRSs in two large biobanks: the Michigan Genomics Initiative and the UK Biobank. We established ExPRSs for 27 exposures and demonstrated their applicability in phenome-wide association studies and as predictors for common chronic conditions. Especially the addition of multiple ExPRSs showed, for several chronic conditions, an improvement compared to prediction models that only included traditional, disease-focused PRSs. To facilitate follow-up studies, we share all ExPRS constructs and generated results via an online repository called ExPRSweb.
  • COPYRIGHT : http://www.elsevier.com/open-access/userlicense/1.0/
  • CITATION : Ma Y, Patil S, Zhou X, Mukherjee B, ...&, Fritsche LG. (2022) ExPRSweb: An online repository with polygenic risk scores for common health-related exposures Am. J. Hum. Genet., 109 (10) 1742-1760. doi:10.1016/j.ajhg.2022.09.001. PMID 36152628
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2022 ; 109 ; 10 ; 1742-1760
  • PUBMED_LINK : 36152628

PGSCatalog

  • NAME : PGSCatalog
  • SHORT NAME : PGS Catalog
  • FULL NAME : PGS Catalog
  • DESCRIPTION : The PGS Catalog is an open database of published polygenic scores (PGS). Each PGS in the Catalog is consistently annotated with relevant metadata, including scoring files (variants, effect alleles/weights), annotations of how the PGS was developed and applied, and evaluations of their predictive performance.
  • URL : https://www.pgscatalog.org/
  • KEYWORDS : PGS database
  • TITLE : The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation
  • DOI : 10.1038/s41588-021-00783-5
  • ABSTRACT : We present the Polygenic Score (PGS) Catalog (https://www.PGSCatalog.org), an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications. The PGS Catalog has capabilities for user deposition, expert curation and programmatic access, thus providing the community with a platform for PGS dissemination, research and translation.
  • CITATION : Lambert SA, Gil L, Jupp S, Ritchie SC, ...&, Inouye M. (2021) The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation Nat. Genet., 53 (4) 420-425. doi:10.1038/s41588-021-00783-5. PMID 33692568
  • JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2021 ; 53 ; 4 ; 420-425
  • PUBMED_LINK : 33692568
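
Because each score is distributed as a scoring file of variants, effect alleles, and weights, a small reader is often the first step in applying one. The sketch below parses a tab-delimited file with `#`-prefixed header lines. The column names (`rsID`, `effect_allele`, `effect_weight`) reflect one common layout and should be checked against the Catalog's current format documentation; the example content and the function name are hypothetical.

```python
import csv

# Hypothetical miniature scoring file; real files carry more header fields
# and columns.
EXAMPLE = "\n".join([
    "#pgs_id=PGS_EXAMPLE",
    "#trait_reported=Example trait",
    "\t".join(["rsID", "effect_allele", "other_allele", "effect_weight"]),
    "\t".join(["rs1", "A", "G", "0.5"]),
    "\t".join(["rs2", "T", "C", "-0.25"]),
])


def read_scoring_file(text):
    """Return (metadata, weights) parsed from scoring-file text.

    metadata: dict of key=value pairs found in '#' header lines
    weights:  dict variant_id -> (effect_allele, effect_weight)
    """
    metadata, data_lines = {}, []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line.lstrip("#").partition("=")
            if value:
                metadata[key] = value
        elif line.strip():
            data_lines.append(line)
    reader = csv.DictReader(data_lines, delimiter="\t")
    weights = {row["rsID"]: (row["effect_allele"], float(row["effect_weight"]))
               for row in reader}
    return metadata, weights


metadata, weights = read_scoring_file(EXAMPLE)
```

Keeping the effect allele alongside the weight makes it possible to flip or drop variants whose alleles do not match the target genotype data.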

PGSFusion

  • NAME : PGSFusion
  • SHORT NAME : PGSFusion
  • DESCRIPTION : PGSFusion is a free, comprehensive webserver for constructing polygenic scores (PGS), evaluating performance, and unlocking epidemiological insights. This server implements 16 leading summary statistics-based PGS methods in a standardized interface, and rigorously assesses their predictive capabilities using the UK Biobank dataset.
  • URL : http://www.pgsfusion.net/#/
  • PREPRINT_DOI : 10.1101/2024.08.05.606619
  • SERVER : biorxiv
  • CITATION : Yang, S., Ye, X., Ji, X., Li, Z., Tian, M., Huang, P., & Cao, C. (2024). PGSFusion streamlines polygenic score construction and epidemiological applications in biobank-scale cohorts. bioRxiv, 2024-08.

PRS atlas

  • NAME : PRS atlas
  • SHORT NAME : PRS atlas
  • FULL NAME : PRS atlas
  • DESCRIPTION : This web application can be used to query findings from an analysis of 162 polygenic risk scores and 551 complex traits using data from the UK Biobank study. Traits were selected based on the heritability analysis conducted by the Neale Lab (P<0.05). We encourage users of this resource to conduct follow-up analyses of associations to robustly identify causal relationships between complex traits.
  • URL : http://mrcieu.mrsoftware.org/PRS_atlas/
  • TITLE : An atlas of polygenic risk score associations to highlight putative causal relationships across the human phenome
  • DOI : 10.7554/eLife.43657
  • ABSTRACT : The age of large-scale genome-wide association studies (GWAS) has provided us with an unprecedented opportunity to evaluate the genetic liability of complex disease using polygenic risk scores (PRS). In this study, we have analysed 162 PRS (p<5×10⁻⁵) derived from GWAS and 551 heritable traits from the UK Biobank study (N = 334,398). Findings can be investigated using a web application (http://mrcieu.mrsoftware.org/PRS_atlas/), which we envisage will help uncover both known and novel mechanisms which contribute towards disease susceptibility. To demonstrate this, we have investigated the results from a phenome-wide evaluation of schizophrenia genetic liability. Amongst findings were inverse associations with measures of cognitive function which extensive follow-up analyses using Mendelian randomization (MR) provided evidence of a causal relationship. We have also investigated the effect of multiple risk factors on disease using mediation and multivariable MR frameworks. Our atlas provides a resource for future endeavours seeking to unravel the causal determinants of complex disease.
  • COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
  • CITATION : Richardson TG, Harrison S, Hemani G, Davey Smith G. (2019) An atlas of polygenic risk score associations to highlight putative causal relationships across the human phenome Elife, 8 () . doi:10.7554/eLife.43657. PMID 30835202
  • JOURNAL_INFO : eLife ; Elife ; 2019 ; 8 ; ;
  • PUBMED_LINK : 30835202

metabolites PRS atlas

  • NAME : metabolites PRS atlas
  • SHORT NAME : metabolites PRS atlas
  • FULL NAME : metabolites PRS atlas
  • DESCRIPTION : This web application can be used to query findings from a systematic analysis of 129 polygenic risk scores and 249 circulating metabolites using high-throughput nuclear magnetic resonance data from the UK Biobank study. We encourage users of this resource to conduct follow-up analyses of associations to investigate potential causal and non-causal metabolic biomarkers. Age-stratified results can be used to investigate how potential sources of collider bias (e.g. statin therapy) may influence findings in the full sample.
  • URL : http://mrcieu.mrsoftware.org/metabolites_PRS_atlas/
  • TITLE : Constructing an atlas of associations between polygenic scores from across the human phenome and circulating metabolic biomarkers
  • DOI : 10.7554/eLife.73951
  • ABSTRACT : Background: Polygenic scores (PGS) are becoming an increasingly popular approach to predict complex disease risk, although they also hold the potential to develop insight into the molecular profiles of patients with an elevated genetic predisposition to disease. Methods: We sought to construct an atlas of associations between 125 different PGS derived using results from genome-wide association studies and 249 circulating metabolites in up to 83,004 participants from the UK Biobank. Results: As an exemplar to demonstrate the value of this atlas, we conducted a hypothesis-free evaluation of all associations with glycoprotein acetyls (GlycA), an inflammatory biomarker. Using bidirectional Mendelian randomization, we find that the associations highlighted likely reflect the effect of risk factors, such as adiposity or liability towards smoking, on systemic inflammation as opposed to the converse direction. Moreover, we repeated all analyses in our atlas within age strata to investigate potential sources of collider bias, such as medication usage. This was exemplified by comparing associations between lipoprotein lipid profiles and the coronary artery disease PGS in the youngest and oldest age strata, which had differing proportions of individuals undergoing statin therapy. Lastly, we generated all PGS-metabolite associations stratified by sex and separately after excluding 13 established lipid-associated loci to further evaluate the robustness of findings. Conclusions: We envisage that the atlas of results constructed in our study will motivate future hypothesis generation and help prioritize and deprioritize circulating metabolic traits for in-depth investigations. All results can be visualized and downloaded at http://mrcieu.mrsoftware.org/metabolites_PRS_atlas. Funding: This work is supported by funding from the Wellcome Trust, the British Heart Foundation, and the Medical Research Council Integrative Epidemiology Unit.
  • COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
  • CITATION : Fang S, Holmes MV, Gaunt TR, Davey Smith G, ...&, Richardson TG. (2022) Constructing an atlas of associations between polygenic scores from across the human phenome and circulating metabolic biomarkers Elife, 11 () e73951. doi:10.7554/eLife.73951. PMID 36219204
  • JOURNAL_INFO : eLife ; Elife ; 2022 ; 11 ; ; e73951
  • PUBMED_LINK : 36219204

Polygenicity

BayesR

  • NAME : BayesR
  • SHORT NAME : BayesR
  • FULL NAME : BayesR
  • DESCRIPTION : Bayesian mixture model to dissect genetic variation for disease in human populations and to construct more powerful risk predictors
  • URL : https://cnsgenomics.com/software/gctb/#Overview
  • TITLE : Simultaneous discovery, estimation and prediction analysis of complex traits using a bayesian mixture model
  • DOI : 10.1371/journal.pgen.1004969
  • ABSTRACT : Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.
  • COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
  • CITATION : Moser G, Lee SH, Hayes BJ, Goddard ME, ...&, Visscher PM. (2015) Simultaneous discovery, estimation and prediction analysis of complex traits using a bayesian mixture model PLoS Genet., 11 (4) e1004969. doi:10.1371/journal.pgen.1004969. PMID 25849665
  • JOURNAL_INFO : PLoS genetics ; PLoS Genet. ; 2015 ; 11 ; 4 ; e1004969
  • PUBMED_LINK : 25849665
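
The mixture prior behind BayesR can be illustrated with a small simulation. This is a minimal sketch, not the GCTB implementation: the function name, the mixing proportions `pi`, and the seed are hypothetical, and the four variance scales (0, 1e-4, 1e-3, 1e-2 of the genetic variance) are an assumption based on how the model is commonly described rather than something stated in the entry above.

```python
import numpy as np

def sample_bayesr_effects(n_snps, sigma2_g,
                          pi=(0.95, 0.02, 0.02, 0.01),
                          scales=(0.0, 1e-4, 1e-3, 1e-2),
                          seed=0):
    """Draw SNP effects from a four-component zero-mean normal mixture.

    pi     : illustrative mixing proportions (hypothetical values).
    scales : per-component variance as a fraction of sigma2_g (assumed).
    """
    rng = np.random.default_rng(seed)
    # Assign each SNP to one mixture component.
    component = rng.choice(len(pi), size=n_snps, p=pi)
    # Component 0 has zero variance, so those SNPs get an effect of exactly 0.
    sd = np.sqrt(np.asarray(scales)[component] * sigma2_g)
    beta = rng.normal(0.0, 1.0, size=n_snps) * sd
    return beta, component

beta, component = sample_bayesr_effects(n_snps=10_000, sigma2_g=0.5)
print("fraction of SNPs with nonzero effect:", np.mean(beta != 0.0))
```

Most SNPs fall in the zero-variance component, which is how a prior of this shape encodes that only a subset of variants contributes to the trait.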

BayesS

  • NAME : BayesS
  • SHORT NAME : BayesS
  • FULL NAME : BayesS
  • URL : https://cnsgenomics.com/software/gctb/#Overview
  • TITLE : Signatures of negative selection in the genetic architecture of human complex traits
  • DOI : 10.1038/s41588-018-0101-4
  • ABSTRACT : We develop a Bayesian mixed linear model that simultaneously estimates single-nucleotide polymorphism (SNP)-based heritability, polygenicity (proportion of SNPs with nonzero effects), and the relationship between SNP effect size and minor allele frequency for complex traits in conventionally unrelated individuals using genome-wide SNP data. We apply the method to 28 complex traits in the UK Biobank data (N = 126,752) and show that on average, 6% of SNPs have nonzero effects, which in total explain 22% of phenotypic variance. We detect significant (P < 0.05/28) signatures of natural selection in the genetic architecture of 23 traits, including reproductive, cardiovascular, and anthropometric traits, as well as educational attainment. The significant estimates of the relationship between effect size and minor allele frequency in complex traits are consistent with a model of negative (or purifying) selection, as confirmed by forward simulation. We conclude that negative selection acts pervasively on the genetic variants associated with human complex traits.
  • CITATION : Zeng J, de Vlaming R, Wu Y, Robinson MR, ...&, Yang J. (2018) Signatures of negative selection in the genetic architecture of human complex traits Nat. Genet., 50 (5) 746-753. doi:10.1038/s41588-018-0101-4. PMID 29662166
  • JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2018 ; 50 ; 5 ; 746-753
  • PUBMED_LINK : 29662166
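
The relationship between effect size and minor allele frequency described in the abstract is often written as a point-normal prior whose variance depends on allele frequency. The form below is a sketch under that assumption; the symbols $p_j$ (allele frequency of SNP $j$), $S$ (selection-related exponent), $\pi$ (polygenicity) and $\sigma_\beta^2$ are notation introduced here for illustration.

```latex
\beta_j \;\sim\; \pi \,\mathcal{N}\!\left(0,\; \left[2 p_j (1 - p_j)\right]^{S} \sigma_\beta^{2}\right) \;+\; (1 - \pi)\,\delta_0
```

Under this form, $S < 0$ means that rarer variants tend to have larger effects, which is the pattern the abstract interprets as negative selection, and $S = 0$ means effect sizes do not depend on allele frequency.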

SBayesR

  • NAME : SBayesR
  • SHORT NAME : SBayesR
  • FULL NAME : SBayesR
  • DESCRIPTION : SBayesR extends a powerful individual-level data Bayesian multiple regression model (BayesR) to one that uses summary statistics from genome-wide association studies.
  • URL : https://cnsgenomics.com/software/gctb/#Overview
  • TITLE : Improved polygenic prediction by Bayesian multiple regression on summary statistics
  • DOI : 10.1038/s41467-019-12653-0
  • ABSTRACT : Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R2 by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
  • COPYRIGHT : https://creativecommons.org/licenses/by/4.0
  • CITATION : Lloyd-Jones LR, Zeng J, Sidorenko J, Yengo L, ...&, Visscher PM. (2019) Improved polygenic prediction by Bayesian multiple regression on summary statistics Nat. Commun., 10 (1) 5086. doi:10.1038/s41467-019-12653-0. PMID 31704910
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2019 ; 10 ; 1 ; 5086
  • PUBMED_LINK : 31704910

SBayesRC

  • NAME : SBayesRC
  • SHORT NAME : SBayesRC
  • FULL NAME : SBayesRC
  • DESCRIPTION : SBayesRC integrates GWAS summary statistics with functional genomic annotations to improve polygenic prediction of complex traits.
  • URL : https://cnsgenomics.com/software/gctb/#Overview
  • KEYWORDS : functional genomic annotation, whole-genome variants, cross-ancestry
  • TITLE : Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries
  • DOI : 10.1038/s41588-024-01704-y
  • ABSTRACT : We develop a method, SBayesRC, that integrates genome-wide association study (GWAS) summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyze 50 complex traits and diseases using ∼7 million common single-nucleotide polymorphisms (SNPs) and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and up to 34% in cross-ancestry prediction compared to the baseline method SBayesR, which does not use annotations, and outperforms other methods, including LDpred2, LDpred-funct, MegaPRS, PolyPred-S and PRS-CSx. Investigation of factors affecting prediction accuracy identifies a significant interaction between SNP density and annotation information, suggesting whole-genome sequence variants with annotations may further improve prediction. Functional partitioning analysis highlights a major contribution of evolutionary constrained regions to prediction accuracy and the largest per-SNP contribution from nonsynonymous SNPs.
  • COPYRIGHT : https://creativecommons.org/licenses/by/4.0
  • CITATION : Zheng Z, Liu S, Sidorenko J, Wang Y, ...&, Zeng J. (2024) Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries Nat. Genet., 56 (5) 767-777. doi:10.1038/s41588-024-01704-y. PMID 38689000
  • JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2024 ; 56 ; 5 ; 767-777
  • PUBMED_LINK : 38689000

SBayesS

  • NAME : SBayesS
  • SHORT NAME : SBayesS
  • FULL NAME : SBayesS
  • DESCRIPTION : SBayesS estimates multiple genetic architecture parameters, including selection signature, using only GWAS summary statistics.
  • URL : https://cnsgenomics.com/software/gctb/#Overview
  • TITLE : Widespread signatures of natural selection across human complex traits and functional genomic categories
  • DOI : 10.1038/s41467-021-21446-3
  • ABSTRACT : Understanding how natural selection has shaped genetic architecture of complex traits is of importance in medical and evolutionary genetics. Bayesian methods have been developed using individual-level GWAS data to estimate multiple genetic architecture parameters including selection signature. Here, we present a method (SBayesS) that only requires GWAS summary statistics. We analyse data for 155 complex traits (n = 27k-547k) and project the estimates onto those obtained from evolutionary simulations. We estimate that, on average across traits, about 1% of human genome sequence are mutational targets with a mean selection coefficient of ~0.001. Common diseases, on average, show a smaller number of mutational targets and have been under stronger selection, compared to other traits. SBayesS analyses incorporating functional annotations reveal that selection signatures vary across genomic regions, among which coding regions have the strongest selection signature and are enriched for both the number of associated variants and the magnitude of effect sizes.
  • COPYRIGHT : https://creativecommons.org/licenses/by/4.0
  • CITATION : Zeng J, Xue A, Jiang L, Lloyd-Jones LR, ...&, Yang J. (2021) Widespread signatures of natural selection across human complex traits and functional genomic categories Nat. Commun., 12 (1) 1164. doi:10.1038/s41467-021-21446-3. PMID 33608517
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2021 ; 12 ; 1 ; 1164
  • PUBMED_LINK : 33608517

Review

Review-Kachuri

  • NAME : Review-Kachuri
  • TITLE : Principles and methods for transferring polygenic risk scores across global populations
  • DOI : 10.1038/s41576-023-00637-2
  • ABSTRACT : Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
  • COPYRIGHT : https://www.springernature.com/gp/researchers/text-and-data-mining
  • CITATION : Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, ...&, Ge T. (2023) Principles and methods for transferring polygenic risk scores across global populations Nat. Rev. Genet., 25 (1) 8-25. doi:10.1038/s41576-023-00637-2. PMID 37620596
  • JOURNAL_INFO : Nature reviews. Genetics ; Nat. Rev. Genet. ; 2023 ; 25 ; 1 ; 8-25
  • PUBMED_LINK : 37620596

Review-Peter

  • NAME : Review-Peter
  • TITLE : Discovery and implications of polygenicity of common diseases
  • DOI : 10.1126/science.abi8206
  • ABSTRACT : The sequencing of the human genome has allowed the study of the genetic architecture of common diseases: the number of genomic variants that contribute to risk of disease and their joint frequency and effect size distribution. Common diseases are polygenic, with many loci contributing to phenotype, and the cumulative burden of risk alleles determines individual risk in conjunction with environmental factors. Most risk loci occur in noncoding regions of the genome regulating cell- and context-specific gene expression. Although the effect sizes of most risk alleles are small, their cumulative effects in individuals, quantified as a polygenic (risk) score, can identify people at increased risk of disease, thereby facilitating prevention or early intervention.
  • CITATION : Visscher PM, Yengo L, Cox NJ, Wray NR. (2021) Discovery and implications of polygenicity of common diseases Science, 373 (6562) 1468-1473. doi:10.1126/science.abi8206. PMID 34554790
  • JOURNAL_INFO : Science ; Science ; 2021 ; 373 ; 6562 ; 1468-1473
  • PUBMED_LINK : 34554790
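
The cumulative effect of many small-effect risk alleles, quantified as a polygenic (risk) score, is in its simplest form a weighted sum of allele counts. The sketch below shows that sum; the function name and the toy genotypes and weights are hypothetical and not taken from any study listed here.

```python
import numpy as np

def polygenic_score(genotypes, effect_sizes):
    """Weighted sum of risk-allele counts per individual.

    genotypes    : (n_individuals, n_snps) allele counts coded 0, 1 or 2.
    effect_sizes : (n_snps,) per-allele weights, e.g. from a GWAS.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if genotypes.shape[1] != effect_sizes.shape[0]:
        raise ValueError("one effect size is needed per SNP column")
    return genotypes @ effect_sizes

# Toy data: two individuals, three SNPs.
scores = polygenic_score([[0, 1, 2], [2, 0, 1]], [0.1, -0.2, 0.3])
print(scores)
```

The methods catalogued on this page differ mainly in how the weights are estimated, not in this final summation step.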

Review-Wang

  • NAME : Review-Wang
  • TITLE : Challenges and opportunities for developing more generalizable polygenic risk scores
  • DOI : 10.1146/annurev-biodatasci-111721-074830
  • ABSTRACT : Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.
  • CITATION : Wang Y, Tsuo K, Kanai M, Neale BM, ...&, Martin AR. (2022) Challenges and opportunities for developing more generalizable polygenic risk scores Annu. Rev. Biomed. Data Sci., 5 (1) 293-320. doi:10.1146/annurev-biodatasci-111721-074830. PMID 35576555
  • JOURNAL_INFO : Annual review of biomedical data science ; Annu. Rev. Biomed. Data Sci. ; 2022 ; 5 ; 1 ; 293-320
  • PUBMED_LINK : 35576555

Single-trait

ALL-Sum

  • NAME : ALL-Sum
  • SHORT NAME : ALL-Sum
  • FULL NAME : Aggregated L0Learn using Summary-level data
  • DESCRIPTION : ALL-Sum leverages a L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures.
  • URL : https://github.com/chen-tony/ALL-Sum/
  • KEYWORDS : ensemble learning
  • TITLE : Fast and scalable ensemble learning method for versatile polygenic risk prediction
  • DOI : 10.1073/pnas.2403210121
  • ABSTRACT : Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods face several limitations, encompassing issues related to computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages a L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, with 15 times faster computation and half the memory than the current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
  • COPYRIGHT : https://creativecommons.org/licenses/by-nc-nd/4.0/
  • CITATION : Chen T, Zhang H, Mazumder R, Lin X. (2024) Fast and scalable ensemble learning method for versatile polygenic risk prediction Proc. Natl. Acad. Sci. U. S. A., 121 (33) e2403210121. doi:10.1073/pnas.2403210121. PMID 39110727
  • JOURNAL_INFO : Proceedings of the National Academy of Sciences of the United States of America ; Proc. Natl. Acad. Sci. U. S. A. ; 2024 ; 121 ; 33 ; e2403210121
  • PUBMED_LINK : 39110727
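
For orientation, an L0L2 penalized regression on summary statistics can be written in a generic form like the one below. This is a sketch of the general technique, not necessarily the exact objective or scaling used by ALL-Sum; $\mathbf{r}$ (marginal GWAS effect estimates), $\mathbf{R}$ (LD matrix from a reference panel) and the tuning parameters $\lambda_0$, $\lambda_2$ are notation assumed here.

```latex
\hat{\boldsymbol\beta}(\lambda_0, \lambda_2)
  \;=\; \arg\min_{\boldsymbol\beta}\;
  \boldsymbol\beta^{\top}\mathbf{R}\,\boldsymbol\beta
  \;-\; 2\,\boldsymbol\beta^{\top}\mathbf{r}
  \;+\; \lambda_0 \lVert \boldsymbol\beta \rVert_0
  \;+\; \lambda_2 \lVert \boldsymbol\beta \rVert_2^{2}
```

The $\ell_0$ term controls how many variants receive nonzero weight and the $\ell_2$ term shrinks the weights that remain. Ensemble learning across tuning parameters then combines the scores obtained over a grid of $(\lambda_0, \lambda_2)$ values rather than selecting a single pair.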

CalPred

  • NAME : CalPred
  • SHORT NAME : CalPred
  • FULL NAME : Calibrated prediction intervals
  • DESCRIPTION : a statistical framework that jointly models the effects of all contexts on PGS accuracy with parameters learned in a calibration dataset
  • URL : https://github.com/KangchengHou/calpred
  • KEYWORDS : trait prediction intervals
  • TITLE : Calibrated prediction intervals for polygenic scores across diverse contexts
  • DOI : 10.1038/s41588-024-01792-w
  • ABSTRACT : Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields. We show that PGS performance varies broadly across contexts and biobanks. Contexts such as age, sex and income can impact PGS accuracy with similar magnitudes as genetic ancestry. Here we introduce an approach (CalPred) that models all contexts jointly to produce prediction intervals that vary across contexts to achieve calibration (include the trait with 90% probability), whereas existing methods are miscalibrated. In analyses of 72 traits across large and diverse biobanks (All of Us and UK Biobank), we find that prediction intervals required adjustment by up to 80% for quantitative traits. For disease traits, PGS-based predictions were miscalibrated across socioeconomic contexts such as annual household income levels, further highlighting the need of accounting for context information in PGS-based prediction across diverse populations.
  • COPYRIGHT : https://www.springernature.com/gp/researchers/text-and-data-mining
  • CITATION : Hou K, Xu Z, Ding Y, Mandla R, ...&, Pasaniuc B. (2024) Calibrated prediction intervals for polygenic scores across diverse contexts Nat. Genet., 56 (7) 1386-1396. doi:10.1038/s41588-024-01792-w. PMID 38886587
  • JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2024 ; 56 ; 7 ; 1386-1396
  • PUBMED_LINK : 38886587
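
The idea of prediction intervals whose width varies with context can be sketched as follows. This is a simplified illustration, not the CalPred package API: it assumes a normal model whose log-variance is linear in the context vector, and the function name, arguments and coefficient values are all hypothetical.

```python
import math
from statistics import NormalDist

def context_prediction_interval(pgs_mean, context, beta_sigma, coverage=0.90):
    """Interval for a trait given a PGS-based mean and context covariates.

    Assumed model: trait ~ Normal(pgs_mean, exp(context . beta_sigma)),
    so the residual variance (and interval width) changes with context.
    """
    log_var = sum(c * b for c, b in zip(context, beta_sigma))
    sd = math.exp(0.5 * log_var)
    # Two-sided normal quantile for the requested coverage.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return pgs_mean - z * sd, pgs_mean + z * sd

# Same predicted mean, two contexts (intercept, standardized age).
print(context_prediction_interval(1.0, [1.0, -1.0], [0.0, 0.3]))
print(context_prediction_interval(1.0, [1.0, 1.0], [0.0, 0.3]))
```

With a positive age coefficient, the second individual gets a wider interval than the first even though both have the same point prediction.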

DBSLMM

  • NAME : DBSLMM
  • SHORT NAME : DBSLMM
  • FULL NAME : Deterministic Bayesian Sparse Linear Mixed Model
  • DESCRIPTION : There are two versions of DBSLMM: the tuning version and the deterministic version. The tuning version examines three different heritability choices and requires a validation dataset to tune the heritability hyper-parameter. The deterministic version uses one heritability estimate and fits the model directly in the training data without a separate validation dataset. Both versions require a reference dataset to compute the SNP correlation matrix. In the authors' experience, the tuning version may be more accurate than the deterministic version.
  • URL : https://github.com/biostat0903/DBSLMM
  • TITLE : Accurate and scalable construction of polygenic scores in large biobank data sets
  • DOI : 10.1016/j.ajhg.2020.03.013
  • ABSTRACT : Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling assumption on the effect size distribution to achieve robust and accurate prediction performance across a range of genetic architectures. DBSLMM also relies on a simple deterministic search algorithm to yield an approximate analytic estimation solution using summary statistics only. The deterministic search algorithm, when paired with further algebraic innovations, results in substantial computational savings. With simulations, we show that DBSLMM achieves scalable and accurate prediction performance across a range of realistic genetic architectures. We then apply DBSLMM to analyze 25 traits in UK Biobank. For these traits, compared to existing approaches, DBSLMM achieves an average of 2.03%-101.09% accuracy gain in internal cross-validations. In external validations on two separate datasets, including one from BioBank Japan, DBSLMM achieves an average of 14.74%-522.74% accuracy gain. In these real data applications, DBSLMM is 1.03-28.11 times faster and uses only 7.4%-24.8% of physical memory as compared to other multiple regression-based PGS methods. Overall, DBSLMM represents an accurate and scalable method for constructing PGS in biobank scale datasets.
  • COPYRIGHT : http://www.elsevier.com/open-access/userlicense/1.0/
  • CITATION : Yang S, Zhou X. (2020) Accurate and scalable construction of polygenic scores in large biobank data sets Am. J. Hum. Genet., 106 (5) 679-693. doi:10.1016/j.ajhg.2020.03.013. PMID 32330416
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2020 ; 106 ; 5 ; 679-693
  • PUBMED_LINK : 32330416

GMRM

  • NAME : GMRM
  • SHORT NAME : GMRM
  • FULL NAME : Bayesian grouped mixture of regressions model
  • DESCRIPTION : gmrm is hybrid-parallel software for a Bayesian grouped mixture of regressions model for genome-wide association studies (GWAS). It is written in C++ using extensive optimisations and code vectorisation. It relies on plink's .bed format. It can handle multiple traits simultaneously.
  • URL : https://github.com/medical-genomics-group/gmrm
  • TITLE : Improving GWAS discovery and genomic prediction accuracy in biobank data
  • DOI : 10.1073/pnas.2121279119
  • ABSTRACT : Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency-linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R2 was 47% in a UK Biobank holdout sample, which was 76% of the estimated SNP-based heritability (h2SNP). We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average χ2 value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.
  • CITATION : Orliac EJ, Trejo Banos D, Ojavee SE, Läll K, ...&, Robinson MR. (2022) Improving GWAS discovery and genomic prediction accuracy in biobank data Proc. Natl. Acad. Sci. U. S. A., 119 (31) e2121279119. doi:10.1073/pnas.2121279119. PMID 35905320
  • JOURNAL_INFO : Proceedings of the National Academy of Sciences of the United States of America ; Proc. Natl. Acad. Sci. U. S. A. ; 2022 ; 119 ; 31 ; e2121279119
  • PUBMED_LINK : 35905320

GRPa-PRS

  • NAME : GRPa-PRS
  • SHORT NAME : GRPa-PRS
  • FULL NAME : genetically-regulated pathways
  • URL : https://github.com/davidroad/GRPa-PRS
  • TITLE : GRPa-PRS: A risk stratification method to identify genetically-regulated pathways in polygenic diseases
  • DOI : 10.1101/2023.06.19.23291621
  • ABSTRACT : Background: Polygenic risk scores (PRS) are tools used to evaluate an individual's susceptibility to polygenic diseases based on their genetic profile. A considerable proportion of people carry a high genetic risk but evade the disease. On the other hand, some individuals with a low risk eventually develop the disease. We hypothesized that unknown counterfactors might be involved in reversing the PRS prediction, which might provide new insights into the pathogenesis, prevention, and early intervention of diseases. Methods: We built a novel computational framework to identify genetically-regulated pathways (GRPas) using PRS-based stratification for each cohort. We curated two AD cohorts with genotyping data; the discovery (disc) and the replication (rep) datasets include 2722 and 2854 individuals, respectively. First, we calculated the optimized PRS model based on the three recent AD GWAS summary statistics for each cohort. Then, we stratified the individuals by their PRS and clinical diagnosis into six biologically meaningful PRS strata, such as AD cases with low/high risk and cognitively normal (CN) with low/high risk. Lastly, we imputed individual genetically-regulated expression (GReX) and identified differential GReX and GRPas between risk strata using gene-set enrichment and variational analyses in two models, with and without APOE effects. An orthogonality test was further conducted to verify that those GRPas are independent of PRS risk. To verify generalizability to other polygenic diseases, we further applied a default model of GRPa-PRS for schizophrenia (SCZ). Results: For each stratum, we conducted the same procedures in both the disc and rep datasets for comparison. In AD, we identified several well-known AD-related pathways, including amyloid-beta clearance, tau protein binding, and astrocyte response to oxidative stress. Additionally, we discovered resilience-related GRPas that are orthogonal to AD PRS, such as the calcium signaling pathway and divalent inorganic cation homeostasis. In SCZ, pathways related to mitochondrial function and muscle development were highlighted. Finally, our GRPa-PRS method identified more consistent differential pathways compared to another variant-based pathway PRS method. Conclusions: We developed a framework, GRPa-PRS, to systematically explore the differential GReX and GRPas among individuals stratified by their estimated PRS. The GReX-level comparison among those strata unveiled new insights into the pathways associated with disease risk and resilience. Our framework is extendable to other polygenic complex diseases.
  • CITATION : Li X, Fernandes BS, Liu A, Chen J, ...&, Dai Y. (2024) GRPa-PRS: A risk stratification method to identify genetically-regulated pathways in polygenic diseases medRxiv, () 2023.06.19.23291621. doi:10.1101/2023.06.19.23291621. PMID 37425929
  • JOURNAL_INFO : medRxiv: the preprint server for health sciences ; medRxiv ; 2024 ; ; ; 2023.06.19.23291621
  • PUBMED_LINK : 37425929

GenoBoost

  • NAME : GenoBoost
  • SHORT NAME : GenoBoost
  • FULL NAME : GenoBoost
  • DESCRIPTION : GenoBoost is a polygenic score method to capture additive and non-additive genetic inheritance effects.
  • URL : https://github.com/rickyota/genoboost
  • KEYWORDS : additive effects, non-additive effects, statistical boosting
  • TITLE : A polygenic score method boosted by non-additive models
  • DOI : 10.1038/s41467-024-48654-x
  • ABSTRACT : Dominance heritability in complex traits has received increasing recognition. However, most polygenic score (PGS) approaches do not incorporate non-additive effects. Here, we present GenoBoost, a flexible PGS modeling framework capable of considering both additive and non-additive effects, specifically focusing on genetic dominance. Building on statistical boosting theory, we derive provably optimal GenoBoost scores and provide its efficient implementation for analyzing large-scale cohorts. We benchmark it against seven commonly used PGS methods and demonstrate its competitive predictive performance. GenoBoost is ranked the best for four traits and second-best for three traits among twelve tested disease outcomes in UK Biobank. We reveal that GenoBoost improves prediction for autoimmune diseases by incorporating non-additive effects localized in the MHC locus and, more broadly, works best in less polygenic traits. We further demonstrate that GenoBoost can infer the mode of genetic inheritance without requiring prior knowledge. For example, GenoBoost finds non-zero genetic dominance effects for 602 of 900 selected genetic variants, resulting in 2.5% improvements in predicting psoriasis cases. Lastly, we show that GenoBoost can prioritize genetic loci with genetic dominance not previously reported in the GWAS catalog. Our results highlight the increased accuracy and biological insights from incorporating non-additive effects in PGS models.
  • CITATION : Ohta R, Tanigawa Y, Suzuki Y, Kellis M, ...&, Morishita S. (2024) A polygenic score method boosted by non-additive models Nat. Commun., 15 (1) 4433. doi:10.1038/s41467-024-48654-x. PMID 38811555
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2024 ; 15 ; 1 ; 4433
  • PUBMED_LINK : 38811555

LDpred

  • NAME : LDpred
  • SHORT NAME : LDpred
  • FULL NAME : LDpred
  • DESCRIPTION : LDpred is a Python based software package that adjusts GWAS summary statistics for the effects of linkage disequilibrium (LD).
  • URL : https://github.com/bvilhjal/ldpred
  • KEYWORDS : Bayesian, Gaussian infinitesimal prior, python
  • TITLE : Modeling linkage disequilibrium increases accuracy of polygenic risk scores
  • DOI : 10.1016/j.ajhg.2015.09.001
  • ABSTRACT : Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R2 increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
  • CITATION : Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, ...&, Price AL. (2015) Modeling linkage disequilibrium increases accuracy of polygenic risk scores Am. J. Hum. Genet., 97 (4) 576-592. doi:10.1016/j.ajhg.2015.09.001. PMID 26430803
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2015 ; 97 ; 4 ; 576-592
  • PUBMED_LINK : 26430803
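
The posterior-mean idea in the LDpred abstract can be illustrated for the Gaussian infinitesimal prior listed under KEYWORDS. The following is a minimal sketch, not the LDpred implementation: it assumes standardized genotypes, a single LD window, and the closed form (D + M/(N h2) I)^-1 applied to the marginal effects. The function name `ldpred_inf_weights` and the toy inputs are hypothetical.

```python
import numpy as np

def ldpred_inf_weights(beta_marginal, ld_matrix, n_gwas, h2):
    """Posterior mean effects under a Gaussian infinitesimal prior (sketch).

    beta_marginal: marginal (per-variant) GWAS effects, standardized scale
    ld_matrix:     M x M LD correlation matrix from a reference panel
    n_gwas:        GWAS sample size
    h2:            assumed SNP heritability carried by these M variants
    """
    m = len(beta_marginal)
    # Shrinkage term M / (N * h2): larger N or h2 means less shrinkage.
    shrink = m / (n_gwas * h2)
    return np.linalg.solve(ld_matrix + shrink * np.eye(m), beta_marginal)

# Toy window: variants 1 and 2 are in strong LD, variant 3 is independent.
ld = np.array([[1.0, 0.8, 0.0],
               [0.8, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
beta_hat = np.array([0.05, 0.04, 0.02])
weights = ldpred_inf_weights(beta_hat, ld, n_gwas=10_000, h2=0.3)
print(weights)
```

With an identity LD matrix the adjustment reduces to uniform shrinkage of each marginal effect, which is a quick sanity check on the formula.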

LDpred-funct

  • NAME : LDpred-funct
  • SHORT NAME : LDpred-funct
  • FULL NAME : LDpred-funct
  • DESCRIPTION : LDpred-funct is a method for polygenic prediction that leverages trait-specific functional priors to increase prediction accuracy.
  • URL : https://github.com/carlaml/LDpred-funct
  • KEYWORDS : Bayesian, functional priors
  • TITLE : Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets
  • DOI : 10.1038/s41467-021-25171-9
  • ABSTRACT : Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373 K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R2 = 0.144; highest R2 = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107 K) increased prediction R2 to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
  • COPYRIGHT : https://creativecommons.org/licenses/by/4.0
  • CITATION : Márquez-Luna C, Gazal S, Loh PR, Kim SS, ...&, Price AL. (2021) Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets Nat. Commun., 12 (1) 6052. doi:10.1038/s41467-021-25171-9. PMID 34663819
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2021 ; 12 ; 1 ; 6052
  • PUBMED_LINK : 34663819

LDpred2

  • NAME : LDpred2
  • SHORT NAME : LDpred2
  • FULL NAME : LDpred2
  • DESCRIPTION : LDpred2 is a dedicated PRS program, implemented in the R package bigsnpr, that uses a Bayesian approach to polygenic risk scoring.
  • URL : https://privefl.github.io/bigsnpr/articles/LDpred2.html
  • KEYWORDS : Bayesian, R, LDpred2-grid (LDpred2), LDpred2-auto, LDpred2-sparse
  • TITLE : LDpred2: better, faster, stronger
  • DOI : 10.1093/bioinformatics/btaa1029
  • ABSTRACT : MOTIVATION: Polygenic scores have become a central tool in human genetics research. LDpred is a popular method for deriving polygenic scores based on summary statistics and a matrix of correlation between genetic variants. However, LDpred has limitations that may reduce its predictive performance. RESULTS: Here, we present LDpred2, a new version of LDpred that addresses these issues. We also provide two new options in LDpred2: a 'sparse' option that can learn effects that are exactly 0, and an 'auto' option that directly learns the two LDpred parameters from data. We benchmark predictive performance of LDpred2 against the previous version on simulated and real data, demonstrating substantial improvements in robustness and predictive accuracy compared to LDpred1. We then show that LDpred2 also outperforms other polygenic score methods recently developed, with a mean AUC over the 8 real traits analyzed here of 65.1%, compared to 63.8% for lassosum, 62.9% for PRS-CS and 61.5% for SBayesR. Note that LDpred2 provides more accurate polygenic scores when run genome-wide, instead of per chromosome. AVAILABILITY AND IMPLEMENTATION: LDpred2 is implemented in R package bigsnpr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
  • CITATION : Privé F, Arbel J, Vilhjálmsson BJ. (2021) LDpred2: better, faster, stronger Bioinformatics, 36 (22-23) 5424-5431. doi:10.1093/bioinformatics/btaa1029. PMID 33326037
  • JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2021 ; 36 ; 22-23 ; 5424-5431
  • PUBMED_LINK : 33326037

LDpred2-auto

  • NAME : LDpred2-auto
  • SHORT NAME : LDpred2-auto
  • FULL NAME : LDpred2-auto
  • DESCRIPTION : LDpred2 is a widely used Bayesian method for building polygenic scores (PGS). LDpred2-auto can infer the two parameters from the LDpred model, h^2 and p, so that it does not require an additional validation dataset to choose best-performing parameters. Here, we present a new version of LDpred2-auto, which adds a third parameter alpha to its model for modeling negative selection. Additional changes are also made to provide better sampling of these parameters.
  • URL : https://privefl.github.io/bigsnpr/articles/LDpred2.html
  • KEYWORDS : Bayesian, new LDpred2-auto, α (relationship between MAF and beta)
  • TITLE : Inferring disease architecture and predictive ability with LDpred2-auto
  • DOI : 10.1016/j.ajhg.2023.10.010
  • ABSTRACT : LDpred2 is a widely used Bayesian method for building polygenic scores (PGSs). LDpred2-auto can infer the two parameters from the LDpred model, the SNP heritability h2 and polygenicity p, so that it does not require an additional validation dataset to choose best-performing parameters. The main aim of this paper is to properly validate the use of LDpred2-auto for inferring multiple genetic parameters. Here, we present a new version of LDpred2-auto that adds an optional third parameter α to its model, for modeling negative selection. We then validate the inference of these three parameters (or two, when using the previous model). We also show that LDpred2-auto provides per-variant probabilities of being causal that are well calibrated and can therefore be used for fine-mapping purposes. We also introduce a formula to infer the out-of-sample predictive performance r2 of the resulting PGS directly from the Gibbs sampler of LDpred2-auto. Finally, we extend the set of HapMap3 variants recommended to use with LDpred2 with 37% more variants to improve the coverage of this set, and we show that this new set of variants captures 12% more heritability and provides 6% more predictive performance, on average, in UK Biobank analyses.
  • COPYRIGHT : http://www.elsevier.com/open-access/userlicense/1.0/
  • CITATION : Privé F, Albiñana C, Arbel J, Pasaniuc B, ...&, Vilhjálmsson BJ. (2023) Inferring disease architecture and predictive ability with LDpred2-auto Am. J. Hum. Genet., 110 (12) 2042-2055. doi:10.1016/j.ajhg.2023.10.010. PMID 37944514
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2023 ; 110 ; 12 ; 2042-2055
  • PUBMED_LINK : 37944514

MegaPRS

  • NAME : MegaPRS
  • SHORT NAME : MegaPRS
  • FULL NAME : MegaPRS
  • DESCRIPTION : Individual-level tools: big_spLinReg, LDAK-Ridge-Predict, LDAK-Bolt-Predict and LDAK-BayesR-Predict. Summary-statistic tools: LDAK-Lasso-SS, LDAK-Ridge-SS, LDAK-Bolt-SS and LDAK-BayesR-SS.
  • URL : http://www.ldak.org/
  • TITLE : Improved genetic prediction of complex traits from individual-level data or summary statistics
  • DOI : 10.1038/s41467-021-24485-y
  • ABSTRACT : Most existing tools for constructing genetic prediction models begin with the assumption that all genetic variants contribute equally towards the phenotype. However, this represents a suboptimal model for how heritability is distributed across the genome. Therefore, we develop prediction tools that allow the user to specify the heritability model. We compare individual-level data prediction tools using 14 UK Biobank phenotypes; our new tool LDAK-Bolt-Predict outperforms the existing tools Lasso, BLUP, Bolt-LMM and BayesR for all 14 phenotypes. We compare summary statistic prediction tools using 225 UK Biobank phenotypes; our new tool LDAK-BayesR-SS outperforms the existing tools lassosum, sBLUP, LDpred and SBayesR for 223 of the 225 phenotypes. When we improve the heritability model, the proportion of phenotypic variance explained increases by on average 14%, which is equivalent to increasing the sample size by a quarter.
  • CITATION : Zhang Q, Privé F, Vilhjálmsson B, Speed D. (2021) Improved genetic prediction of complex traits from individual-level data or summary statistics Nat. Commun., 12 (1) 4192. doi:10.1038/s41467-021-24485-y. PMID 34234142
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2021 ; 12 ; 1 ; 4192
  • PUBMED_LINK : 34234142

MiXeR

  • NAME : MiXeR
  • SHORT NAME : MiXeR
  • FULL NAME : MiXeR (univariate)
  • DESCRIPTION : Causal Mixture Model for GWAS summary statistics
  • URL : https://github.com/precimed/mixer
  • TITLE : Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model
  • DOI : 10.1371/journal.pgen.1008612
  • ABSTRACT : Estimating the polygenicity (proportion of causally associated single nucleotide polymorphisms (SNPs)) and discoverability (effect size variance) of causal SNPs for human traits is currently of considerable interest. SNP-heritability is proportional to the product of these quantities. We present a basic model, using detailed linkage disequilibrium structure from a reference panel of 11 million SNPs, to estimate these quantities from genome-wide association studies (GWAS) summary statistics. We apply the model to diverse phenotypes and validate the implementation with simulations. We find model polygenicities (as a fraction of the reference panel) ranging from ≃ 2 × 10^-5 to ≃ 4 × 10^-3, with discoverabilities similarly ranging over two orders of magnitude. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs reaching genome-wide significance at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability. The model also allows for estimating residual inflation (or deflation from over-correcting of z-scores), and assessing compatibility of replication and discovery GWAS summary statistics.
  • CITATION : Holland D, Frei O, Desikan R, Fan CC, ...&, Dale AM. (2020) Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model PLoS Genet., 16 (5) e1008612. doi:10.1371/journal.pgen.1008612. PMID 32427991
  • JOURNAL_INFO : PLoS genetics ; PLoS Genet. ; 2020 ; 16 ; 5 ; e1008612
  • PUBMED_LINK : 32427991

MultiBLUP

  • NAME : MultiBLUP
  • SHORT NAME : MultiBLUP
  • FULL NAME : MultiBLUP
  • URL : http://www.ldak.org/
  • TITLE : MultiBLUP: improved SNP-based prediction for complex traits
  • DOI : 10.1101/gr.169375.113
  • ABSTRACT : BLUP (best linear unbiased prediction) is widely used to predict complex traits in plant and animal breeding, and increasingly in human genetics. The BLUP mathematical model, which consists of a single random effect term, was adequate when kinships were measured from pedigrees. However, when genome-wide SNPs are used to measure kinships, the BLUP model implicitly assumes that all SNPs have the same effect-size distribution, which is a severe and unnecessary limitation. We propose MultiBLUP, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances. The SNP classes can be specified in advance, for example, based on SNP functional annotations, and we also provide an adaptive procedure for determining a suitable partition of SNPs. We apply MultiBLUP to genome-wide association data from the Wellcome Trust Case Control Consortium (seven diseases), and from much larger studies of celiac disease and inflammatory bowel disease, finding that it consistently provides better prediction than alternative methods. Moreover, MultiBLUP is computationally very efficient; for the largest data set, which includes 12,678 individuals and 1.5 M SNPs, the total analysis can be run on a single desktop PC in less than a day and can be parallelized to run even faster. Tools to perform MultiBLUP are freely available in our software LDAK.
  • CITATION : Speed D, Balding DJ. (2014) MultiBLUP: improved SNP-based prediction for complex traits Genome Res., 24 (9) 1550-1557. doi:10.1101/gr.169375.113. PMID 24963154
  • JOURNAL_INFO : Genome research ; Genome Res. ; 2014 ; 24 ; 9 ; 1550-1557
  • PUBMED_LINK : 24963154

PRS-CS

  • NAME : PRS-CS
  • SHORT NAME : PRS-CS
  • FULL NAME : PRS-CS
  • DESCRIPTION : PRS-CS is a Python-based command-line tool that infers posterior SNP effect sizes under continuous shrinkage (CS) priors using GWAS summary statistics and an external LD reference panel.
  • URL : https://github.com/getian107/PRScs
  • KEYWORDS : continuous shrinkage (CS) prior
  • TITLE : Polygenic prediction via Bayesian regression and continuous shrinkage priors
  • DOI : 10.1038/s41467-019-09718-5
  • ABSTRACT : Polygenic risk scores (PRS) have shown promise in predicting human complex traits and diseases. Here, we present PRS-CS, a polygenic prediction method that infers posterior effect sizes of single nucleotide polymorphisms (SNPs) using genome-wide association summary statistics and an external linkage disequilibrium (LD) reference panel. PRS-CS utilizes a high-dimensional Bayesian regression framework, and is distinct from previous work by placing a continuous shrinkage (CS) prior on SNP effect sizes, which is robust to varying genetic architectures, provides substantial computational advantages, and enables multivariate modeling of local LD patterns. Simulation studies using data from the UK Biobank show that PRS-CS outperforms existing methods across a wide range of genetic architectures, especially when the training sample size is large. We apply PRS-CS to predict six common complex diseases and six quantitative traits in the Partners HealthCare Biobank, and further demonstrate the improvement of PRS-CS in prediction accuracy over alternative methods.
  • COPYRIGHT : https://creativecommons.org/licenses/by/4.0
  • CITATION : Ge T, Chen CY, Ni Y, Feng YA, ...&, Smoller JW. (2019) Polygenic prediction via Bayesian regression and continuous shrinkage priors Nat. Commun., 10 (1) 1776. doi:10.1038/s41467-019-09718-5. PMID 30992449
  • JOURNAL_INFO : Nature communications ; Nat. Commun. ; 2019 ; 10 ; 1 ; 1776
  • PUBMED_LINK : 30992449

PRSMix_AOI

  • NAME : PRSMix_AOI
  • SHORT NAME : PRSMix_AOI
  • FULL NAME : add-one-in (AOI)
  • PREPRINT_DOI : 10.1101/2024.07.24.24310897
  • SERVER : biorxiv
  • CITATION : Misra, A. et al. Instability of high polygenic risk classification and mitigation by integrative scoring. bioRxiv 2024.07.24.24310897 (2024) doi:10.1101/2024.07.24.24310897.

PRS_to_Abs

  • NAME : PRS_to_Abs
  • SHORT NAME : PRS_to_Abs
  • FULL NAME : PRS_to_Abs
  • DESCRIPTION : Converting Polygenic Score to Absolute Scale
  • URL : https://opain.github.io/GenoPred/PRS_to_Abs_tool.html
  • TITLE : A tool for translating polygenic scores onto the absolute scale using summary statistics
  • DOI : 10.1038/s41431-021-01028-z
  • ABSTRACT : There is growing interest in the clinical application of polygenic scores as their predictive utility increases for a range of health-related phenotypes. However, providing polygenic score predictions on the absolute scale is an important step for their safe interpretation. We have developed a method to convert polygenic scores to the absolute scale for binary and normally distributed phenotypes. This method uses summary statistics, requiring only the area-under-the-ROC curve (AUC) or variance explained (R2) by the polygenic score, and the prevalence of binary phenotypes, or mean and standard deviation of normally distributed phenotypes. Polygenic scores are converted using normal distribution theory. We also evaluate methods for estimating polygenic score AUC/R2 from genome-wide association study (GWAS) summary statistics alone. We validate the absolute risk conversion and AUC/R2 estimation using data for eight binary and three continuous phenotypes in the UK Biobank sample. When the AUC/R2 of the polygenic score is known, the observed and estimated absolute values were highly concordant. Estimates of AUC/R2 from the lassosum pseudovalidation method were most similar to the observed AUC/R2 values, though estimated values deviated substantially from the observed for autoimmune disorders. This study enables accurate interpretation of polygenic scores using only summary statistics, providing a useful tool for educational and clinical purposes. Furthermore, we have created interactive webtools implementing the conversion to the absolute ( https://opain.github.io/GenoPred/PRS_to_Abs_tool.html ). Several further barriers must be addressed before clinical implementation of polygenic scores, such as ensuring target individuals are well represented by the GWAS sample.
  • COPYRIGHT : https://creativecommons.org/licenses/by/4.0
  • CITATION : Pain O, Gillett AC, Austin JC, Folkersen L, ...&, Lewis CM. (2022) A tool for translating polygenic scores onto the absolute scale using summary statistics Eur. J. Hum. Genet., 30 (3) 339-348. doi:10.1038/s41431-021-01028-z. PMID 34983942
  • JOURNAL_INFO : European journal of human genetics: EJHG ; Eur. J. Hum. Genet. ; 2022 ; 30 ; 3 ; 339-348
  • PUBMED_LINK : 34983942
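
For a normally distributed phenotype, the conversion described in the abstract (from R2 plus the phenotype mean and standard deviation) can be sketched with bivariate-normal theory. This is an illustrative sketch, not the PRS_to_Abs code: it assumes the polygenic score is standardized (a z-score) and positively correlated with the phenotype, and it does not cover the binary/AUC case. The function name and the example values are hypothetical.

```python
import math

def prs_to_absolute_continuous(prs_z, r2, pheno_mean, pheno_sd):
    """Expected phenotype and residual SD given a standardized PRS (sketch).

    Assumes PRS and phenotype are jointly normal with correlation sqrt(r2).
    """
    expected = pheno_mean + pheno_sd * math.sqrt(r2) * prs_z
    residual_sd = pheno_sd * math.sqrt(1.0 - r2)
    return expected, residual_sd

# Hypothetical example: a PRS explaining 10% of variance in a trait with
# mean 170 and SD 10, for an individual 1.5 SD above the mean PRS.
expected, residual_sd = prs_to_absolute_continuous(
    prs_z=1.5, r2=0.10, pheno_mean=170.0, pheno_sd=10.0
)
print(f"expected = {expected:.2f}, residual SD = {residual_sd:.2f}")
```

The residual standard deviation makes the remaining uncertainty explicit: even with a known R2, the individual-level prediction is a distribution rather than a point.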

PRStuning

  • NAME : PRStuning
  • SHORT NAME : PRStuning
  • FULL NAME : PRStuning
  • DESCRIPTION : Estimate Testing AUC for Binary Phenotype Using GWAS Summary Statistics from the Training Data
  • CITATION : Jiang, W., Chen, L., Girgenti, M. J., & Zhao, H. (2023). Tuning Parameters for Polygenic Risk Score Methods Using GWAS Summary Statistics from Training Data. Research Square.
  • PUBMED_LINK : 37398263

SDPR

  • NAME : SDPR
  • SHORT NAME : SDPR
  • FULL NAME : SDPR
  • DESCRIPTION : SDPR (Summary statistics based Dirichlet Process Regression) is a method to compute polygenic risk scores (PRS) from summary statistics. It is the extension of Dirichlet Process Regression (DPR) to the use of summary statistics.
  • URL : https://github.com/eldronzhou/SDPR
  • TITLE : A fast and robust Bayesian nonparametric method for prediction of complex traits using summary statistics
  • DOI : 10.1371/journal.pgen.1009697
  • ABSTRACT : Genetic prediction of complex traits has great promise for disease prevention, monitoring, and treatment. The development of accurate risk prediction models is hindered by the wide diversity of genetic architecture across different traits, limited access to individual level data for training and parameter tuning, and the demand for computational resources. To overcome the limitations of the most existing methods that make explicit assumptions on the underlying genetic architecture and need a separate validation data set for parameter tuning, we develop a summary statistics-based nonparametric method that does not rely on validation datasets to tune parameters. In our implementation, we refine the commonly used likelihood assumption to deal with the discrepancy between summary statistics and external reference panel. We also leverage the block structure of the reference linkage disequilibrium matrix for implementation of a parallel algorithm. Through simulations and applications to twelve traits, we show that our method is adaptive to different genetic architectures, statistically robust, and computationally efficient. Our method is available at https://github.com/eldronzhou/SDPR.
  • COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
  • CITATION : Zhou G, Zhao H. (2021) A fast and robust Bayesian nonparametric method for prediction of complex traits using summary statistics PLoS Genet., 17 (7) e1009697. doi:10.1371/journal.pgen.1009697. PMID 34310601
  • JOURNAL_INFO : PLoS genetics ; PLoS Genet. ; 2021 ; 17 ; 7 ; e1009697
  • PUBMED_LINK : 34310601

VIPRS

  • NAME : VIPRS
  • SHORT NAME : VIPRS
  • FULL NAME : Variational inference of polygenic risk scores
  • DESCRIPTION : viprs is a Python package that implements scripts and utilities for running variational inference algorithms on genome-wide association study (GWAS) data for the purposes of polygenic risk estimation.
  • URL : https://github.com/shz9/viprs
  • KEYWORDS : Variational Inference (VI)
  • TITLE : Fast and accurate Bayesian polygenic risk modeling with variational inference
  • DOI : 10.1016/j.ajhg.2023.03.009
  • ABSTRACT : The advent of large-scale genome-wide association studies (GWASs) has motivated the development of statistical methods for phenotype prediction with single-nucleotide polymorphism (SNP) array data. These polygenic risk score (PRS) methods use a multiple linear regression framework to infer joint effect sizes of all genetic variants on the trait. Among the subset of PRS methods that operate on GWAS summary statistics, sparse Bayesian methods have shown competitive predictive ability. However, most existing Bayesian approaches employ Markov chain Monte Carlo (MCMC) algorithms, which are computationally inefficient and do not scale favorably to higher dimensions, for posterior inference. Here, we introduce variational inference of polygenic risk scores (VIPRS), a Bayesian summary statistics-based PRS method that utilizes variational inference techniques to approximate the posterior distribution for the effect sizes. Our experiments with 36 simulation configurations and 12 real phenotypes from the UK Biobank dataset demonstrated that VIPRS is consistently competitive with the state-of-the-art in prediction accuracy while being more than twice as fast as popular MCMC-based approaches. This performance advantage is robust across a variety of genetic architectures, SNP heritabilities, and independent GWAS cohorts. In addition to its competitive accuracy on the "White British" samples, VIPRS showed improved transferability when applied to other ethnic groups, with up to 1.7-fold increase in R2 among individuals of Nigerian ancestry for low-density lipoprotein (LDL) cholesterol. To illustrate its scalability, we applied VIPRS to a dataset of 9.6 million genetic markers, which conferred further improvements in prediction accuracy for highly polygenic traits, such as height.
  • CITATION : Zabad S, Gravel S, Li Y. (2023) Fast and accurate Bayesian polygenic risk modeling with variational inference Am. J. Hum. Genet., 110 (5) 741-761. doi:10.1016/j.ajhg.2023.03.009. PMID 37030289
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2023 ; 110 ; 5 ; 741-761
  • PUBMED_LINK : 37030289

lassosum

  • NAME : lassosum
  • SHORT NAME : lassosum
  • FULL NAME : lassosum
  • DESCRIPTION : lassosum is a method for computing LASSO/Elastic Net estimates of a linear regression problem given summary statistics from GWAS and Genome-wide meta-analyses, accounting for Linkage Disequilibrium (LD), via a reference panel.
  • URL : https://github.com/tshmak/lassosum
  • KEYWORDS : penalized regression
  • TITLE : Polygenic scores via penalized regression on summary statistics
  • DOI : 10.1002/gepi.22050
  • ABSTRACT : Polygenic scores (PGS) summarize the genetic contribution of a person's genotype to a disease or phenotype. They can be used to group participants into different risk categories for diseases, and are also used as covariates in epidemiological analyses. A number of possible ways of calculating PGS have been proposed, and recently there is much interest in methods that incorporate information available in published summary statistics. As there is no inherent information on linkage disequilibrium (LD) in summary statistics, a pertinent question is how we can use LD information available elsewhere to supplement such analyses. To answer this question, we propose a method for constructing PGS using summary statistics and a reference panel in a penalized regression framework, which we call lassosum. We also propose a general method for choosing the value of the tuning parameter in the absence of validation data. In our simulations, we showed that pseudovalidation often resulted in prediction accuracy that is comparable to using a dataset with validation phenotype and was clearly superior to the conservative option of setting the tuning parameter of lassosum to its lowest value. We also showed that lassosum achieved better prediction accuracy than simple clumping and P-value thresholding in almost all scenarios. It was also substantially faster and more accurate than the recently proposed LDpred.
  • CITATION : Mak TSH, Porsch RM, Choi SW, Zhou X, ...&, Sham PC. (2017) Polygenic scores via penalized regression on summary statistics Genet. Epidemiol., 41 (6) 469-480. doi:10.1002/gepi.22050. PMID 28480976
  • JOURNAL_INFO : Genetic epidemiology ; Genet. Epidemiol. ; 2017 ; 41 ; 6 ; 469-480
  • PUBMED_LINK : 28480976
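
The penalized-regression idea behind lassosum can be sketched as coordinate descent with soft-thresholding on summary-level inputs. This is a minimal sketch, not the lassosum package: it assumes the objective (1 - s) b'Rb + s b'b - 2 b'r + 2 lambda |b|_1, where r holds SNP-phenotype correlations from the GWAS and R is the reference-panel LD matrix. The function names and toy inputs are hypothetical.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink x toward zero by lam, returning 0 when |x| <= lam."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def lassosum_sketch(r, ld, lam, s, n_iter=100):
    """Coordinate descent for an elastic-net objective on summary statistics.

    r:   SNP-phenotype correlations derived from GWAS summary statistics
    ld:  LD correlation matrix from a reference panel
    lam: L1 penalty; s: mixing parameter that adds L2 regularization
    """
    m = len(r)
    beta = np.zeros(m)
    for _ in range(n_iter):
        for j in range(m):
            # LD-weighted contribution of all other variants to variant j.
            rest = ld[j] @ beta - ld[j, j] * beta[j]
            z = r[j] - (1.0 - s) * rest
            beta[j] = soft_threshold(z, lam) / ((1.0 - s) * ld[j, j] + s)
    return beta

# Toy example: two correlated variants and one independent variant.
ld = np.array([[1.0, 0.6, 0.0],
               [0.6, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
r = np.array([0.30, 0.25, 0.05])
beta = lassosum_sketch(r, ld, lam=0.1, s=0.2)
print(beta)
```

The L1 penalty sets weak signals exactly to zero, which is how the approach yields sparse scores without a separate clumping step.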

lassosum2

  • NAME : lassosum2
  • SHORT NAME : lassosum2
  • FULL NAME : lassosum2
  • DESCRIPTION : lassosum2 is a re-implementation of the lassosum model that now uses the exact same input parameters as LDpred2 (corr and df_beta). It should be fast to run. It can be run next to LDpred2 and the best model can be chosen using the validation set. Note that parameter ‘s’ from lassosum has been replaced by a new parameter ‘delta’ in lassosum2, in order to better reflect that the lassosum model also uses L2-regularization (therefore, elastic-net regularization).
  • URL : https://privefl.github.io/bigsnpr/articles/LDpred2.html#lassosum2-grid-of-models
  • TITLE : Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores
  • DOI : 10.1016/j.xhgg.2022.100136
  • ABSTRACT : Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power.
  • CITATION : Privé F, Arbel J, Aschard H, Vilhjálmsson BJ. (2022) Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores HGG Adv, 3 (4) 100136. doi:10.1016/j.xhgg.2022.100136. PMID 36105883
  • JOURNAL_INFO : HGG advances ; HGG Adv ; 2022 ; 3 ; 4 ; 100136
  • PUBMED_LINK : 36105883

meta-PRS

  • NAME : meta-PRS
  • SHORT NAME : meta-PRS
  • FULL NAME : linear combination of PRSs
  • URL : https://github.com/ClaraAlbi/paper_MetaPRS
  • TITLE : Leveraging both individual-level genetic data and GWAS summary statistics increases polygenic prediction
  • DOI : 10.1016/j.ajhg.2021.04.014
  • ABSTRACT : The accuracy of polygenic risk scores (PRSs) to predict complex diseases increases with the training sample size. PRSs are generally derived based on summary statistics from large meta-analyses of multiple genome-wide association studies (GWASs). However, it is now common for researchers to have access to large individual-level data as well, such as the UK Biobank data. To the best of our knowledge, it has not yet been explored how best to combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction. The most widely used approach to combine data is the meta-analysis of GWAS summary statistics (meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and using 12 real case-control and quantitative traits from both iPSYCH and UK Biobank along with external GWAS summary statistics, we compare meta-GWAS with two alternative data-combining approaches, stacked clumping and thresholding (SCT) and meta-PRS. We find that, when large individual-level data are available, the linear combination of PRSs (meta-PRS) is both a simple alternative to meta-GWAS and often more accurate.
  • COPYRIGHT : http://creativecommons.org/licenses/by-nc-nd/4.0/
  • CITATION : Albiñana C, Grove J, McGrath JJ, Agerbo E, ...&, Vilhjálmsson BJ. (2021) Leveraging both individual-level genetic data and GWAS summary statistics increases polygenic prediction Am. J. Hum. Genet., 108 (6) 1001-1011. doi:10.1016/j.ajhg.2021.04.014. PMID 33964208
  • JOURNAL_INFO : The American Journal of Human Genetics ; Am. J. Hum. Genet. ; 2021 ; 108 ; 6 ; 1001-1011
  • PUBMED_LINK : 33964208
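
The "linear combination of PRSs" can be illustrated as follows. This is a minimal sketch under stated assumptions, not the code from the paper's repository: it assumes two precomputed scores per individual and learns the combination weights by ordinary least squares on a continuous phenotype. The function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_meta_prs(prs_matrix, phenotype):
    """Learn an intercept and one weight per PRS by least squares (sketch)."""
    design = np.column_stack([np.ones(len(phenotype)), prs_matrix])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are the PRS weights

def apply_meta_prs(prs_matrix, coef):
    """Combine the individual PRSs into a single score."""
    return coef[0] + prs_matrix @ coef[1:]

# Simulated example: one score from external summary statistics and one
# from individual-level data, both noisy proxies of the phenotype.
rng = np.random.default_rng(0)
n = 500
prs_external = rng.normal(size=n)
prs_individual = rng.normal(size=n)
phenotype = 0.4 * prs_external + 0.3 * prs_individual + rng.normal(size=n)

prs = np.column_stack([prs_external, prs_individual])
coef = fit_meta_prs(prs, phenotype)
combined = apply_meta_prs(prs, coef)
print(coef)
```

In practice the weights would be learned in data independent of the samples used to evaluate the combined score, to avoid overfitting.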

rtPRS-CS

  • NAME : rtPRS-CS
  • SHORT NAME : rtPRS-CS
  • FULL NAME : real-time PRS-CS
  • DESCRIPTION : rtPRS-CS is a Python-based command-line tool that performs real-time online updating of polygenic risk score (PRS) weights in a target dataset, one sample at a time. Given the most recent set of SNP weights, for each new target sample with both phenotypic and genetic information, rtPRS-CS uses stochastic gradient descent to update the SNP weights, adjusting for the effect of a set of covariates.
  • URL : https://github.com/getian107/rtPRS
  • PREPRINT_DOI : 10.1101/2024.07.12.24310357
  • SERVER : biorxiv
  • CITATION : Tubbs, J. D., Chen, Y., Duan, R., Huang, H. & Ge, T. Real-time dynamic polygenic prediction for streaming data. bioRxiv 2024.07.12.24310357 (2024) doi:10.1101/2024.07.12.24310357.
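
The one-sample-at-a-time update in the description can be sketched as a plain stochastic gradient descent step. This is an illustrative sketch only, not the rtPRS-CS algorithm: it assumes a squared-error loss, a phenotype already residualized on covariates, and a fixed learning rate. The function name and the toy values are hypothetical.

```python
import numpy as np

def sgd_update(weights, genotype, phenotype_resid, learning_rate):
    """One SGD step on the loss 0.5 * (y - g.w)^2 for a single new sample.

    weights:         current SNP weights
    genotype:        the new sample's (standardized) genotype vector
    phenotype_resid: the new sample's phenotype after covariate adjustment
    """
    error = phenotype_resid - genotype @ weights
    # The gradient of the loss is -error * genotype, so step against it.
    return weights + learning_rate * error * genotype

# Toy stream: update the weights as each new sample arrives.
weights = np.zeros(3)
stream = [
    (np.array([1.0, 0.0, -1.0]), 0.5),
    (np.array([0.0, 1.0, 1.0]), -0.2),
]
for genotype, phenotype_resid in stream:
    weights = sgd_update(weights, genotype, phenotype_resid, learning_rate=0.1)
print(weights)
```

Because each step touches only the newest sample, the weights can be refreshed without storing or revisiting earlier individuals.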

Standards

PRS-RS

  • NAME : PRS-RS
  • SHORT NAME : PRS-RS
  • FULL NAME : Polygenic Risk Score Reporting Standards
  • TITLE : Improving reporting standards for polygenic scores in risk prediction studies
  • DOI : 10.1038/s41586-021-03243-6
  • ABSTRACT : Polygenic risk scores (PRSs), which often aggregate results from genome-wide association studies, can bridge the gap between initial discovery efforts and clinical applications for the estimation of disease risk using genetics. However, there is notable heterogeneity in the application and reporting of these risk scores, which hinders the translation of PRSs into clinical care. Here, in a collaboration between the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog, we present the Polygenic Risk Score Reporting Standards (PRS-RS), in which we update the Genetic Risk Prediction Studies (GRIPS) Statement to reflect the present state of the field. Drawing on the input of experts in epidemiology, statistics, disease-specific applications, implementation and policy, this comprehensive reporting framework defines the minimal information that is needed to interpret and evaluate PRSs, especially with respect to downstream clinical applications. Items span detailed descriptions of study populations, statistical methods for the development and validation of PRSs and considerations for the potential limitations of these scores. In addition, we emphasize the need for data availability and transparency, and we encourage researchers to deposit and share PRSs through the PGS Catalog to facilitate reproducibility and comparative benchmarking. By providing these criteria in a structured format that builds on existing standards and ontologies, the use of this framework in publishing PRSs will facilitate translation into clinical care and progress towards defining best practice.
  • CITATION : Wand H, Lambert SA, Tamburro C, Iacocca MA, ...&, Wojcik GL. (2021) Improving reporting standards for polygenic scores in risk prediction studies Nature, 591 (7849) 211-219. doi:10.1038/s41586-021-03243-6. PMID 33692554
  • JOURNAL_INFO : Nature ; Nature ; 2021 ; 591 ; 7849 ; 211-219
  • PUBMED_LINK : 33692554

Tutorial

Tutorial-Choi

  • NAME : Tutorial-Choi
  • SHORT NAME : PRS Tutorial
  • FULL NAME : PRS Tutorial
  • DESCRIPTION : This tutorial provides a step-by-step guide to performing basic polygenic risk score (PRS) analyses and accompanies our PRS Guide paper. The aim of this tutorial is to provide a simple introduction of PRS analyses to those new to PRS, while equipping existing users with a better understanding of the processes and implementation "underneath the hood" of popular PRS software.
  • URL : https://choishingwan.github.io/PRS-Tutorial/
  • TITLE : Tutorial: a guide to performing polygenic risk score analyses
  • DOI : 10.1038/s41596-020-0353-1
  • ABSTRACT : A polygenic score (PGS) or polygenic risk score (PRS) is an estimate of an individual's genetic liability to a trait or disease, calculated according to their genotype profile and relevant genome-wide association study (GWAS) data. While present PRSs typically explain only a small fraction of trait variance, their correlation with the single largest contributor to phenotypic variation, genetic liability, has led to the routine application of PRSs across biomedical research. Among a range of applications, PRSs are exploited to assess shared etiology between phenotypes, to evaluate the clinical utility of genetic data for complex disease and as part of experimental studies in which, for example, experiments are performed that compare outcomes (e.g., gene expression and cellular response to treatment) between individuals with low and high PRS values. As GWAS sample sizes increase and PRSs become more powerful, PRSs are set to play a key role in research and stratified medicine. However, despite the importance and growing application of PRSs, there are limited guidelines for performing PRS analyses, which can lead to inconsistency between studies and misinterpretation of results. Here, we provide detailed guidelines for performing and interpreting PRS analyses. We outline standard quality control steps, discuss different methods for the calculation of PRSs, provide an introductory online tutorial, highlight common misconceptions relating to PRS results, offer recommendations for best practice and discuss future challenges.
  • COPYRIGHT : https://www.springernature.com/gp/researchers/text-and-data-mining
  • CITATION : Choi SW, Mak TS, O'Reilly PF. (2020) Tutorial: a guide to performing polygenic risk score analyses Nat. Protoc., 15 (9) 2759-2772. doi:10.1038/s41596-020-0353-1. PMID 32709988
  • JOURNAL_INFO : Nature protocols ; Nat. Protoc. ; 2020 ; 15 ; 9 ; 2759-2772
  • PUBMED_LINK : 32709988