Tools Fine mapping
Curated list of fine-mapping tools under the GWAS Tools tab.
Summary Table
| NAME | Main citation | YEAR |
|---|---|---|
| BEATRICE | Ghosal S et al., Bioinformatics, 2024 | 2024 |
| CAFEH | Arvanitis M et al., Am J Hum Genet, 2022 | 2022 |
| CAVIARBF | Chen W et al., Genetics, 2015 | 2015 |
| CAVIAR | Hormozdiari F et al., Genetics, 2014 | 2014 |
| FINEMAP | Benner C et al., Bioinformatics, 2016 | 2016 |
| GWFM | Wu Y et al., Nat Genet, 2026 | 2026 |
| JAM | Newcombe PJ et al., Genet Epidemiol, 2016 | 2016 |
| MESuSiE | Gao B et al., Nat Genet, 2024 | 2024 |
| MR-MEGA | Mägi R et al., Hum Mol Genet, 2017 | 2017 |
| MsCAVIAR | LaPierre N et al., PLoS Genet, 2021 | 2021 |
| MultiSuSiE | Rossen J et al., Nat Genet, 2026 | 2026 |
| PAINTOR | Kichaev G et al., PLoS Genet, 2014 | 2014 |
| RFR SuSiE-inf FINEMAP-inf | Cui R et al., Nat Genet, 2024 | 2024 |
| SUSIE-RSS | Zou Y et al., PLoS Genet, 2022 | 2022 |
| SUSIE | Wang G et al., J R Stat Soc Series B Stat Methodol, 2020 | 2020 |
| SUSIEx | Yuan, K., Longchamps, R. J., Pardiñas, A. F., Yu, M., Chen, T. T., Lin, S. C., ... & Schizophrenia Workgroup of… | NA |
| SparsePro | Zhang W et al., PLoS Genet, 2023 | 2023 |
| flashfmZero | Zhou F et al., Cell Genom, 2025 | 2025 |
| mJAM | Shen, J., Jiang, L., Wang, K., Wang, A., Chen, F., Newcombe, P. J., ... & Conti, D. V. (2022). Fine-mapping and… | NA |
| mvSuSiE | Zou Y et al., Nat Genet, 2026 | 2026 |
BEATRICE
PUBMED_LINK
FULL NAME
Bayesian finE-mapping from summAry daTa using deep vaRiational InferenCE
DESCRIPTION
BEATRICE is a fine-mapping tool that identifies putative causal variants from GWAS summary data. It combines a hierarchical Bayesian model with a deep-learning-based inference procedure, which gives it greater inferential power to handle noise and spurious interactions arising from the polygenicity of the trait, trans-interactions between variants, or the varying correlation structure of the genomic region.
URL
TITLE
BEATRICE: Bayesian fine-mapping from summary data using deep variational inference.
Main citation
Ghosal S, Schatz MC, Venkataraman A. (2024) BEATRICE: Bayesian fine-mapping from summary data using deep variational inference. Bioinformatics, 40 (10). doi:10.1093/bioinformatics/btae590. PMID 39360993
ABSTRACT
MOTIVATION: We introduce a novel framework BEATRICE to identify putative causal variants from GWAS statistics. Identifying causal variants is challenging due to their sparsity and high correlation in the nearby regions. To account for these challenges, we rely on a hierarchical Bayesian model that imposes a binary concrete prior on the set of causal variants. We derive a variational algorithm for this fine-mapping problem by minimizing the KL divergence between an approximate density and the posterior probability distribution of the causal configurations. Correspondingly, we use a deep neural network as an inference machine to estimate the parameters of our proposal distribution. Our stochastic optimization procedure allows us to sample from the space of causal configurations, which we use to compute the posterior inclusion probabilities and determine credible sets for each causal variant. We conduct a detailed simulation study to quantify the performance of our framework against two state-of-the-art baseline methods across different numbers of causal variants and noise paradigms, as defined by the relative genetic contributions of causal and noncausal variants. RESULTS: We demonstrate that BEATRICE achieves uniformly better coverage with comparable power and set sizes, and that the performance gain increases with the number of causal variants. We also show the efficacy of BEATRICE in finding causal variants from the GWAS study of Alzheimer's disease. In comparison to the baselines, only BEATRICE can successfully find the APOE ϵ2 allele, a commonly associated variant of Alzheimer's. AVAILABILITY AND IMPLEMENTATION: BEATRICE is available for download at https://github.com/sayangsep/Beatrice-Finemapping.
DOI
10.1093/bioinformatics/btae590
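The abstract notes that BEATRICE samples causal configurations and uses them to compute posterior inclusion probabilities (PIPs). A minimal sketch of that last step only, assuming the sampled configurations are already available as 0/1 vectors; the variational sampler itself is not shown, and `estimate_pips` is a hypothetical helper, not part of BEATRICE:

```python
from typing import Sequence


def estimate_pips(samples: Sequence[Sequence[int]]) -> list[float]:
    """Monte Carlo PIPs: the fraction of sampled causal configurations
    (0/1 vectors, 1 = causal) in which each variant is included."""
    if not samples:
        raise ValueError("need at least one sampled configuration")
    n_variants = len(samples[0])
    counts = [0] * n_variants
    for config in samples:
        if len(config) != n_variants:
            raise ValueError("all configurations must have the same length")
        for j, included in enumerate(config):
            counts[j] += included
    return [c / len(samples) for c in counts]


# Four hypothetical sampled configurations over three variants.
draws = [[1, 0, 0], [1, 0, 1], [0, 0, 1], [1, 0, 0]]
print(estimate_pips(draws))  # [0.75, 0.0, 0.5]
```

More samples tighten these estimates; credible sets are then built from the resulting PIPs.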
CAFEH
PUBMED_LINK
FULL NAME
colocalization and fine-mapping in the presence of allelic heterogeneity
DESCRIPTION
CAFEH performs fine-mapping and colocalization jointly across multiple phenotypes. It can be run on tens of phenotypes and thousands of variants in a few minutes.
URL
KEYWORDS
multi-trait, fine-mapping, colocalization
TITLE
Redefining tissue specificity of genetic regulation of gene expression in the presence of allelic heterogeneity.
Main citation
Arvanitis M, Tayeb K, Strober BJ, Battle A. (2022) Redefining tissue specificity of genetic regulation of gene expression in the presence of allelic heterogeneity. Am J Hum Genet, 109 (2) 223-239. doi:10.1016/j.ajhg.2022.01.002. PMID 35085493
ABSTRACT
Uncovering the functional impact of genetic variation on gene expression is important in understanding tissue biology and the pathogenesis of complex traits. Despite large efforts to map expression quantitative trait loci (eQTLs) across many human tissues, our ability to translate those findings to understanding human disease has been incomplete, and the majority of disease loci are not explained by association with expression of a target gene. Cell-type specificity and the presence of multiple independent causal variants for many eQTLs are potential confounders contributing to the apparent discrepancy with disease loci. In this study, we investigate the tissue specificity of genetic effects on gene expression and the overlap with disease loci while considering the presence of multiple causal variants within and across tissues. We find evidence of pervasive tissue specificity of eQTLs, often masked by linkage disequilibrium that misleads traditional meta-analytic approaches. We propose CAFEH (colocalization and fine-mapping in the presence of allelic heterogeneity), a Bayesian method that integrates genetic association data across multiple traits, incorporating linkage disequilibrium to identify causal variants. CAFEH outperforms previous approaches in colocalization and fine-mapping. Using CAFEH, we show that genes with highly tissue-specific genetic effects are under greater selection, enriched in differentiation and developmental processes, and more likely to be involved in human disease. Last, we demonstrate that CAFEH can efficiently leverage the widespread allelic heterogeneity in genetic regulation of gene expression to prioritize the target tissue in genome-wide association complex trait loci, thereby improving our ability to interpret complex trait genetics.
DOI
10.1016/j.ajhg.2022.01.002
CAVIAR
PUBMED_LINK
FULL NAME
causal variants identification in associated regions
DESCRIPTION
A statistical framework that quantifies the probability that each variant is causal while allowing an arbitrary number of causal variants at a locus.
URL
TITLE
Identifying causal variants at loci with multiple signals of association.
Main citation
Hormozdiari F, Kostem E, Kang EY, Pasaniuc B, ...&, Eskin E. (2014) Identifying causal variants at loci with multiple signals of association. Genetics, 198 (2) 497-508. doi:10.1534/genetics.114.167908. PMID 25104515
ABSTRACT
Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20-50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/.
DOI
10.1534/genetics.114.167908
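CAVIAR reports a set of variants that contains all causal variants at a chosen confidence level ρ (for example 95%). A minimal sketch of building such a set greedily from a posterior over causal configurations; the toy posterior, the greedy strategy and the name `causal_set` are illustrative assumptions, not CAVIAR's code:

```python
def causal_set(posterior: dict[frozenset[int], float], rho: float = 0.95) -> set[int]:
    """Greedily grow a variant set K until the posterior probability that K
    contains *all* causal variants (sum of P(c) over configurations c within K)
    reaches rho."""
    variants = set().union(*posterior)
    chosen: set[int] = set()

    def captured(k: set[int]) -> float:
        # Probability mass of configurations fully contained in k.
        return sum(p for config, p in posterior.items() if config <= k)

    while captured(chosen) < rho:
        remaining = variants - chosen
        if not remaining:
            break  # total mass is below rho; return every variant seen
        best = max(remaining, key=lambda v: captured(chosen | {v}))
        chosen.add(best)
    return chosen


# Hypothetical posterior over causal configurations for three variants.
toy_posterior = {
    frozenset({0}): 0.60,
    frozenset({0, 1}): 0.30,
    frozenset({2}): 0.08,
    frozenset(): 0.02,
}
print(causal_set(toy_posterior, rho=0.9))   # {0, 1}
print(causal_set(toy_posterior, rho=0.95))  # {0, 1, 2}
```

With `rho=0.9` the set stops at `{0, 1}` (captured probability 0.92); raising `rho` to 0.95 pulls in variant 2 as well.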
CAVIARBF
PUBMED_LINK
FULL NAME
CAVIAR Bayes factor
DESCRIPTION
A fine-mapping method that uses marginal test statistics within a Bayesian framework.
URL
KEYWORDS
Bayes factor
TITLE
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics.
Main citation
Chen W, Larrabee BR, Ovsyannikova IG, Kennedy RB, ...&, Schaid DJ. (2015) Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics, 200 (3) 719-36. doi:10.1534/genetics.115.176107. PMID 25948564
ABSTRACT
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf.
DOI
10.1534/genetics.115.176107
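CAVIARBF computes Bayes factors from marginal test statistics. For intuition only, the sketch below uses a simpler, related quantity, Wakefield's approximate Bayes factor, and assumes exactly one causal variant per locus with equal prior odds, so PIPs are just normalised Bayes factors. This is not CAVIARBF's multi-variant Bayes factor; the prior standard deviation of 0.2 and the function names are assumptions:

```python
import math


def log_abf(z: float, se: float, prior_sd: float = 0.2) -> float:
    """Log of Wakefield's approximate Bayes factor (association vs. null) for
    one variant, from its marginal z-score and standard error. W = prior_sd**2
    is the assumed prior variance of the true effect."""
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2


def single_causal_pips(zs: list[float], ses: list[float], prior_sd: float = 0.2) -> list[float]:
    """PIPs assuming exactly one causal variant and equal prior odds:
    each variant's Bayes factor divided by the sum over the region."""
    logs = [log_abf(z, s, prior_sd) for z, s in zip(zs, ses)]
    top = max(logs)  # subtract the maximum before exp() to avoid overflow
    weights = [math.exp(lb - top) for lb in logs]
    total = sum(weights)
    return [wt / total for wt in weights]


# Three hypothetical variants with equal standard errors.
pips = single_causal_pips([5.0, 2.0, -5.0], [0.05, 0.05, 0.05])
```

Variants 0 and 2 receive identical PIPs because the Bayes factor depends on z only through z², and together they take almost all of the posterior mass.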
FINEMAP
PUBMED_LINK
DESCRIPTION
FINEMAP is a program for (1) identifying causal SNPs, (2) estimating the effect sizes of causal SNPs, and (3) estimating the heritability contribution of causal SNPs.
URL
KEYWORDS
Shotgun Stochastic Search (SSS)
TITLE
FINEMAP: efficient variable selection using summary data from genome-wide association studies.
Main citation
Benner C, Spencer CC, Havulinna AS, Salomaa V, ...&, Pirinen M. (2016) FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics, 32 (10) 1493-501. doi:10.1093/bioinformatics/btw018. PMID 26773131
ABSTRACT
MOTIVATION: The goal of fine-mapping in genomic regions associated with complex diseases and traits is to identify causal variants that point to molecular mechanisms behind the associations. Recent fine-mapping methods using summary data from genome-wide association studies rely on exhaustive search through all possible causal configurations, which is computationally expensive. RESULTS: We introduce FINEMAP, a software package to efficiently explore a set of the most important causal configurations of the region via a shotgun stochastic search algorithm. We show that FINEMAP produces accurate results in a fraction of processing time of existing approaches and is therefore a promising tool for analyzing growing amounts of data produced in genome-wide association studies and emerging sequencing projects. AVAILABILITY AND IMPLEMENTATION: FINEMAP v1.0 is freely available for Mac OS X and Linux at http://www.christianbenner.com. CONTACT: christian.benner@helsinki.fi or matti.pirinen@helsinki.fi.
DOI
10.1093/bioinformatics/btw018
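FINEMAP replaces exhaustive enumeration with a shotgun stochastic search that moves between nearby causal configurations. A minimal sketch of one ingredient, generating the neighbours of a configuration by deleting, adding or swapping a single causal variant; scoring the neighbours and sampling the next move are omitted, and `neighbourhood` is a hypothetical helper rather than FINEMAP's API:

```python
def neighbourhood(config: frozenset[int], p: int, max_causal: int) -> set[frozenset[int]]:
    """Configurations one move away from `config` among variants 0..p-1:
    delete one causal variant, add one (if below max_causal), or swap one."""
    outside = [v for v in range(p) if v not in config]
    moves: set[frozenset[int]] = set()
    for v in config:  # deletion moves shrink the configuration
        moves.add(config - {v})
    if len(config) < max_causal:  # addition moves grow it
        for u in outside:
            moves.add(config | {u})
    for v in config:  # change (swap) moves keep its size fixed
        for u in outside:
            moves.add((config - {v}) | {u})
    return moves


print(sorted(map(sorted, neighbourhood(frozenset({0}), p=3, max_causal=2))))
# [[], [0, 1], [0, 2], [1], [2]]
```

Capping additions at `max_causal` mirrors the user-set maximum number of causal variants; only this small neighbourhood is scored at each step, instead of every possible configuration.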
GWFM
PUBMED_LINK
FULL NAME
Genome-wide fine-mapping with functional annotations
DESCRIPTION
Genome-wide fine-mapping (GWFM) with functional annotations models the global genetic architecture rather than isolated loci; compared with region-specific approaches it improves error control, power, resolution, precision, replication, and cross-ancestry phenotype prediction. Distributed as part of the GCTB software suite.
URL
KEYWORDS
fine-mapping, functional annotation, credible sets, trans-ancestry
TITLE
Genome-wide fine-mapping improves identification of causal variants.
Main citation
Wu Y, Zheng Z, Thibaut L, Lin T, ...&, Zeng J. (2026) Genome-wide fine-mapping improves identification of causal variants. Nat Genet. doi:10.1038/s41588-026-02549-3. PMID 41912930
ABSTRACT
Fine-mapping refines genotype-phenotype association signals to identify causal variants underlying complex traits. However, current methods typically focus on individual genomic loci and do not account for the global genetic architecture. Here we demonstrate the advantages of performing genome-wide fine-mapping (GWFM) with functional annotations and develop methods to facilitate GWFM. In simulations and real data analyses, GWFM outperforms current methods across several metrics, including error control, mapping power, resolution, precision, replication rate and trans-ancestry phenotype prediction. Across 48 complex traits, we identify credible sets that collectively explain 18% of the SNP-based heritability (h²_SNP) on average, with 30% of credible sets located outside genome-wide significant loci. Leveraging the genetic architecture estimated from GWFM, we predict that fine-mapping over 50% of h²_SNP would require an average of 2 million samples. Finally, as proof-of-principle, we highlight a known causal variant at FTO influencing body mass index and identify new missense causal variants influencing schizophrenia and Crohn's disease risk.
DOI
10.1038/s41588-026-02549-3
JAM
PUBMED_LINK
FULL NAME
joint analysis of marginal summary statistics
DESCRIPTION
Bayesian variable selection under a range of likelihoods, including linear regression for continuous outcomes, logistic regression for binary outcomes, Weibull regression for survival outcomes, and the "JAM" model for summary genetic association data.
URL
TITLE
JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects.
Main citation
Newcombe PJ, Conti DV, Richardson S. (2016) JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects. Genet Epidemiol, 40 (3) 188-201. doi:10.1002/gepi.21953. PMID 27027514
ABSTRACT
Recently, large scale genome-wide association study (GWAS) meta-analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one-at-a-time. This complicates the ability of fine-mapping to identify a small set of SNPs for further functional follow-up. We describe a new and scalable algorithm, joint analysis of marginal summary statistics (JAM), for the re-analysis of published marginal summary statistics under joint multi-SNP models. The correlation is accounted for according to estimates from a reference dataset, and models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework. We provide both enumerated and Reversible Jump MCMC implementations of JAM and present some comparisons of performance. In a series of realistic simulation studies, JAM demonstrated identical performance to various alternatives designed for single region settings. In multi-region settings, where the only multivariate alternative involves stepwise selection, JAM offered greater power and specificity. We also present an application to real published results from MAGIC (meta-analysis of glucose and insulin related traits consortium) - a GWAS meta-analysis of more than 15,000 people. We re-analysed several genomic regions that produced multiple significant signals with glucose levels 2 hr after oral stimulation. Through joint multivariate modelling, JAM was able to formally rule out many SNPs, and for one gene, ADCY5, suggests that an additional SNP, which transpired to be more biologically plausible, should be followed up with equal priority to the reported index.
DOI
10.1002/gepi.21953
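JAM re-analyses marginal summary statistics under a joint multi-SNP model, using LD from a reference dataset. The least-squares core of that idea can be sketched for two SNPs, assuming standardised genotypes so that the joint effects equal R⁻¹ times the marginal effects; JAM's Bayesian variable selection and its allele-frequency scaling are not modelled, and the function name is hypothetical:

```python
def joint_from_marginal_2snp(b1: float, b2: float, r: float) -> tuple[float, float]:
    """Joint (multi-SNP) effects from marginal effects of two standardised
    SNPs with LD correlation r, via b_joint = R^{-1} b_marginal where
    R = [[1, r], [r, 1]]."""
    det = 1.0 - r * r
    if det <= 0.0:
        raise ValueError("SNPs in perfect LD: joint effects are not identifiable")
    # Closed-form inverse of the 2x2 correlation matrix applied to (b1, b2).
    return ((b1 - r * b2) / det, (b2 - r * b1) / det)


# Hypothetical region: SNP 2's marginal effect is produced purely by LD (r = 0.8)
# with SNP 1, whose joint effect is 0.3.
print(joint_from_marginal_2snp(0.3, 0.24, 0.8))  # approximately (0.3, 0.0)
```

The joint estimate attributes the whole signal to SNP 1, which is the kind of "rule out" reasoning the abstract describes for multi-signal regions.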
MESuSiE
PUBMED_LINK
FULL NAME
multi-ancestry sum of the single effects model
DESCRIPTION
MESuSiE uses GWAS summary statistics from multiple ancestries, properly accounts for the LD structure of the local genomic region in each ancestry, and explicitly models both shared and ancestry-specific causal signals to accommodate both similarity and heterogeneity of causal effect sizes across ancestries. For each variant, MESuSiE outputs the posterior inclusion probability of being a shared or an ancestry-specific causal variant.
URL
KEYWORDS
multi-ancestry, fine-mapping
TITLE
MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies.
Main citation
Gao B, Zhou X. (2024) MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Nat Genet, 56 (1) 170-179. doi:10.1038/s41588-023-01604-7. PMID 38168930
ABSTRACT
Fine-mapping in genome-wide association studies attempts to identify causal SNPs from a set of candidate SNPs in a local genomic region of interest and is commonly performed in one genetic ancestry at a time. Here, we present multi-ancestry sum of the single effects model (MESuSiE), a probabilistic multi-ancestry fine-mapping method, to improve the accuracy and resolution of fine-mapping by leveraging association information across ancestries. MESuSiE uses summary statistics as input, accounts for the diverse linkage disequilibrium pattern observed in different ancestries, explicitly models both shared and ancestry-specific causal SNPs, and relies on a variational inference algorithm for scalable computation. We evaluated the performance of MESuSiE through comprehensive simulations and multi-ancestry fine-mapping of four lipid traits with both European and African samples. In the real data, MESuSiE improves fine-mapping resolution by 19.0% to 72.0% compared to existing approaches, is an order of magnitude faster, and captures and categorizes shared and ancestry-specific causal signals with enhanced functional enrichment.
DOI
10.1038/s41588-023-01604-7
MR-MEGA
PUBMED_LINK
FULL NAME
Meta-Regression of Multi-AncEstry Genetic Association
DESCRIPTION
MR-MEGA (Meta-Regression of Multi-AncEstry Genetic Association) is a tool to detect and fine-map complex trait association signals via multi-ancestry meta-regression. This approach uses genome-wide metrics of diversity between populations to derive axes of genetic variation via multi-dimensional scaling [Purcell 2007]. Allelic effects of a variant across GWAS, weighted by their corresponding standard errors, can then be modelled in a linear regression framework, including the axes of genetic variation as covariates. The flexibility of this model enables partitioning of the heterogeneity into components due to ancestry and residual variation, which would be expected to improve fine-mapping resolution.
URL
KEYWORDS
multi-ancestry
TITLE
Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution.
Main citation
Mägi R, Horikoshi M, Sofer T, Mahajan A, ...&, Morris AP. (2017) Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum Mol Genet, 26 (18) 3639-3650. doi:10.1093/hmg/ddx280. PMID 28911207
ABSTRACT
Trans-ethnic meta-analysis of genome-wide association studies (GWAS) across diverse populations can increase power to detect complex trait loci when the underlying causal variants are shared between ancestry groups. However, heterogeneity in allelic effects between GWAS at these loci can occur that is correlated with ancestry. Here, a novel approach is presented to detect SNP association and quantify the extent of heterogeneity in allelic effects that is correlated with ancestry. We employ trans-ethnic meta-regression to model allelic effects as a function of axes of genetic variation, derived from a matrix of mean pairwise allele frequency differences between GWAS, and implemented in the MR-MEGA software. Through detailed simulations, we demonstrate increased power to detect association for MR-MEGA over fixed- and random-effects meta-analysis across a range of scenarios of heterogeneity in allelic effects between ethnic groups. We also demonstrate improved fine-mapping resolution, in loci containing a single causal variant, compared to these meta-analysis approaches and PAINTOR, and equivalent performance to MANTRA at reduced computational cost. Application of MR-MEGA to trans-ethnic GWAS of kidney function in 71,461 individuals indicates stronger signals of association than fixed-effects meta-analysis when heterogeneity in allelic effects is correlated with ancestry. Application of MR-MEGA to fine-mapping four type 2 diabetes susceptibility loci in 22,086 cases and 42,539 controls highlights: (i) strong evidence for heterogeneity in allelic effects that is correlated with ancestry only at the index SNP for the association signal at the CDKAL1 locus; and (ii) 99% credible sets with six or fewer variants for five distinct association signals.
DOI
10.1093/hmg/ddx280
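MR-MEGA models each variant's allelic effects across GWAS as a linear function of axes of genetic variation. A minimal sketch with a single axis, fitting intercept and slope by inverse-variance weighted least squares (weights 1/se², an assumption of this sketch); the real software uses several axes and also derives association and heterogeneity test statistics, which are not shown. Names and data are hypothetical:

```python
def meta_regression_1axis(betas: list[float], ses: list[float],
                          axis: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted least squares fit of beta_k = a + b * x_k
    across K studies, where x_k is study k's position on one axis of genetic
    variation. Returns (intercept a, slope b)."""
    if not (len(betas) == len(ses) == len(axis)) or len(betas) < 2:
        raise ValueError("need matching inputs from at least two studies")
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, axis)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, betas)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, axis))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, axis, betas))
    if sxx == 0.0:
        raise ValueError("axis values must differ between studies")
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Four hypothetical GWAS whose effects drift linearly along the axis.
print(meta_regression_1axis([0.05, 0.10, 0.15, 0.20],
                            [0.02, 0.03, 0.02, 0.04],
                            [-1.0, 0.0, 1.0, 2.0]))  # approximately (0.1, 0.05)
```

A non-zero slope is the ancestry-correlated heterogeneity that fixed-effects meta-analysis would ignore.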
MsCAVIAR
PUBMED_LINK
FULL NAME
multiple study causal variants identification in associated regions
DESCRIPTION
MsCAVIAR is a method for fine-mapping (identifying causal variants among GWAS associated variants) by leveraging information from multiple studies. One important application area is trans-ethnic fine mapping.
URL
KEYWORDS
multi-study fine-mapping
TITLE
Identifying causal variants by fine mapping across multiple studies.
Main citation
LaPierre N, Taraszka K, Huang H, He R, ...&, Eskin E. (2021) Identifying causal variants by fine mapping across multiple studies. PLoS Genet, 17 (9) e1009733. doi:10.1371/journal.pgen.1009733. PMID 34543273
ABSTRACT
Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of "fine mapping" methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
DOI
10.1371/journal.pgen.1009733
MultiSuSiE
PUBMED_LINK
DESCRIPTION
MultiSuSiE is a multi-ancestry SuSiE-style fine-mapping framework that allows causal effect sizes to differ across ancestries, improving credible sets in diverse whole-genome sequencing cohorts such as All of Us.
URL
KEYWORDS
cross-ancestry, fine-mapping
TITLE
MultiSuSiE improves multi-ancestry fine-mapping in All of Us whole-genome sequencing data.
Main citation
Rossen J, Shi H, Strober BJ, Zhang MJ, ...&, Price AL. (2026) MultiSuSiE improves multi-ancestry fine-mapping in All of Us whole-genome sequencing data. Nat Genet, 58 (1) 67-76. doi:10.1038/s41588-025-02450-5. PMID 41491094
ABSTRACT
Leveraging multi-ancestry data can improve fine-mapping power. We propose MultiSuSiE, an extension of Sum of Single Effects (SuSiE), to multiple ancestries that allows causal effect sizes to vary across ancestries. We evaluated MultiSuSiE using whole-genome sequencing data from 47,000 African-ancestry, 36,000 Latino-ancestry and 116,000 European-ancestry individuals from All of Us. In simulations, MultiSuSiE applied to Afr36k + Lat36k + Eur36k was well-calibrated and attained higher power than SuSiE applied to Eur109k; compared to recent multi-ancestry methods (SuSiEx and MESuSiE), MultiSuSiE attained higher power and lower computational cost. In analyses of 14 quantitative traits, MultiSuSiE applied to Afr47k + Lat36k + Eur116k identified 348 fine-mapped variants with posterior inclusion probability (PIP) > 0.9, and MultiSuSiE applied to Afr36k + Lat36k + Eur36k identified 59% more PIP > 0.9 variants than SuSiE applied to Eur109k; MultiSuSiE identified 29% more PIP > 0.9 variants than SuSiEx, and MESuSiE was not included due to its high computational cost. We validated these findings through functional enrichment of fine-mapped variants and highlighted examples implicating biologically plausible fine-mapped variants.
DOI
10.1038/s41588-025-02450-5
PAINTOR
PUBMED_LINK
FULL NAME
Probabilistic Annotation INtegraTOR
DESCRIPTION
Finding the causal variants that underlie known risk loci is one of the main post-GWAS challenges. PAINTOR (Probabilistic Annotation INtegraTOR) is a probabilistic framework that integrates association strength with genomic functional annotation data to improve accuracy in selecting plausible causal variants for functional validation. Its main output is the probability that each variant is causal, which can be used to prioritize variants for functional assays that establish biological causality.
URL
KEYWORDS
Empirical Bayes prior
TITLE
Integrating functional data to prioritize causal variants in statistical fine-mapping studies.
Main citation
Kichaev G, Yang WY, Lindstrom S, Hormozdiari F, ...&, Pasaniuc B. (2014) Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet, 10 (10) e1004722. doi:10.1371/journal.pgen.1004722. PMID 25357204
ABSTRACT
Standard statistical approaches for prioritization of variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (as compared to the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulation data. We validate our findings using a large scale meta-analysis of four blood lipids traits and find that the relative probability for causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in this data.
DOI
10.1371/journal.pgen.1004722
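PAINTOR lets functional annotations shift each variant's prior probability of being causal. A minimal sketch, assuming that prior is a logistic function of a weighted sum of annotations; the weights here are fixed by hand for illustration, and the function name and annotation values are hypothetical:

```python
import math


def annotation_prior(annotations: list[list[float]], gammas: list[float],
                     intercept: float) -> list[float]:
    """Prior probability that each variant is causal, modelled as
    logistic(intercept + sum_k gamma_k * a_ik) over its annotation vector."""
    priors = []
    for row in annotations:
        eta = intercept + sum(g * a for g, a in zip(gammas, row))
        # Numerically stable logistic: avoid exp() overflow for large |eta|.
        if eta >= 0:
            priors.append(1.0 / (1.0 + math.exp(-eta)))
        else:
            e = math.exp(eta)
            priors.append(e / (1.0 + e))
    return priors


# Hypothetical variants with two binary annotations (e.g. exon, promoter).
variant_annotations = [[0, 0], [1, 0], [1, 1]]
print(annotation_prior(variant_annotations, gammas=[1.0, 0.5], intercept=-3.0))
```

Variants carrying more positively weighted annotations receive a higher prior. PAINTOR estimates the annotation weights from the summary statistics across all risk loci, whereas this sketch takes them as given.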
RFR SuSiE-inf FINEMAP-inf (RFR)
PUBMED_LINK
FULL NAME
Replication Failure Rate
DESCRIPTION
Replication Failure Rate (RFR) is a metric that assesses the consistency of fine-mapping results by downsampling a large cohort. SuSiE-inf and FINEMAP-inf extend SuSiE and FINEMAP with a term for infinitesimal effects in addition to a small number of larger causal effects of interest.
URL
TITLE
Improving fine-mapping by modeling infinitesimal effects.
Main citation
Cui R, Elzur RA, Kanai M, Ulirsch JC, ...&, Finucane HK. (2024) Improving fine-mapping by modeling infinitesimal effects. Nat Genet, 56 (1) 162-169. doi:10.1038/s41588-023-01597-3. PMID 38036779
ABSTRACT
Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.
DOI
10.1038/s41588-023-01597-3
SUSIE
PUBMED_LINK
FULL NAME
sum of single effects
DESCRIPTION
The susieR package implements a simple new way to perform variable selection in multiple regression (y = Xb + e). The methods implemented here are particularly well-suited to settings where some of the X variables are highly correlated, and the true effects are highly sparse (e.g. <20 non-zero effects in the vector b). One example of this is genetic fine-mapping applications, and this application was a major motivation for developing these methods.
URL
KEYWORDS
fine-mapping, sum of single-effects (SuSiE) regression, iterative Bayesian stepwise selection (IBSS)
TITLE
A simple new approach to variable selection in regression, with application to genetic fine mapping.
Main citation
Wang G, Sarkar A, Carbonetto P, Stephens M. (2020) A simple new approach to variable selection in regression, with application to genetic fine mapping. J R Stat Soc Series B Stat Methodol, 82 (5) 1273-1300. doi:10.1111/rssb.12388. PMID 37220626
ABSTRACT
We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model - the "Sum of Single Effects" (SuSiE) model - which comes from writing the sparse vector of regression coefficients as a sum of "single-effect" vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure - Iterative Bayesian Stepwise Selection (IBSS) - which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods, but instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under the SuSiE model. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a Credible Set of variables for each selection. Our methods are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine-mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and illustrate their application to fine-mapping genetic variants influencing alternative splicing in human cell-lines. We also discuss the potential and challenges for applying these methods to generic variable selection problems.
DOI
10.1111/rssb.12388
SUSIE-RSS
PUBMED_LINK
FULL NAME
Sum of Single Effects regression with summary statistics
DESCRIPTION
The susieR package implements a simple new way to perform variable selection in multiple regression (y = Xb + e). The methods implemented here are particularly well-suited to settings where some of the X variables are highly correlated and the true effects are highly sparse (e.g., fewer than 20 non-zero effects in the vector b). Genetic fine-mapping is one such application and was a major motivation for developing these methods.
URL
KEYWORDS
fine-mapping, summary statistics
TITLE
Fine-mapping from summary data with the "Sum of Single Effects" model.
Main citation
Zou Y, Carbonetto P, Wang G, Stephens M. (2022) Fine-mapping from summary data with the "Sum of Single Effects" model. PLoS Genet, 18 (7) e1010299. doi:10.1371/journal.pgen.1010299. PMID 35853082
ABSTRACT
In recent work, Wang et al introduced the "Sum of Single Effects" (SuSiE) model, and showed that it provides a simple and efficient approach to fine-mapping genetic variants from individual-level data. Here we present new methods for fitting the SuSiE model to summary data, for example to single-SNP z-scores from an association study and linkage disequilibrium (LD) values estimated from a suitable reference panel. To develop these new methods, we first describe a simple, generic strategy for extending any individual-level data method to deal with summary data. The key idea is to replace the usual regression likelihood with an analogous likelihood based on summary data. We show that existing fine-mapping methods such as FINEMAP and CAVIAR also (implicitly) use this strategy, but in different ways, and so this provides a common framework for understanding different methods for fine-mapping. We investigate other common practical issues in fine-mapping with summary data, including problems caused by inconsistencies between the z-scores and LD estimates, and we develop diagnostics to identify these inconsistencies. We also present a new refinement procedure that improves model fits in some data sets, and hence improves overall reliability of the SuSiE fine-mapping results. Detailed evaluations of fine-mapping methods in a range of simulated data sets show that SuSiE applied to summary data is competitive, in both speed and accuracy, with the best available fine-mapping methods for summary data.
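One way to screen for the z-score/LD inconsistencies mentioned above is to compare each z-score with its conditional expectation given all the others. The sketch below assumes, for simplicity, that z ~ N(0, R + lam·I), where `lam` is a small hypothetical regularization that keeps the LD matrix invertible. It is a generic conditional-Gaussian check in the spirit of the diagnostics the abstract describes, not necessarily the exact susieR procedure. Large absolute standardized differences flag variants whose z-scores disagree with the LD estimates, for example because of an allele flip.

```python
import numpy as np


def conditional_z_residuals(z, R, lam=1e-3):
    """Standardized difference between each z_j and E[z_j | z_-j],
    assuming z ~ N(0, R + lam * I)."""
    p = len(z)
    omega = np.linalg.inv(R + lam * np.eye(p))   # precision matrix
    d = np.diag(omega)
    # E[z_j | z_-j] = -sum_{i != j} omega_ji z_i / omega_jj, Var = 1 / omega_jj
    cond_mean = z - (omega @ z) / d
    cond_sd = 1.0 / np.sqrt(d)
    return (z - cond_mean) / cond_sd
```

For example, in a block of five variants in strong LD that all have z = 5, flipping the sign of one z-score makes that variant's residual by far the largest in absolute value.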
DOI
10.1371/journal.pgen.1010299
SUSIEx
DESCRIPTION
SuSiEx is a Python-based command-line tool that performs cross-ancestry fine-mapping using GWAS summary statistics and LD reference panels. The method is built on the Sum of Single Effects (SuSiE) model.
URL
KEYWORDS
cross-ancestry, fine-mapping
Main citation
Yuan, K., Longchamps, R. J., Pardiñas, A. F., Yu, M., Chen, T. T., Lin, S. C., ... & Schizophrenia Workgroup of Psychiatric Genomics Consortium. (2023). Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. medRxiv.
SparsePro
PUBMED_LINK
DESCRIPTION
SparsePro is a command-line tool for efficiently conducting genome-wide fine-mapping. The method has two key features. First, by creating a sparse low-dimensional projection of the high-dimensional genotype, it enables a linear search over causal variants instead of the exponential search over causal configurations used by most existing methods. Second, it adopts a probabilistic framework with a highly efficient variational expectation-maximization algorithm to integrate statistical associations and functional priors.
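The gap between the two search strategies is easy to quantify. Assuming a locus of p variants and at most k causal variants, an exhaustive method scores every configuration of up to k variants, while a method that scans each variant once per effect performs roughly k·p evaluations per iteration. The counts below are purely illustrative and ignore how many iterations an iterative method needs.

```python
from math import comb


def exhaustive_configurations(p, k_max):
    """Number of causal configurations with at most k_max causal variants among p."""
    return sum(comb(p, k) for k in range(k_max + 1))


def per_effect_evaluations(p, k_max):
    """Variant scorings per iteration when each of k_max effects scans all p variants."""
    return k_max * p


for p in (100, 1000, 5000):
    print(p, exhaustive_configurations(p, 3), per_effect_evaluations(p, 3))
```

With p = 1000 and k = 3, this gives 166,667,501 configurations versus 3,000 per-iteration evaluations.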
URL
TITLE
SparsePro: An efficient fine-mapping method integrating summary statistics and functional annotations.
Main citation
Zhang W, Najafabadi H, Li Y. (2023) SparsePro: An efficient fine-mapping method integrating summary statistics and functional annotations. PLoS Genet, 19 (12) e1011104. doi:10.1371/journal.pgen.1011104. PMID 38153934
ABSTRACT
Identifying causal variants from genome-wide association studies (GWAS) is challenging due to widespread linkage disequilibrium (LD) and the possible existence of multiple causal variants in the same genomic locus. Functional annotations of the genome may help to prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. Classical fine-mapping methods conducting an exhaustive search of variant-level causal configurations have a high computational cost, especially when the underlying genetic architecture and LD patterns are complex. SuSiE provided an iterative Bayesian stepwise selection algorithm for efficient fine-mapping. In this work, we build connections between SuSiE and a paired mean field variational inference algorithm through the implementation of a sparse projection, and propose effective strategies for estimating hyperparameters and summarizing posterior probabilities. Moreover, we incorporate functional annotations into fine-mapping by jointly estimating enrichment weights to derive functionally-informed priors. We evaluate the performance of SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved improved power for fine-mapping with reduced computation time. We demonstrate the utility of SparsePro through fine-mapping of five functional biomarkers of clinically relevant phenotypes. In summary, we have developed an efficient fine-mapping method for integrating summary statistics and functional annotations. Our method can have wide utility in understanding the genetics of complex traits and increasing the yield of functional follow-up studies of GWAS. SparsePro software is available on GitHub at https://github.com/zhwm/SparsePro.
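Functionally informed priors can be sketched as follows, under two hypothetical simplifications: the prior causal probability of each variant is a softmax of annotation scores A·w with fixed enrichment weights w, and the locus has a single causal variant whose Bayes factor is computed on the z-score scale. SparsePro itself estimates the weights jointly with fine-mapping; the function names and the `prior_var` value are assumptions for illustration only.

```python
import numpy as np


def functional_prior(A, w):
    """Prior causal probabilities from annotations A (p x k) and weights w (k,)."""
    score = A @ w
    e = np.exp(score - score.max())        # softmax, shifted for numerical stability
    return e / e.sum()


def single_effect_posterior(z, prior, prior_var=25.0):
    """Posterior probability that each variant is the single causal variant,
    using a Bayes factor for a N(0, prior_var) effect on the z-score scale."""
    log_bf = 0.5 * np.log(1.0 / (1.0 + prior_var)) + 0.5 * z**2 * prior_var / (1.0 + prior_var)
    log_w = log_bf + np.log(prior)
    post = np.exp(log_w - log_w.max())
    return post / post.sum()
```

When two variants have identical z-scores, the annotated one gains posterior probability in proportion to its prior: with a single annotation and weight 1, its posterior is e times that of its unannotated twin.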
DOI
10.1371/journal.pgen.1011104
flashfmZero
PUBMED_LINK
DESCRIPTION
flashfmZero performs zero-correlation latent-factor-based multi-trait fine-mapping from GWAS summary statistics for high-dimensional trait panels (e.g., blood cell counts). Latent-factor GWAS can detect signals that fall below univariate significance thresholds; in INTERVAL blood cell analyses, flashfmZero 99% credible sets were equal to or smaller than those from univariate fine-mapping in 87% of comparisons and, in all cases, were contained within the corresponding univariate latent-factor credible sets.
URL
KEYWORDS
latent factor, multi-trait, fine-mapping, GWAS summary statistics, high-dimensional traits
TITLE
Improved genetic discovery and fine-mapping resolution through multivariate latent factor analysis of high-dimensional traits.
Main citation
Zhou F, Astle WJ, Butterworth AS, Asimit JL. (2025) Improved genetic discovery and fine-mapping resolution through multivariate latent factor analysis of high-dimensional traits. Cell Genom, 5 (5) 100847. doi:10.1016/j.xgen.2025.100847. PMID 40220762
ABSTRACT
Genome-wide association studies (GWASs) of high-dimensional traits, such as blood cell or metabolic traits, often use univariate approaches, ignoring trait relationships. Biological mechanisms generating variation in high-dimensional traits can be captured parsimoniously through a GWAS of latent factors. Here, we introduce flashfmZero, a zero-correlation latent-factor-based multi-trait fine-mapping approach. In an application to 25 latent factors derived from 99 blood cell traits in the INTERVAL cohort, we show that latent factor GWASs enable the detection of signals generating sub-threshold associations with several blood cell traits. The 99% credible sets (CS99) from flashfmZero were equal to or smaller in size than those from univariate fine-mapping of blood cell traits in 87% of our comparisons. In all cases univariate latent factor CS99 contained those from flashfmZero. Our latent factor approaches can be applied to GWAS summary statistics and will enhance power for the discovery and fine-mapping of associations for many traits.
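Credible sets such as the CS99 compared above are commonly built by ranking variants by posterior probability and keeping the smallest top-ranked set whose probabilities reach the target coverage. The sketch below assumes a single-causal-variant posterior that sums to one and uses made-up probabilities. It does not reproduce how flashfmZero computes its multi-trait posteriors; it only shows the set construction and the size and containment comparisons the abstract reports.

```python
import numpy as np


def credible_set(pp, coverage=0.99):
    """Smallest set of variants whose posterior probabilities sum to >= coverage."""
    pp = np.asarray(pp, dtype=float)
    order = np.argsort(pp)[::-1]                 # variants by decreasing probability
    cum = np.cumsum(pp[order])
    n_keep = int(np.searchsorted(cum, coverage)) + 1
    return set(order[:n_keep].tolist())


# Hypothetical posteriors for the same four variants from two analyses
cs_univariate = credible_set([0.5, 0.45, 0.045, 0.005])
cs_multitrait = credible_set([0.9, 0.095, 0.004, 0.001])
print(len(cs_multitrait) <= len(cs_univariate), cs_multitrait <= cs_univariate)
```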
DOI
10.1016/j.xgen.2025.100847
mJAM
FULL NAME
multi-population JAM
URL
KEYWORDS
multi-population
Main citation
Shen, J., Jiang, L., Wang, K., Wang, A., Chen, F., Newcombe, P. J., ... & Conti, D. V. (2022). Fine-mapping and credible set construction using a multi-population joint analysis of marginal summary statistics from genome-wide association studies. bioRxiv, 2022-12.
mvSuSiE
PUBMED_LINK
DESCRIPTION
mvSuSiE extends the Sum of Single Effects (SuSiE) model to joint fine-mapping of multiple traits, improving power and resolution relative to separate single-trait analyses while remaining computationally practical.
URL
KEYWORDS
multi-trait, fine-mapping
TITLE
Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model.
Main citation
Zou Y, Carbonetto P, Xie D, Wang G, ..., Stephens M. (2026) Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model. Nat Genet, 58 (2) 454-462. doi:10.1038/s41588-025-02486-7. PMID 41634413
ABSTRACT
We introduce mvSuSiE, a multitrait fine-mapping method, to identify putative causal variants from genetic association data (individual-level or summary). mvSuSiE learns patterns of shared genetic effects from data, and exploits these patterns to improve power to identify causal single nucleotide polymorphisms (SNPs). Comparisons on simulated data show that mvSuSiE is competitive in speed, power and precision with existing multitrait methods, and uniformly improves over single-trait fine-mapping (Sum of Single Effects) performed separately for each trait. We applied mvSuSiE to jointly fine-map 16 blood cell traits using data from the UK Biobank. By jointly analyzing traits and modeling heterogeneous effect-sharing patterns, we identified a substantially larger number of causal SNPs (>3,000) than single-trait fine-mapping and achieved narrower credible sets. mvSuSiE also more comprehensively characterized how genetic variants affect blood cell traits; 68% of causal SNPs showed significant effects across more than one blood cell type.
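Joint analysis of traits can be illustrated with a multivariate Bayes factor for one SNP. The sketch assumes effect estimates `bhat` across traits with a known sampling covariance `V`, and a prior on the true effects that is a mixture of zero-mean multivariate normals with covariances U_k and weights w_k, the kind of sharing prior used by mash-style methods. Here the U_k and w_k are fixed by hand rather than learned from data, and none of the names correspond to the mvsusieR API.

```python
import numpy as np


def log_mvn_density(x, cov):
    """Log density of N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


def log_mixture_bf(bhat, V, prior_covs, weights):
    """log BF of 'effect ~ sum_k w_k N(0, U_k)' versus 'effect = 0',
    given bhat ~ N(effect, V) across traits."""
    null = log_mvn_density(bhat, V)
    comps = np.array([log_mvn_density(bhat, U + V) for U in prior_covs])
    log_w = np.log(weights) + comps
    m = log_w.max()                             # log-sum-exp over mixture components
    return m + np.log(np.exp(log_w - m).sum()) - null
```

Under a prior that favors shared effects, a SNP whose estimates agree in sign across two traits receives a much larger Bayes factor than one with opposite signs, which is one way learned sharing patterns can raise power.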
DOI
10.1038/s41588-025-02486-7