Tools Meta and Multi-trait
Curation of Meta and Multi-trait listings under the GWAS Tools tab.
Summary Table
| NAME | CATEGORY | Main citation | YEAR |
|---|---|---|---|
| REMETA | Gene-based | Joseph TA et al., Nat Genet, 2025 | 2025 |
| GWAMA | Meta-analysis | Mägi R et al., BMC Bioinformatics, 2010 | 2010 |
| MANTRA | Meta-analysis | Morris AP, Genet Epidemiol, 2011 | 2011 |
| METAL | Meta-analysis | Willer CJ et al., Bioinformatics, 2010 | 2010 |
| MR-MEGA | Meta-analysis | Mägi R et al., Hum Mol Genet, 2017 | 2017 |
| ASSET | Multi-trait | Bhattacharjee S et al., Am J Hum Genet, 2012 | 2012 |
| FactorGO | Multi-trait | Zhang Z et al., Am J Hum Genet, 2023 | 2023 |
| GLEANR | Multi-trait | Omdahl AR et al., Am J Hum Genet, 2025 | 2025 |
| Galesloot | Multi-trait | Galesloot TE et al., PLoS One, 2014 | 2014 |
| Genomic-SEM | Multi-trait | Grotzinger AD et al., Nat Hum Behav, 2019 | 2019 |
| HIPO | Multi-trait | Qi G et al., PLoS Genet, 2018 | 2018 |
| JASS | Multi-trait | Julienne H et al., NAR Genom Bioinform, 2020 | 2020 |
| LCP-GWAS | Multi-trait | Ruotsalainen SE et al., Eur J Hum Genet, 2021 | 2021 |
| MANOVA | Multi-trait | | 1955 |
| MOSTest | Multi-trait | van der Meer D et al., Nat Commun, 2020 | 2020 |
| MTAG | Multi-trait | Turley P et al., Nat Genet, 2018 | 2018 |
| MV-PLINK (MQFAM) | Multi-trait | Ferreira MA et al., Bioinformatics, 2009 | 2009 |
| MultiPhen | Multi-trait | O'Reilly PF et al., PLoS One, 2012 | 2012 |
| PCHAT | Multi-trait | Klei L et al., Genet Epidemiol, 2008 | 2008 |
| Porter | Multi-trait | Porter HF et al., Sci Rep, 2017 | 2017 |
| Salinas | Multi-trait | Salinas YD et al., Am J Epidemiol, 2018 | 2018 |
| Stephens | Multi-trait | Stephens M, PLoS One, 2013 | 2013 |
| TATES | Multi-trait | van der Sluis S et al., PLoS Genet, 2013 | 2013 |
| Yang | Multi-trait | Yang Q et al., J Probab Stat, 2012 | 2012 |
| aMAT | Multi-trait | Wu C, Genetics, 2020 | 2020 |
| condFDR | Multi-trait | Andreassen OA et al., PLoS Genet, 2013 | 2013 |
| fastASSET | Multi-trait | Qi G et al., Nat Commun, 2024 | 2024 |
| metaCCA | Multi-trait | Cichonska A et al., Bioinformatics, 2016 | 2016 |
| metaUSAT/metaMANOVA | Multi-trait | Ray D et al., Genet Epidemiol, 2018 | 2018 |
| mvGWAMA | Multi-trait | Jansen IE et al., Nat Genet, 2019 | 2019 |
| Meta-SAIGE | Rare-variant | Park E et al., Nat Genet, 2025 | 2025 |
| MetaSKAT | Rare-variant | Lee S et al., Am J Hum Genet, 2013 | 2013 |
| MetaSTAAR | Rare-variant | Li X et al., Nat Genet, 2023 | 2023 |
| RareMETAL | Rare-variant | Feng S et al., Bioinformatics, 2014 | 2014 |
| SMMAT | Rare-variant | Chen H et al., Am J Hum Genet, 2019 | 2019 |
Gene-based
REMETA
DESCRIPTION
REMETA is a computationally efficient C++ toolkit for meta-analysis of gene-based association tests using single-variant summary statistics from REGENIE-style pipelines, including burden and variance-component tests, with sparse per-study LD references rescaled per phenotype.
KEYWORDS
gene-based test, meta-analysis, summary statistics, REGENIE, burden, SKAT-O
TITLE
Computationally efficient meta-analysis of gene-based tests using summary statistics in large-scale genetic studies.
Main citation
Joseph TA, Mbatchou J, Ghosh A, Marcketta A, ...&, Marchini J. (2025) Computationally efficient meta-analysis of gene-based tests using summary statistics in large-scale genetic studies. Nat Genet, 57 (12) 3193-3200. doi:10.1038/s41588-025-02390-0. PMID 41225158
ABSTRACT
Meta-analysis of gene-based tests using single-variant summary statistics is a powerful strategy for genetic association studies. However, current approaches require sharing the covariance matrix between variants for each study and trait of interest. For large-scale studies with many phenotypes, these matrices can be cumbersome to calculate, store and share. Here, to address this challenge, we present REMETA, an efficient tool for meta-analysis of gene-based tests. REMETA uses a single sparse covariance reference file per study that is rescaled for each phenotype using single-variant summary statistics. We develop new methods for binary traits with case-control imbalance, and to estimate allele frequencies, genotype counts and effect sizes of burden tests. We demonstrate the performance and advantages of our approach through meta-analysis of five traits in 469,376 samples in UK Biobank. The open-source REMETA software will facilitate meta-analysis across large-scale exome sequencing studies from diverse studies that cannot easily be combined.
DOI
10.1038/s41588-025-02390-0
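REMETA's key idea is that a single sparse covariance (LD) reference per study can be rescaled for each phenotype using single-variant summary statistics. The sketch below illustrates that idea with a generic summary-statistic burden test; it is not REMETA's implementation or API. It assumes the per-variant score variance can be approximated as V_ii ≈ 1/SE_i², rebuilds the phenotype-specific covariance as V = S R S from a shared correlation matrix R, and combines scores with hypothetical caller-supplied weights w.

```python
import math
from statistics import NormalDist


def burden_test(z, se, ld, weights):
    """Summary-statistic burden test for one gene and one phenotype (sketch).

    z       : per-variant z-scores for the phenotype
    se      : per-variant standard errors for the phenotype
    ld      : shared variant-by-variant correlation matrix R (list of lists)
    weights : hypothetical per-variant burden weights w
    """
    k = len(z)
    # Assumption: score variance V_ii ~= 1 / SE_i^2, so sqrt(V_ii) = 1 / SE_i
    # and the score statistic is U_i = z_i * sqrt(V_ii).
    scale = [1.0 / s for s in se]
    score = [z[i] * scale[i] for i in range(k)]
    # Rescale the shared correlation matrix for this phenotype: V = S R S,
    # with S = diag(sqrt(V_ii)); then Var(w^T U) = w^T V w.
    numerator = sum(weights[i] * score[i] for i in range(k))
    variance = sum(
        weights[i] * scale[i] * ld[i][j] * scale[j] * weights[j]
        for i in range(k)
        for j in range(k)
    )
    z_burden = numerator / math.sqrt(variance)
    p_value = 2.0 * NormalDist().cdf(-abs(z_burden))
    return z_burden, p_value
```

For example, `burden_test([2.0, 1.5], [0.10, 0.20], [[1.0, 0.3], [0.3, 1.0]], [1.0, 1.0])` combines two correlated variants. With a single variant and unit weight, the burden Z-score reduces to that variant's own Z-score, which is a useful sanity check.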
Meta-analysis
GWAMA
FULL NAME
Genome-Wide Association Meta-Analysis
DESCRIPTION
Software tool for meta-analysis of genome-wide association data.
TITLE
GWAMA: software for genome-wide association meta-analysis.
Main citation
Mägi R, Morris AP. (2010) GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics, 11 () 288. doi:10.1186/1471-2105-11-288. PMID 20509871
ABSTRACT
BACKGROUND: Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improving power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. RESULTS: We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. CONCLUSIONS: The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files are freely available online at http://www.well.ox.ac.uk/GWAMA.
DOI
10.1186/1471-2105-11-288
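Among the meta-analysis summary statistics GWAMA reports are fixed-effects estimates and measures of between-study heterogeneity such as Cochran's Q and I². The sketch below shows the standard inverse-variance fixed-effects estimate with those two heterogeneity measures for a single variant. It assumes per-study effects are already aligned to the same effect allele and is not GWAMA's own code.

```python
import math
from statistics import NormalDist


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis for one variant.

    betas, ses: per-study effect estimates and standard errors, assumed
    to be aligned to the same effect allele.
    """
    weights = [1.0 / s ** 2 for s in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation attributed to heterogeneity, truncated at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p_value, "Q": q, "I2": i2}
```

For two studies with effects 0.0 and 0.4 and equal standard errors of 0.1, the pooled effect is 0.2, Q is 8 on 1 degree of freedom, and I² is 0.875.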
MANTRA
FULL NAME
Meta-ANalysis of Transethnic Association studies
KEYWORDS
cross-population
TITLE
Transethnic meta-analysis of genomewide association studies.
Main citation
Morris AP. (2011) Transethnic meta-analysis of genomewide association studies. Genet Epidemiol, 35 (8) 809-22. doi:10.1002/gepi.20630. PMID 22125221
ABSTRACT
The detection of loci contributing effects to complex human traits, and their subsequent fine-mapping for the location of causal variants, remains a considerable challenge for the genetics research community. Meta-analyses of genomewide association studies, primarily ascertained from European-descent populations, have made considerable advances in our understanding of complex trait genetics, although much of their heritability is still unexplained. With the increasing availability of genomewide association data from diverse populations, transethnic meta-analysis may offer an exciting opportunity to increase the power to detect novel complex trait loci and to improve the resolution of fine-mapping of causal variants by leveraging differences in local linkage disequilibrium structure between ethnic groups. However, we might also expect there to be substantial genetic heterogeneity between diverse populations, both in terms of the spectrum of causal variants and their allelic effects, which cannot easily be accommodated through traditional approaches to meta-analysis. In order to address this challenge, I propose novel transethnic meta-analysis methodology that takes account of the expected similarity in allelic effects between the most closely related populations, while allowing for heterogeneity between more diverse ethnic groups. This approach yields substantial improvements in performance, compared to fixed-effects meta-analysis, both in terms of power to detect association, and localization of the causal variant, over a range of models of heterogeneity between ethnic groups. Furthermore, when the similarity in allelic effects between populations is well captured by their relatedness, this approach has increased power and mapping resolution over random-effects meta-analysis.
DOI
10.1002/gepi.20630
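The MANTRA abstract benchmarks against fixed- and random-effects meta-analysis. For reference, the sketch below implements the DerSimonian-Laird random-effects estimator, a common choice for such a random-effects baseline. It is explicitly not MANTRA's method, which models similarity in allelic effects between closely related populations, and it assumes aligned per-study effects and standard errors.

```python
import math


def dersimonian_laird(betas, ses):
    """DerSimonian-Laird random-effects meta-analysis for one variant.

    Returns (pooled effect, its standard error, between-study variance tau^2).
    """
    w = [1.0 / s ** 2 for s in ses]
    w_sum = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / w_sum
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    # Method-of-moments estimate of between-study variance tau^2 (floored at 0).
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return beta_re, se_re, tau2
```

When the studies agree (tau² = 0), the result coincides with the fixed-effects estimate; heterogeneity inflates the standard error instead of shifting the weights toward a single study.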
METAL
DESCRIPTION
METAL is a tool for meta-analysis of genome-wide association scans. METAL can combine either (a) test statistics and standard errors or (b) p-values across studies (taking sample size and direction of effect into account). Meta-analysis with METAL is a convenient alternative to a direct analysis of merged data from multiple studies. It is especially appropriate when data from the individual studies cannot be analyzed together because of differences in ethnicity, phenotype distribution or gender, or because of constraints imposed on sharing individual-level data. Meta-analysis results in little or no loss of efficiency compared with analysis of a combined dataset including data from all individual studies.
TITLE
METAL: fast and efficient meta-analysis of genomewide association scans.
Main citation
Willer CJ, Li Y, Abecasis GR. (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics, 26 (17) 2190-1. doi:10.1093/bioinformatics/btq340. PMID 20616382
ABSTRACT
SUMMARY: METAL provides a computationally efficient tool for meta-analysis of genome-wide association scans, which is a commonly used approach for improving power complex traits gene mapping studies. METAL provides a rich scripting interface and implements efficient memory management to allow analyses of very large data sets and to support a variety of input file formats. AVAILABILITY AND IMPLEMENTATION: METAL, including source code, documentation, examples, and executables, is available at http://www.sph.umich.edu/csg/abecasis/metal/.
DOI
10.1093/bioinformatics/btq340
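METAL's p-value-based option combines signed Z-scores weighted by sample size (its standard-error-based option is the inverse-variance scheme). The sketch below follows the sample-size-weighted scheme summarised in the METAL paper, Z = Σ w_i Z_i / sqrt(Σ w_i²) with w_i = sqrt(N_i), where each Z_i is recovered from a two-sided p-value and the sign of the effect. It is an illustration, not METAL's code; inputs are assumed aligned to the same allele, with p-values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def samplesize_meta(pvalues, directions, sample_sizes):
    """Sample-size-weighted Z-score meta-analysis for one variant.

    pvalues      : two-sided per-study p-values (must be > 0)
    directions   : +1 or -1, sign of each study's effect for the same allele
    sample_sizes : per-study sample sizes N_i
    """
    nd = NormalDist()
    numerator = 0.0
    weight_sq = 0.0
    for p, sign, n in zip(pvalues, directions, sample_sizes):
        z_i = sign * nd.inv_cdf(1.0 - p / 2.0)  # signed Z from a two-sided p
        w_i = math.sqrt(n)
        numerator += w_i * z_i
        weight_sq += w_i ** 2
    z = numerator / math.sqrt(weight_sq)
    p_meta = 2.0 * nd.cdf(-abs(z))
    return z, p_meta
```

Two equally sized studies, each with p = 0.05 in the same direction, combine to Z ≈ 1.96 × √2 ≈ 2.77; with opposite directions they cancel to Z = 0.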
MR-MEGA
FULL NAME
Meta-Regression of Multi-AncEstry Genetic Association
DESCRIPTION
MR-MEGA (Meta-Regression of Multi-AncEstry Genetic Association) is a tool to detect and fine-map complex trait association signals via multi-ancestry meta-regression. This approach uses genome-wide metrics of diversity between populations to derive axes of genetic variation via multi-dimensional scaling [Purcell 2007]. Allelic effects of a variant across GWAS, weighted by their corresponding standard errors, can then be modelled in a linear regression framework, including the axes of genetic variation as covariates. The flexibility of this model enables partitioning of the heterogeneity into components due to ancestry and residual variation, which would be expected to improve fine-mapping resolution.
KEYWORDS
cross-population, Meta-Regression
TITLE
Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution.
Main citation
Mägi R, Horikoshi M, Sofer T, Mahajan A, ...&, Morris AP. (2017) Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum Mol Genet, 26 (18) 3639-3650. doi:10.1093/hmg/ddx280. PMID 28911207
ABSTRACT
Trans-ethnic meta-analysis of genome-wide association studies (GWAS) across diverse populations can increase power to detect complex trait loci when the underlying causal variants are shared between ancestry groups. However, heterogeneity in allelic effects between GWAS at these loci can occur that is correlated with ancestry. Here, a novel approach is presented to detect SNP association and quantify the extent of heterogeneity in allelic effects that is correlated with ancestry. We employ trans-ethnic meta-regression to model allelic effects as a function of axes of genetic variation, derived from a matrix of mean pairwise allele frequency differences between GWAS, and implemented in the MR-MEGA software. Through detailed simulations, we demonstrate increased power to detect association for MR-MEGA over fixed- and random-effects meta-analysis across a range of scenarios of heterogeneity in allelic effects between ethnic groups. We also demonstrate improved fine-mapping resolution, in loci containing a single causal variant, compared to these meta-analysis approaches and PAINTOR, and equivalent performance to MANTRA at reduced computational cost. Application of MR-MEGA to trans-ethnic GWAS of kidney function in 71,461 individuals indicates stronger signals of association than fixed-effects meta-analysis when heterogeneity in allelic effects is correlated with ancestry. Application of MR-MEGA to fine-mapping four type 2 diabetes susceptibility loci in 22,086 cases and 42,539 controls highlights: (i) strong evidence for heterogeneity in allelic effects that is correlated with ancestry only at the index SNP for the association signal at the CDKAL1 locus; and (ii) 99% credible sets with six or fewer variants for five distinct association signals.
DOI
10.1093/hmg/ddx280
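The MR-MEGA description outlines two steps: derive axes of genetic variation from between-GWAS diversity by multidimensional scaling, then regress allelic effects on those axes with standard-error-based weights. The sketch below illustrates both steps for a single axis under simplifying assumptions: classical MDS on a supplied study-by-study distance matrix (top axis via power iteration), and weighted least squares with assumed weights 1/SE². It is not the MR-MEGA implementation, which can use more than one axis and partitions heterogeneity into ancestry-correlated and residual components.

```python
import math


def mds_first_axis(dist, iterations=500):
    """First classical-MDS axis from a symmetric study-by-study distance matrix."""
    k = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / k for row in sq]
    grand = sum(row_mean) / k
    # Double-centred Gram matrix B = -1/2 * J D^2 J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(k)]
         for i in range(k)]
    v = [float(i + 1) for i in range(k)]  # arbitrary non-constant start vector
    lam = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = norm  # converges to the leading eigenvalue of B
    return [x * math.sqrt(lam) for x in v]


def weighted_meta_regression(betas, ses, axis):
    """Regress per-study allelic effects on one ancestry axis, weights 1/SE^2."""
    w = [1.0 / s ** 2 for s in ses]
    w_sum = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, axis)) / w_sum
    y_bar = sum(wi * y for wi, y in zip(w, betas)) / w_sum
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, axis, betas))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, axis))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

For three hypothetical studies whose pairwise distances are those of points at 0, 1 and 3 on a line, `mds_first_axis` recovers the centred positions (up to sign), and a slope far from zero in `weighted_meta_regression` indicates effect heterogeneity that tracks ancestry.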
Multi-trait
ASSET
FULL NAME
Association analysis based on SubSETs
TITLE
A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits.
Main citation
Bhattacharjee S, Rajaraman P, Jacobs KB, Wheeler WA, ...&, Chatterjee N. (2012) A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits. Am J Hum Genet, 90 (5) 821-35. doi:10.1016/j.ajhg.2012.03.015. PMID 22560090
ABSTRACT
Pooling genome-wide association studies (GWASs) increases power but also poses methodological challenges because studies are often heterogeneous. For example, combining GWASs of related but distinct traits can provide promising directions for the discovery of loci with small but common pleiotropic effects. Classical approaches for meta-analysis or pooled analysis, however, might not be suitable for such analysis because individual variants are likely to be associated with only a subset of the traits or might demonstrate effects in different directions. We propose a method that exhaustively explores subsets of studies for the presence of true association signals that are in either the same direction or possibly opposite directions. An efficient approximation is used for rapid evaluation of p values. We present two illustrative applications, one for a meta-analysis of separate case-control studies of six distinct cancers and another for pooled analysis of a case-control study of glioma, a class of brain tumors that contains heterogeneous subtypes. Both the applications and additional simulation studies demonstrate that the proposed methods offer improved power and more interpretable results when compared to traditional methods for the analysis of heterogeneous traits. The proposed framework has applications beyond genetic association studies.
DOI
10.1016/j.ajhg.2012.03.015
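ASSET's core step, exhaustively exploring subsets of studies for the strongest association signal, can be illustrated as below. This is a simplified sketch, not the ASSET package: it assumes independent studies (no sample overlap), combines Z-scores within each subset with sqrt(N) weights, and returns the subset with the largest absolute combined Z-score. It does not compute ASSET's multiplicity-adjusted p-value, which relies on the approximation described in the paper.

```python
import math
from itertools import combinations


def best_subset(z_scores, sample_sizes):
    """Search all non-empty study subsets for the largest |combined Z|.

    Returns (combined Z of the best subset, tuple of study indices).
    """
    k = len(z_scores)
    best = (0.0, ())
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            # Fixed-effect combination within the subset with sqrt(N) weights,
            # assuming independent studies: sum(w z) / sqrt(sum(w^2)).
            num = sum(math.sqrt(sample_sizes[i]) * z_scores[i] for i in subset)
            den = math.sqrt(sum(sample_sizes[i] for i in subset))
            z_sub = num / den
            if abs(z_sub) > abs(best[0]):
                best = (z_sub, subset)
    return best
```

Because the search covers 2^k − 1 subsets, the maximum is inflated relative to a single test, which is why ASSET's p-value approximation is needed in practice.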
FactorGO
FULL NAME
Factor analysis model in Genetic assOciation
DESCRIPTION
FactorGo is a scalable variational factor analysis model that learns pleiotropic factors using GWAS summary statistics.
KEYWORDS
pleiotropy, factor analysis
TITLE
A scalable approach to characterize pleiotropy across thousands of human diseases and complex traits using GWAS summary statistics.
Main citation
Zhang Z, Jung J, Kim A, Suboc N, ...&, Mancuso N. (2023) A scalable approach to characterize pleiotropy across thousands of human diseases and complex traits using GWAS summary statistics. Am J Hum Genet, 110 (11) 1863-1874. doi:10.1016/j.ajhg.2023.09.015. PMID 37879338
ABSTRACT
Genome-wide association studies (GWASs) across thousands of traits have revealed the pervasive pleiotropy of trait-associated genetic variants. While methods have been proposed to characterize pleiotropic components across groups of phenotypes, scaling these approaches to ultra-large-scale biobanks has been challenging. Here, we propose FactorGo, a scalable variational factor analysis model to identify and characterize pleiotropic components using biobank GWAS summary data. In extensive simulations, we observe that FactorGo outperforms the state-of-the-art (model-free) approach tSVD in capturing latent pleiotropic factors across phenotypes while maintaining a similar computational cost. We apply FactorGo to estimate 100 latent pleiotropic factors from GWAS summary data of 2,483 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N = 420,531). Next, we find that factors from FactorGo are more enriched with relevant tissue-specific annotations than those identified by tSVD (p = 2.58E-10) and validate our approach by recapitulating brain-specific enrichment for BMI and the height-related connection between reproductive system and muscular-skeletal growth. Finally, our analyses suggest shared etiologies between rheumatoid arthritis and periodontal condition in addition to alkaline phosphatase as a candidate prognostic biomarker for prostate cancer. Overall, FactorGo improves our biological understanding of shared etiologies across thousands of GWASs.
DOI
10.1016/j.ajhg.2023.09.015
GLEANR
FULL NAME
GWAS latent embeddings accounting for noise and regularization
DESCRIPTION
GLEANR is a GWAS matrix factorization tool to estimate sparse latent pleiotropic genetic factors. Factors map traits to a distribution of SNP effects that may capture biological pathways or mechanisms shared by these traits.
TITLE
Sparse matrix factorization robust to sample sharing across GWASs reveals interpretable genetic components.
Main citation
Omdahl AR, Weinstock JS, Keener R, Chhetri SB, ...&, Battle A. (2025) Sparse matrix factorization robust to sample sharing across GWASs reveals interpretable genetic components. Am J Hum Genet, 112 (9) 2178-2197. doi:10.1016/j.ajhg.2025.07.003. PMID 40730164
ABSTRACT
Complex trait-associated genetic variation is highly pleiotropic. This extensive pleiotropy implies that multi-phenotype analyses are informative for characterizing genetic associations, as they facilitate the discovery of trait-shared and trait-specific variants and pathways ("genetic factors"). Previous efforts have estimated genetic factors using matrix factorization (MF) applied to numerous genome-wide association studies (GWASs). However, existing methods are susceptible to spurious factors arising from residual confounding due to sample sharing in biobank GWASs. Furthermore, MF approaches have historically estimated dense factors, loaded on most traits and variants, that are challenging to map onto interpretable biological pathways. To address these shortcomings, we introduce "GWAS latent embeddings accounting for noise and regularization" (GLEANR), an MF method for detection of sparse genetic factors from summary statistics. GLEANR accounts for sample sharing between studies and uses regularization to estimate a data-driven number of interpretable factors. GLEANR is robust to confounding induced by shared samples and improves the replication of genetic factors derived from distinct biobanks. We used GLEANR to evaluate 137 diverse GWASs from the UK Biobank, identifying 58 factors that decompose the genetic architecture of input traits and have distinct signatures of negative selection and degrees of polygenicity. These sparse factors can be interpreted with respect to disease, cell type, and pathway enrichment. We highlight three such factors that captured platelet-measure phenotypes and were enriched for disease-relevant markers corresponding to distinct stages of platelet differentiation. Overall, GLEANR is a powerful tool for discovering both trait-specific and trait-shared pathways underlying complex traits from GWAS summary statistics.
DOI
10.1016/j.ajhg.2025.07.003
Galesloot
TITLE
A comparison of multivariate genome-wide association methods.
Main citation
Galesloot TE, van Steen K, Kiemeney LA, Janss LL, ...&, Vermeulen SH. (2014) A comparison of multivariate genome-wide association methods. PLoS One, 9 (4) e95923. doi:10.1371/journal.pone.0095923. PMID 24763738
ABSTRACT
Joint association analysis of multiple traits in a genome-wide association study (GWAS), i.e. a multivariate GWAS, offers several advantages over analyzing each trait in a separate GWAS. In this study we directly compared a number of multivariate GWAS methods using simulated data. We focused on six methods that are implemented in the software packages PLINK, SNPTEST, MultiPhen, BIMBAM, PCHAT and TATES, and also compared them to standard univariate GWAS, analysis of the first principal component of the traits, and meta-analysis of univariate results. We simulated data (N = 1000) for three quantitative traits and one bi-allelic quantitative trait locus (QTL), and varied the number of traits associated with the QTL (explained variance 0.1%), minor allele frequency of the QTL, residual correlation between the traits, and the sign of the correlation induced by the QTL relative to the residual correlation. We compared the power of the methods using empirically fixed significance thresholds (α = 0.05). Our results showed that the multivariate methods implemented in PLINK, SNPTEST, MultiPhen and BIMBAM performed best for the majority of the tested scenarios, with a notable increase in power for scenarios with an opposite sign of genetic and residual correlation. All multivariate analyses resulted in a higher power than univariate analyses, even when only one of the traits was associated with the QTL. Hence, use of multivariate GWAS methods can be recommended, even when genetic correlations between traits are weak.
DOI
10.1371/journal.pone.0095923
Genomic-SEM
FULL NAME
genomic structural equation modelling
DESCRIPTION
An R package that allows the user to fit structural equation models to summary statistics from genome-wide association studies (GWAS).
KEYWORDS
SEM
TITLE
Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits.
Main citation
Grotzinger AD, Rhemtulla M, de Vlaming R, Ritchie SJ, ...&, Tucker-Drob EM. (2019) Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat Hum Behav, 3 (5) 513-525. doi:10.1038/s41562-019-0566-x. PMID 30962613
ABSTRACT
Genetic correlations estimated from genome-wide association studies (GWASs) reveal pervasive pleiotropy across a wide variety of phenotypes. We introduce genomic structural equation modelling (genomic SEM): a multivariate method for analysing the joint genetic architecture of complex traits. Genomic SEM synthesizes genetic correlations and single-nucleotide polymorphism heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to model multivariate genetic associations among phenotypes, identify variants with effects on general dimensions of cross-trait liability, calculate more predictive polygenic scores and identify loci that cause divergence between traits. We demonstrate several applications of genomic SEM, including a joint analysis of summary statistics from five psychiatric traits. We identify 27 independent single-nucleotide polymorphisms not previously identified in the contributing univariate GWASs. Polygenic scores from genomic SEM consistently outperform those from univariate GWASs. Genomic SEM is flexible and open ended, and allows for continuous innovation in multivariate genetic analysis.
DOI
10.1038/s41562-019-0566-x
HIPO
FULL NAME
heritability informed power optimization
DESCRIPTION
hipo is an R package that performs heritability informed power optimization (HIPO) for multi-trait association analysis of summary-level data.
TITLE
Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits.
Main citation
Qi G, Chatterjee N. (2018) Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits. PLoS Genet, 14 (10) e1007549. doi:10.1371/journal.pgen.1007549. PMID 30289880
ABSTRACT
Genome-wide association studies have shown that pleiotropy is a common phenomenon that can potentially be exploited for enhanced detection of susceptibility loci. We propose heritability informed power optimization (HIPO) for conducting powerful pleiotropic analysis using summary-level association statistics. We find optimal linear combinations of association coefficients across traits that are expected to maximize non-centrality parameter for the underlying test statistics, taking into account estimates of heritability, sample size variations and overlaps across the traits. Simulation studies show that the proposed method has correct type I error, robust to population stratification and leads to desired genome-wide enrichment of association signals. Application of the proposed method to publicly available data for three groups of genetically related traits, lipids (N = 188,577), psychiatric diseases (Ncase = 33,332, Ncontrol = 27,888) and social science traits (N ranging between 161,460 to 298,420 across individual traits) increased the number of genome-wide significant loci by 12%, 200% and 50%, respectively, compared to those found by analysis of individual traits. Evidence of replication is present for many of these loci in subsequent larger studies for individual traits. HIPO can potentially be extended to high-dimensional phenotypes as a way of dimension reduction to maximize power for subsequent genetic association testing.
DOI
10.1371/journal.pgen.1007549
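The HIPO abstract describes finding linear combinations of per-trait association statistics that maximise the non-centrality parameter. The derivation below sketches that optimisation in generic form; it is not HIPO's full procedure, in which (as we read the abstract) the relevant matrices are built from heritability, sample-size and overlap estimates. Assume the vector of per-trait Z-statistics for a SNP is z ~ N(μ, Σ), with Σ positive definite (capturing, for example, sample overlap), and consider the combined statistic T = wᵀz. If μ varies across SNPs with second-moment matrix H = E[μμᵀ], the expected non-centrality becomes a ratio of quadratic forms.

```latex
\begin{align*}
\mathrm{NCP}(\mathbf{w}) &= \frac{(\mathbf{w}^\top \boldsymbol{\mu})^2}{\mathbf{w}^\top \Sigma\, \mathbf{w}}
  = \frac{\big((\Sigma^{1/2}\mathbf{w})^\top \Sigma^{-1/2}\boldsymbol{\mu}\big)^2}{\lVert \Sigma^{1/2}\mathbf{w} \rVert^2}
  \le \boldsymbol{\mu}^\top \Sigma^{-1} \boldsymbol{\mu},
  \qquad \text{with equality iff } \mathbf{w} \propto \Sigma^{-1}\boldsymbol{\mu}; \\
\frac{\mathbb{E}\big[(\mathbf{w}^\top \boldsymbol{\mu})^2\big]}{\mathbf{w}^\top \Sigma\, \mathbf{w}}
  &= \frac{\mathbf{w}^\top H\, \mathbf{w}}{\mathbf{w}^\top \Sigma\, \mathbf{w}},
  \quad \text{maximised by the leading solution of } H\mathbf{w} = \lambda\, \Sigma\, \mathbf{w}.
\end{align*}
```

The first line is the Cauchy-Schwarz inequality applied to Σ^{1/2}w and Σ^{-1/2}μ; the second is a generalised eigenvalue problem, so further components can be taken from the subsequent eigenvectors.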
JASS
FULL NAME
Joint Analysis of Summary Statistics
DESCRIPTION
JASS is a Python package that computes joint statistics over sets of selected GWAS results and supports interactive exploration of the results through a web interface. Joint statistics over a set of selected studies, and static plots of the results, can be generated with the command line interface. The same functionality is available through a web application embedded in the Python package, which also enables exploration of the results through a dynamic JavaScript interface. The JASS analysis module handles data processing, from data import through computation of the joint statistics to generation of the static plots. Pre-processing of raw GWAS data can be performed with a companion script provided alongside the JASS package.
TITLE
JASS: command line and web interface for the joint analysis of GWAS results.
Main citation
Julienne H, Lechat P, Guillemot V, Lasry C, ...&, Aschard H. (2020) JASS: command line and web interface for the joint analysis of GWAS results. NAR Genom Bioinform, 2 (1) lqaa003. doi:10.1093/nargab/lqaa003. PMID 32002517
ABSTRACT
Genome-wide association study (GWAS) has been the driving force for identifying association between genetic variants and human phenotypes. Thousands of GWAS summary statistics covering a broad range of human traits and diseases are now publicly available. These GWAS have proven their utility for a range of secondary analyses, including in particular the joint analysis of multiple phenotypes to identify new associated genetic variants. However, although several methods have been proposed, there are very few large-scale applications published so far because of challenges in implementing these methods on real data. Here, we present JASS (Joint Analysis of Summary Statistics), a polyvalent Python package that addresses this need. Our package incorporates recently developed joint tests such as the omnibus approach and various weighted sum of Z-score tests while solving all practical and computational barriers for large-scale multivariate analysis of GWAS summary statistics. This includes data cleaning and harmonization tools, an efficient algorithm for fast derivation of joint statistics, an optimized data management process and a web interface for exploration purposes. Both benchmark analyses and real data applications demonstrated the robustness and strong potential of JASS for the detection of new associated genetic variants. Our package is freely available at https://gitlab.pasteur.fr/statistical-genetics/jass.
DOI
10.1093/nargab/lqaa003
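Among the joint tests JASS incorporates are weighted sums of Z-scores. The sketch below shows a generic weighted sum-of-Z test in which the null correlation matrix R between trait Z-scores (for example, induced by sample overlap) sets the variance of the combination: Z_joint = wᵀz / sqrt(wᵀRw). It is an illustration with caller-supplied weights (unit weights by default, an assumption), not JASS's code or API.

```python
import math
from statistics import NormalDist


def weighted_sum_z(z, corr, weights=None):
    """Weighted sum-of-Z joint test across traits for one variant (sketch).

    z       : per-trait Z-scores for the variant
    corr    : null correlation matrix of the Z-scores, e.g. induced by sample
              overlap (list of lists)
    weights : per-trait weights; unit weights are used if omitted
    """
    k = len(z)
    w = weights if weights is not None else [1.0] * k
    numerator = sum(w[i] * z[i] for i in range(k))
    # Var(w^T z) = w^T R w under the null, so correlated traits are not
    # double-counted.
    variance = sum(w[i] * corr[i][j] * w[j] for i in range(k) for j in range(k))
    z_joint = numerator / math.sqrt(variance)
    return z_joint, 2.0 * NormalDist().cdf(-abs(z_joint))
```

With perfectly correlated traits, two Z-scores of 2 combine to 2 rather than 2√2, which is the reason for including R.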
LCP-GWAS
FULL NAME
Linear Combination Phenotype GWAS
KEYWORDS
multivariate GWAS follow-up analyses
TITLE
An expanded analysis framework for multivariate GWAS connects inflammatory biomarkers to functional variants and disease.
Main citation
Ruotsalainen SE, Partanen JJ, Cichonska A, Lin J, ...&, Koskela J. (2021) An expanded analysis framework for multivariate GWAS connects inflammatory biomarkers to functional variants and disease. Eur J Hum Genet, 29 (2) 309-324. doi:10.1038/s41431-020-00730-8. PMID 33110245
ABSTRACT
Multivariate methods are known to increase the statistical power to detect associations in the case of shared genetic basis between phenotypes. They have, however, lacked essential analytic tools to follow-up and understand the biology underlying these associations. We developed a novel computational workflow for multivariate GWAS follow-up analyses, including fine-mapping and identification of the subset of traits driving associations (driver traits). Many follow-up tools require univariate regression coefficients which are lacking from multivariate results. Our method overcomes this problem by using Canonical Correlation Analysis to turn each multivariate association into its optimal univariate Linear Combination Phenotype (LCP). This enables an LCP-GWAS, which in turn generates the statistics required for follow-up analyses. We implemented our method on 12 highly correlated inflammatory biomarkers in a Finnish population-based study. Altogether, we identified 11 associations, four of which (F5, ABO, C1orf140 and PDGFRB) were not detected by biomarker-specific analyses. Fine-mapping identified 19 signals within the 11 loci and driver trait analysis determined the traits contributing to the associations. A phenome-wide association study on the 19 representative variants from the signals in 176,899 individuals from the FinnGen study revealed 53 disease associations (p < 1 × 10-4). Several reported pQTLs in the 11 loci provided orthogonal evidence for the biologically relevant functions of the representative variants. Our novel multivariate analysis workflow provides a powerful addition to standard univariate GWAS analyses by enabling multivariate GWAS follow-up and thus promoting the advancement of powerful multivariate methods in genomics.
DOI
10.1038/s41431-020-00730-8
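For a single variant, canonical correlation analysis between the genotype and the traits reduces to a regression problem: the trait weights a that maximise the correlation between the genotype and the combination Y·a are proportional to S_YY⁻¹ s_Yx, where S_YY is the trait covariance matrix and s_Yx the vector of trait-genotype covariances. The sketch below computes those weights and the resulting Linear Combination Phenotype (LCP) from individual-level data. It illustrates the CCA step only, rescales the LCP to unit variance (an arbitrary choice), and is not the authors' workflow; the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def covariance(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (n - 1)


def lcp_weights(traits, genotype):
    """CCA trait weights for one genotype vector against several trait columns."""
    k = len(traits)
    s_yy = [[covariance(traits[i], traits[j]) for j in range(k)] for i in range(k)]
    s_yx = [covariance(traits[i], genotype) for i in range(k)]
    a = solve(s_yy, s_yx)  # a proportional to S_YY^{-1} s_Yx
    # Rescale so the LCP has unit sample variance (an arbitrary normalisation).
    var_lcp = sum(a[i] * s_yy[i][j] * a[j] for i in range(k) for j in range(k))
    return [ai / math.sqrt(var_lcp) for ai in a]


def lcp(traits, weights):
    """Linear Combination Phenotype: per-individual weighted sum of traits."""
    n = len(traits[0])
    return [sum(w * col[t] for w, col in zip(weights, traits)) for t in range(n)]


# Hypothetical toy data: 5 individuals, 2 traits, 1 variant.
traits_demo = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0]]
genotype_demo = [0.0, 1.0, 0.0, 1.0, 2.0]
weights_demo = lcp_weights(traits_demo, genotype_demo)
lcp_demo = lcp(traits_demo, weights_demo)
```

The LCP's correlation with the genotype is at least as large as that of any single trait, and a univariate GWAS on the LCP then yields the regression coefficients that downstream follow-up tools expect.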
MANOVA
FULL NAME
multivariate analysis of variance
MOSTest
FULL NAME
Multivariate Omnibus Statistical Test
DESCRIPTION
MOSTest is a tool for joint genetic analysis of multiple traits that uses multivariate analysis to boost the power to discover associated loci.
TITLE
Understanding the genetic determinants of the brain with MOSTest.
Main citation
van der Meer D, Frei O, Kaufmann T, Shadrin AA, ...&, Dale AM. (2020) Understanding the genetic determinants of the brain with MOSTest. Nat Commun, 11 (1) 3512. doi:10.1038/s41467-020-17368-1. PMID 32665545
ABSTRACT
Regional brain morphology has a complex genetic architecture, consisting of many common polymorphisms with small individual effects. This has proven challenging for genome-wide association studies (GWAS). Due to the distributed nature of genetic signal across brain regions, multivariate analysis of regional measures may enhance discovery of genetic variants. Current multivariate approaches to GWAS are ill-suited for complex, large-scale data of this kind. Here, we introduce the Multivariate Omnibus Statistical Test (MOSTest), with an efficient computational design enabling rapid and reliable inference, and apply it to 171 regional brain morphology measures from 26,502 UK Biobank participants. At the conventional genome-wide significance threshold of α = 5 × 10-8, MOSTest identifies 347 genomic loci associated with regional brain morphology, more than any previous study, improving upon the discovery of established GWAS approaches more than threefold. Our findings implicate more than 5% of all protein-coding genes and provide evidence for gene sets involved in neuron development and differentiation.
DOI
10.1038/s41467-020-17368-1
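As we understand the paper, the core MOSTest statistic is a Mahalanobis-type quadratic form z' R⁻¹ z over a SNP's per-trait z-scores, where R is the correlation of z-scores estimated from GWAS runs on permuted genotypes. The sketch below estimates R from a set of null z-vectors, computes the statistic and returns an empirical permutation p-value. It deliberately simplifies two steps that we believe the published tool handles differently: it does not regularise R (MOSTest reportedly keeps only the leading singular components), and it counts permuted statistics instead of fitting a parametric null. The function names and the simulated null are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def corr_matrix(null_z):
    """Uncentred correlation of null z-scores (rows = SNPs, columns = traits)."""
    k = len(null_z[0])
    s = [[sum(z[i] * z[j] for z in null_z) for j in range(k)] for i in range(k)]
    return [[s[i][j] / (s[i][i] * s[j][j]) ** 0.5 for j in range(k)] for i in range(k)]


def most_stat(z, r):
    """Quadratic form z' R^-1 z."""
    x = solve(r, z)
    return sum(a * b for a, b in zip(z, x))


def most_test(z, null_z):
    """Statistic for one SNP plus an empirical p-value against the null z-vectors."""
    r = corr_matrix(null_z)
    obs = most_stat(z, r)
    null = [most_stat(zn, r) for zn in null_z]
    # Simplification: empirical p-value instead of a fitted parametric null.
    p = (1 + sum(s >= obs for s in null)) / (1 + len(null))
    return obs, p


# Simulated stand-in for z-scores from permuted-genotype GWAS of 3 traits.
rng = random.Random(1)
null_z = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
stat, p = most_test([4.0, 3.5, 4.2], null_z)
print(stat, p)
```

An empirical p-value cannot fall below 1/(B + 1) for B null SNPs, which is one reason a parametric null fit is needed to reach genome-wide thresholds such as 5 × 10⁻⁸.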
MTAG
PUBMED_LINK
FULL NAME
Multi-Trait Analysis of GWAS
DESCRIPTION
mtag is a Python-based command-line tool for jointly analyzing multiple sets of GWAS summary statistics, as described by Turley et al. (2018). It can also be used to meta-analyze GWAS results.
URL
KEYWORDS
Multi-trait
TITLE
Multi-trait analysis of genome-wide association summary statistics using MTAG.
Main citation
Turley P, Walters RK, Maghzian O, Okbay A, ...&, Benjamin DJ. (2018) Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet, 50 (2) 229-237. doi:10.1038/s41588-017-0009-4. PMID 29292387
ABSTRACT
We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
DOI
10.1038/s41588-017-0009-4
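MTAG's per-SNP estimate is a generalised least squares combination of the single-trait estimates β̂_j. As we read the paper, for trait t it uses the t-th column ω_t of the genetic covariance matrix Ω and the sampling covariance Σ_j of β̂_j: β̂_MTAG,t = (w' V⁻¹ w)⁻¹ w' V⁻¹ β̂_j, with w = ω_t/ω_tt and V = Ω - ω_t ω_t'/ω_tt + Σ_j. The sketch below applies that formula to one SNP with Ω and Σ_j supplied directly; estimating them genome-wide from the summary statistics, which the mtag tool does, is not shown, and `mtag_estimate` is a hypothetical helper rather than the tool's interface.

```python
def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mtag_estimate(beta_hat, sigma, omega, t):
    """Hypothetical helper: GLS estimate for trait t at one SNP and its variance."""
    k = len(beta_hat)
    w = [omega[s][t] / omega[t][t] for s in range(k)]  # omega_t / omega_tt
    v = [[omega[a][b] - omega[a][t] * omega[b][t] / omega[t][t] + sigma[a][b]
          for b in range(k)] for a in range(k)]        # Omega - w_t w_t'/w_tt + Sigma_j
    vinv_w = solve(v, w)                               # V is symmetric, so V^-1 w suffices
    den = sum(x * y for x, y in zip(vinv_w, w))
    est = sum(x * y for x, y in zip(vinv_w, beta_hat)) / den
    return est, 1.0 / den


# One trait: MTAG should reproduce the GWAS estimate and its variance.
est1, var1 = mtag_estimate([0.05], [[4e-4]], [[1e-5]], 0)
# Two traits with identical genetic effects and equal, independent sampling error.
est2, var2 = mtag_estimate([0.1, 0.3], [[1e-4, 0.0], [0.0, 1e-4]],
                           [[1e-5, 1e-5], [1e-5, 1e-5]], 0)
print(est1, var1, est2, var2)
```

In the second call the traits are assumed to share an identical genetic effect, so V reduces to Σ_j and the estimate becomes the inverse-variance-weighted mean of the two GWAS estimates.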
MV-PLINK (MQFAM)
PUBMED_LINK
TITLE
A multivariate test of association.
Main citation
Ferreira MA, Purcell SM. (2009) A multivariate test of association. Bioinformatics, 25 (1) 132-3. doi:10.1093/bioinformatics/btn563. PMID 19019849
ABSTRACT
Although genetic association studies often test multiple, related phenotypes, few formal multivariate tests of association are available. We describe a test of association that can be efficiently applied to large population-based designs. AVAILABILITY: A C++ implementation can be obtained from the authors.
DOI
10.1093/bioinformatics/btn563
MultiPhen
PUBMED_LINK
DESCRIPTION
Performs genetic association tests between SNPs (one at a time) and multiple phenotypes (separately or in a joint model).
URL
TITLE
MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS.
Main citation
O'Reilly PF, Hoggart CJ, Pomyen Y, Calboli FC, ...&, Coin LJ. (2012) MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS One, 7 (5) e34861. doi:10.1371/journal.pone.0034861. PMID 22567092
ABSTRACT
The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one-phenotype-at-a-time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden to single phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. There is a boost in power for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides 37% increased discovery. 
The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.
DOI
10.1371/journal.pone.0034861
PCHAT
PUBMED_LINK
FULL NAME
principal component of heritability association test
TITLE
Pleiotropy and principal components of heritability combine to increase power for association analysis.
Main citation
Klei L, Luca D, Devlin B, Roeder K. (2008) Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol, 32 (1) 9-19. doi:10.1002/gepi.20257. PMID 17922480
ABSTRACT
When many correlated traits are measured the potential exists to discover the coordinated control of these traits via genotyped polymorphisms. A common statistical approach to this problem involves assessing the relationship between each phenotype and each single nucleotide polymorphism (SNP) individually (PHN); and taking a Bonferroni correction for the effective number of independent tests conducted. Alternatively, one can apply a dimension reduction technique, such as estimation of principal components, and test for an association with the principal components of the phenotypes (PCP) rather than the individual phenotypes. Building on the work of Lange and colleagues we develop an alternative method based on the principal component of heritability (PCH). For each SNP the PCH approach reduces the phenotypes to a single trait that has a higher heritability than any other linear combination of the phenotypes. As a result, the association between a SNP and derived trait is often easier to detect than an association with any of the individual phenotypes or the PCP. When applied to unrelated subjects, PCH has a drawback. For each SNP it is necessary to estimate the vector of loadings that maximize the heritability over all phenotypes. We develop a method of iterated sample splitting that uses one portion of the data for training and the remainder for testing. This cross-validation approach maintains the type I error control and yet utilizes the data efficiently, resulting in a powerful test for association.
DOI
10.1002/gepi.20257
Porter
PUBMED_LINK
TITLE
Multivariate simulation framework reveals performance of multi-trait GWAS methods.
Main citation
Porter HF, O'Reilly PF. (2017) Multivariate simulation framework reveals performance of multi-trait GWAS methods. Sci Rep, 7, 38837. doi:10.1038/srep38837. PMID 28287610
ABSTRACT
Burgeoning availability of genome-wide association study (GWAS) results and national biobank data has led to growing interest in performing multi-trait genetic analyses. Numerous multi-trait GWAS methods that exploit either summary statistics or individual-level data have been developed, but their relative performance is unclear. Here we develop a simulation framework to model the complex networks underlying multivariate genetic epidemiology, enabling the vast model space of genetic effects on multiple correlated traits to be explored systematically. We perform a comprehensive comparison of the leading multi-trait GWAS methods, finding: (1) method performance is highly sensitive to the specific combination of genetic effects and phenotypic correlations, (2) most of the current multivariate methods have remarkably similar statistical power, and (3) multivariate methods may offer a substantial increase in the discovery of genetic variants over the standard univariate approach. We believe our findings offer the clearest picture to date of the relative performance of multi-trait GWAS methods and act as a guide for method selection. We provide a web application and open-source software program implementing our simulation framework, for: (i) further benchmarking of multivariate GWAS methods, (ii) power calculations for multivariate genetic studies, and (iii) generating data for testing any multivariate method in genetic epidemiology.
DOI
10.1038/srep38837
Salinas
PUBMED_LINK
TITLE
Statistical Analysis of Multiple Phenotypes in Genetic Epidemiologic Studies: From Cross-Phenotype Associations to Pleiotropy.
Main citation
Salinas YD, Wang Z, DeWan AT. (2018) Statistical Analysis of Multiple Phenotypes in Genetic Epidemiologic Studies: From Cross-Phenotype Associations to Pleiotropy. Am J Epidemiol, 187 (4) 855-863. doi:10.1093/aje/kwx296. PMID 29020254
ABSTRACT
In the context of genetics, pleiotropy refers to the phenomenon in which a single genetic locus affects more than 1 trait or disease. Genetic epidemiologic studies have identified loci associated with multiple phenotypes, and these cross-phenotype associations are often incorrectly interpreted as examples of pleiotropy. Pleiotropy is only one possible explanation for cross-phenotype associations. Cross-phenotype associations may also arise due to issues related to study design, confounder bias, or nongenetic causal links between the phenotypes under analysis. Therefore, it is necessary to dissect cross-phenotype associations carefully to uncover true pleiotropic loci. In this review, we describe statistical methods that can be used to identify robust statistical evidence of pleiotropy. First, we provide an overview of univariate and multivariate methods for discovery of cross-phenotype associations and highlight important considerations for choosing among available methods. Then, we describe how to dissect cross-phenotype associations by using mediation analysis. Pleiotropic loci provide insights into the mechanistic underpinnings of disease comorbidity, and they may serve as novel targets for interventions that simultaneously treat multiple diseases. Discerning between different types of cross-phenotype associations is necessary to realize the public health potential of pleiotropic loci.
DOI
10.1093/aje/kwx296
Stephens
PUBMED_LINK
TITLE
A unified framework for association analysis with multiple related phenotypes.
Main citation
Stephens M. (2013) A unified framework for association analysis with multiple related phenotypes. PLoS One, 8 (7) e65245. doi:10.1371/journal.pone.0065245. PMID 23861737
ABSTRACT
We consider the problem of assessing associations between multiple related outcome variables, and a single explanatory variable of interest. This problem arises in many settings, including genetic association studies, where the explanatory variable is genotype at a genetic variant. We outline a framework for conducting this type of analysis, based on Bayesian model comparison and model averaging for multivariate regressions. This framework unifies several common approaches to this problem, and includes both standard univariate and standard multivariate association tests as special cases. The framework also unifies the problems of testing for associations and explaining associations - that is, identifying which outcome variables are associated with genotype. This provides an alternative to the usual, but conceptually unsatisfying, approach of resorting to univariate tests when explaining and interpreting significant multivariate findings. The method is computationally tractable genome-wide for modest numbers of phenotypes (e.g. 5-10), and can be applied to summary data, without access to raw genotype and phenotype data. We illustrate the methods on both simulated examples, and to a genome-wide association study of blood lipid traits where we identify 18 potential novel genetic associations that were not identified by univariate analyses of the same data.
DOI
10.1371/journal.pone.0065245
TATES
PUBMED_LINK
FULL NAME
Trait-based Association Test that uses Extended Simes procedure
TITLE
TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies.
Main citation
van der Sluis S, Posthuma D, Dolan CV. (2013) TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet, 9 (1) e1003235. doi:10.1371/journal.pgen.1003235. PMID 23359524
ABSTRACT
To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype-phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype-phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5-9 times higher than the power of univariate tests based on composite scores and 1.5-2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype, i.e. TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype-phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor.
DOI
10.1371/journal.pgen.1003235
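TATES applies an extended Simes procedure to a SNP's per-trait p-values: with the p-values sorted as p_(1) ≤ ... ≤ p_(m), it takes the minimum over j of m_e · p_(j) / m_e(j), where m_e is the effective number of independent p-values among all m traits and m_e(j) the effective number among the j smallest. As we understand it, the paper derives these effective numbers from the eigenvalues of the correlation matrix of the p-values, following GATES; the sketch below leaves that step to a caller-supplied function `m_eff` (a hypothetical parameter) so that the combination rule itself stays visible. With the default `len`, traits are treated as independent and the rule reduces to the ordinary Simes test.

```python
from typing import Callable, Sequence


def tates_p(pvals: Sequence[float],
            m_eff: Callable[[Sequence[int]], float] = len) -> float:
    """Extended Simes combination; m_eff maps a list of trait indices to an effective number."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])  # indices by ascending p
    me_all = m_eff(order)
    best = 1.0
    for j in range(1, len(order) + 1):
        top = order[:j]  # the j smallest p-values
        best = min(best, me_all * pvals[order[j - 1]] / m_eff(top))
    return best


print(tates_p([0.5, 0.01, 0.04]))                          # independent traits (Simes)
print(tates_p([0.5, 0.01, 0.04], m_eff=lambda idx: 1))     # perfectly correlated traits
```

Setting `m_eff` to a function that always returns 1 mimics perfectly correlated traits, in which case the combined p-value collapses to the smallest single-trait p-value.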
Yang
PUBMED_LINK
TITLE
Methods for Analyzing Multivariate Phenotypes in Genetic Association Studies.
Main citation
Yang Q, Wang Y. (2012) Methods for Analyzing Multivariate Phenotypes in Genetic Association Studies. J Probab Stat, 2012, 652569. doi:10.1155/2012/652569. PMID 24748889
ABSTRACT
Multivariate phenotypes are frequently encountered in genetic association studies. The purpose of analyzing multivariate phenotypes usually includes discovery of novel genetic variants of pleiotropy effects, that is, affecting multiple phenotypes, and the ultimate goal of uncovering the underlying genetic mechanism. In recent years, there have been new method development and application of existing statistical methods to such phenotypes. In this paper, we provide a review of the available methods for analyzing association between a single marker and a multivariate phenotype consisting of the same type of components (e.g., all continuous or all categorical) or different types of components (e.g., some are continuous and others are categorical). We also reviewed causal inference methods designed to test whether the detected association with the multivariate phenotype is truly pleiotropy or the genetic marker exerts its effects on some phenotypes through affecting the others.
DOI
10.1155/2012/652569
aMAT
PUBMED_LINK
FULL NAME
adaptive multi-trait association test
TITLE
Multi-trait Genome-Wide Analyses of the Brain Imaging Phenotypes in UK Biobank.
Main citation
Wu C. (2020) Multi-trait Genome-Wide Analyses of the Brain Imaging Phenotypes in UK Biobank. Genetics, 215 (4) 947-958. doi:10.1534/genetics.120.303242. PMID 32540950
ABSTRACT
Many genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated, traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits, and may yield inflated Type 1 error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are becoming rapidly available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating the Type 1 error rate was well controlled. More important, aMAT identified 24 distinct risk loci, 13 of which were ignored by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified much fewer risk loci. Finally, four additional sets of traits have been analyzed and provided similar conclusions.
DOI
10.1534/genetics.120.303242
condFDR
PUBMED_LINK
FULL NAME
pleiotropy-informed conditional false discovery rate
DESCRIPTION
Uses GWAS summary statistics from two related traits to estimate conditional false discovery rates from conditional Q–Q curves, boosting discovery of variants that may fall below standard genome-wide thresholds in single-trait scans. The framework includes conjunction FDR for loci associated with both traits.
URL
KEYWORDS
Pleiotropy, conditional FDR, conjunction FDR, summary statistics, multi-trait
TITLE
Improved detection of common variants associated with schizophrenia and bipolar disorder using pleiotropy-informed conditional false discovery rate.
Main citation
Andreassen OA, Thompson WK, Schork AJ, Ripke S, Mattingsdal M, Kelsoe JR, Kendler KS, O'Donovan MC, Rujescu D, Werge T, Sklar P, et al. (2013) Improved detection of common variants associated with schizophrenia and bipolar disorder using pleiotropy-informed conditional false discovery rate. PLoS Genet, 9 (4) e1003455. doi:10.1371/journal.pgen.1003455. PMID 23637625
ABSTRACT
Several lines of evidence suggest that genome-wide association studies (GWAS) have the potential to explain more of the "missing heritability" of common complex phenotypes. However, reliable methods to identify a larger proportion of single nucleotide polymorphisms (SNPs) that impact disease risk are currently lacking. Here, we use a genetic pleiotropy-informed conditional false discovery rate (FDR) method on GWAS summary statistics data to identify new loci associated with schizophrenia (SCZ) and bipolar disorders (BD), two highly heritable disorders with significant missing heritability. Epidemiological and clinical evidence suggest similar disease characteristics and overlapping genes between SCZ and BD. Here, we computed conditional Q-Q curves of data from the Psychiatric Genome Consortium (SCZ; n = 9,379 cases and n = 7,736 controls; BD: n = 6,990 cases and n = 4,820 controls) to show enrichment of SNPs associated with SCZ as a function of association with BD and vice versa with a corresponding reduction in FDR. Applying the conditional FDR method, we identified 58 loci associated with SCZ and 35 loci associated with BD below the conditional FDR level of 0.05. Of these, 14 loci were associated with both SCZ and BD (conjunction FDR). Together, these findings show the feasibility of genetic pleiotropy-informed methods to improve gene discovery in SCZ and BD and indicate overlapping genetic mechanisms between these two disorders.
DOI
10.1371/journal.pgen.1003455
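The conditional FDR for trait 1 given trait 2 can be estimated empirically from the two p-value columns: for SNP i, cFDR(p1_i | p2_i) ≈ p1_i · #{p2 ≤ p2_i} / #{p1 ≤ p1_i and p2 ≤ p2_i}, that is, the p-value divided by the empirical conditional CDF read off a stratified Q-Q curve. The conjunction FDR takes the larger of the two conditional FDRs. The sketch below implements this simple estimator by brute force; it assumes the p-values come from LD-pruned or otherwise approximately independent SNPs and omits any smoothing or genomic-control steps a published pipeline may apply. Function names and data are hypothetical.

```python
def cond_fdr(p1, p2):
    """Empirical conditional FDR of trait 1 given trait 2, one value per SNP."""
    out = []
    for a, b in zip(p1, p2):
        n_cond = sum(1 for y in p2 if y <= b)  # SNPs in the conditioning stratum
        n_joint = sum(1 for x, y in zip(p1, p2) if x <= a and y <= b)  # includes SNP itself
        out.append(min(1.0, a * n_cond / n_joint))
    return out


def conj_fdr(p1, p2):
    """Conjunction FDR: the larger of the two conditional FDRs."""
    return [max(f, g) for f, g in zip(cond_fdr(p1, p2), cond_fdr(p2, p1))]


p1 = [0.001, 0.2, 0.5, 0.9]
p2 = [0.002, 0.3, 0.8, 0.6]
print(cond_fdr(p1, p2))
print(conj_fdr(p1, p2))
```

The double loop is quadratic in the number of SNPs, which is fine for a toy example; genome-scale inputs would need sorting or binning.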
fastASSET
PUBMED_LINK
URL
TITLE
Genome-wide large-scale multi-trait analysis characterizes global patterns of pleiotropy and unique trait-specific variants.
Main citation
Qi G, Chhetri SB, Ray D, Dutta D, ...&, Chatterjee N. (2024) Genome-wide large-scale multi-trait analysis characterizes global patterns of pleiotropy and unique trait-specific variants. Nat Commun, 15 (1) 6985. doi:10.1038/s41467-024-51075-5. PMID 39143063
ABSTRACT
Genome-wide association studies (GWAS) have found widespread evidence of pleiotropy, but characterization of global patterns of pleiotropy remain highly incomplete due to insufficient power of current approaches. We develop fastASSET, a method that allows efficient detection of variant-level pleiotropic association across many traits. We analyze GWAS summary statistics of 116 complex traits of diverse types collected from the GRASP repository and large GWAS Consortia. We identify 2293 independent loci and find that the lead variants in nearly all these loci (~99%) to be associated with ≥ 2 traits (median = 6). We observe that degree of pleiotropy estimated from our study predicts that observed in the UK Biobank for a much larger number of traits (K = 4114) (correlation = 0.43, p-value < 2.2 × 10⁻¹⁶). Follow-up analyzes of 21 trait-specific variants indicate their link to the expression in trait-related tissues for a small number of genes involved in relevant biological processes. Our findings provide deeper insight into the nature of pleiotropy and leads to identification of highly trait-specific susceptibility variants.
DOI
10.1038/s41467-024-51075-5
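ASSET, which fastASSET builds on, searches over subsets S of traits for the one that maximises a fixed-effect meta-analysis statistic, z(S) = Σ_{k∈S} √n_k z_k / √(Σ_{k∈S} n_k), assuming non-overlapping samples. The sketch below performs that exhaustive one-sided search for a single variant. Its cost grows as 2^K, so it only suits a handful of traits; fastASSET's own speed-ups and ASSET's p-value adjustment for searching over subsets are not reproduced. `best_subset` is a hypothetical helper.

```python
from itertools import combinations
from math import sqrt


def best_subset(z, n):
    """Exhaustive one-sided subset search: returns (max z(S), S) for one variant."""
    k = len(z)
    best_stat, best_set = float("-inf"), ()
    for size in range(1, k + 1):
        for s in combinations(range(k), size):
            # sample-size-weighted fixed-effect z-score for subset s
            stat = sum(sqrt(n[i]) * z[i] for i in s) / sqrt(sum(n[i] for i in s))
            if stat > best_stat:
                best_stat, best_set = stat, s
    return best_stat, best_set


stat, subset = best_subset([3.0, 3.0, 0.0], [1000, 1000, 1000])
print(stat, subset)
```

Here the third trait carries no signal, so the search drops it: combining the first two traits gives 6/√2 ≈ 4.24, larger than either trait alone (3.0) or all three together (6/√3 ≈ 3.46). Because the maximum is taken over many subsets, this statistic is not a standard normal under the null and needs the adjustment mentioned above.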
metaCCA
PUBMED_LINK
FULL NAME
meta canonical correlation analysis
DESCRIPTION
metaCCA performs multivariate analysis of a single or multiple GWAS based on univariate regression coefficients. It allows multivariate representation of both phenotype and genotype. metaCCA extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness.
URL
TITLE
metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.
Main citation
Cichonska A, Rousu J, Marttinen P, Kangas AJ, ...&, Pirinen M. (2016) metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. Bioinformatics, 32 (13) 1981-9. doi:10.1093/bioinformatics/btw052. PMID 27153689
ABSTRACT
MOTIVATION: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. RESULTS: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/aalto-ics-kepaco CONTACTS: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
DOI
10.1093/bioinformatics/btw052
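With standardised genotypes and traits, a univariate regression coefficient equals the genotype-trait correlation, so the squared canonical correlation between one SNP and k traits can be written as r² = β' C_YY⁻¹ β, where C_YY is the trait correlation matrix. As we understand the paper, metaCCA estimates C_YY from summary statistics (the correlation of β across SNPs) and then shrinks such covariance matrices for robustness. The sketch below shows only these two algebraic steps, without shrinkage, multi-SNP genotype blocks or the association test; it assumes β is already on the standardised scale, and its helper names and data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson(u, v):
    mu, mv = fmean(u), fmean(v)
    cu = [x - mu for x in u]
    cv = [x - mv for x in v]
    return sum(a * b for a, b in zip(cu, cv)) / sqrt(sum(a * a for a in cu) * sum(b * b for b in cv))


def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def trait_corr_from_betas(beta_table):
    """Trait correlation matrix from per-SNP betas (rows = SNPs, columns = traits)."""
    cols = list(zip(*beta_table))
    return [[pearson(a, b) for b in cols] for a in cols]


def canonical_r2(beta_snp, c_yy):
    """Squared canonical correlation of one SNP with all traits: beta' C_YY^-1 beta."""
    x = solve(c_yy, beta_snp)
    return sum(a * b for a, b in zip(beta_snp, x))


beta_table = [[0.01, 0.02], [0.03, 0.01], [-0.02, -0.01], [0.00, 0.02], [0.01, -0.01]]
c_yy = trait_corr_from_betas(beta_table)
print(c_yy, canonical_r2([0.1, 0.1], c_yy))
```

In practice the β table would span many (ideally null, LD-pruned) SNPs rather than five, which is what makes the estimated correlation approximate the phenotypic correlation.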
metaUSAT/metaMANOVA
PUBMED_LINK
FULL NAME
unified score-based association test
DESCRIPTION
metaUSAT is a data-adaptive statistical approach for testing genetic associations of multiple traits from single/multiple studies using univariate GWAS summary statistics. This multivariate meta-analysis method can appropriately account for overlapping samples (if any) and can potentially test binary and/or continuous traits.
URL
TITLE
Methods for meta-analysis of multiple traits using GWAS summary statistics.
Main citation
Ray D, Boehnke M. (2018) Methods for meta-analysis of multiple traits using GWAS summary statistics. Genet Epidemiol, 42 (2) 134-145. doi:10.1002/gepi.22105. PMID 29226385
ABSTRACT
Genome-wide association studies (GWAS) for complex diseases have focused primarily on single-trait analyses for disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual-level data. Here, we develop metaUSAT (where USAT is unified score-based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. 
When applied to plasma lipids summary data from the METSIM and the T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggest that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.
DOI
10.1002/gepi.22105
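metaMANOVA, the multivariate component combined in metaUSAT, can be computed from one SNP's k univariate z-scores and the correlation matrix R of those z-scores, which absorbs both trait correlation and any sample overlap: T = z' R⁻¹ z is referred to a χ² distribution with k degrees of freedom. metaUSAT then adaptively combines this with a sum-of-squared-scores statistic; that combination and its p-value are not reproduced here. The sketch takes R as given (it might, for example, be estimated from z-scores at SNPs with no evidence of association) and uses closed-form χ² tail probabilities so that it needs only the standard library. Helper names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def chi2_sf(x, k):
    """Upper-tail probability of chi-square with k df (closed forms for even and odd k)."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        term = total = 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return exp(-x / 2) * total
    total = erfc(sqrt(x / 2))
    term = sqrt(2 * x / pi) * exp(-x / 2)
    for i in range(1, (k + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def meta_manova(z, r):
    """Hypothetical helper: T = z' R^-1 z and its chi-square(len(z)) p-value."""
    stat = sum(a * b for a, b in zip(z, solve(r, z)))
    return stat, chi2_sf(stat, len(z))


print(meta_manova([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]))
print(meta_manova([2.5, 2.0, -1.0], [[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]]))
```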
mvGWAMA
PUBMED_LINK
FULL NAME
Multivariate Genome-Wide Association Meta-Analysis
DESCRIPTION
mvGWAMA is a Python script to perform GWAS meta-analysis when there is sample overlap.
URL
TITLE
Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk.
Main citation
Jansen IE, Savage JE, Watanabe K, Bryois J, ...&, Posthuma D. (2019) Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk. Nat Genet, 51 (3) 404-413. doi:10.1038/s41588-018-0311-9. PMID 30617256
ABSTRACT
Alzheimer's disease (AD) is highly heritable and recent studies have identified over 20 disease-associated genomic loci. Yet these only explain a small proportion of the genetic variance, indicating that undiscovered loci remain. Here, we performed a large genome-wide association study of clinically diagnosed AD and AD-by-proxy (71,880 cases, 383,378 controls). AD-by-proxy, based on parental diagnoses, showed strong genetic correlation with AD (rg = 0.81). Meta-analysis identified 29 risk loci, implicating 215 potential causative genes. Associated genes are strongly expressed in immune-related tissues and cell types (spleen, liver, and microglia). Gene-set analyses indicate biological mechanisms involved in lipid-related processes and degradation of amyloid precursor proteins. We show strong genetic correlations with multiple health-related outcomes, and Mendelian randomization results suggest a protective effect of cognitive ability on AD risk. These results are a step forward in identifying the genetic factors that contribute to AD risk and add novel insights into the neurobiology of AD.
DOI
10.1038/s41588-018-0311-9
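A standard way to meta-analyse z-scores from GWAS with overlapping samples is a weighted z-score whose variance includes the between-study correlations of the z-scores: Z = Σ_i w_i z_i / √(Σ_i Σ_j w_i w_j c_ij), with w_i = √N_i. To our understanding mvGWAMA follows this approach, taking c_ij from cross-trait LD score regression intercepts, but the sketch below is a generic illustration rather than the script's code; it takes the correlation matrix c as given, and `overlap_aware_meta_z` is a hypothetical helper.

```python
from math import sqrt


def overlap_aware_meta_z(z, n, c):
    """sqrt(N)-weighted z-score meta-analysis allowing correlated z-scores (matrix c)."""
    w = [sqrt(x) for x in n]
    num = sum(wi * zi for wi, zi in zip(w, z))
    var = sum(w[i] * w[j] * c[i][j] for i in range(len(z)) for j in range(len(z)))
    return num / sqrt(var)


# No overlap (c = identity) versus complete overlap of identical studies (c = all ones).
print(overlap_aware_meta_z([2.0, 2.0], [10000, 10000], [[1.0, 0.0], [0.0, 1.0]]))
print(overlap_aware_meta_z([2.0, 2.0], [10000, 10000], [[1.0, 1.0], [1.0, 1.0]]))
```

The two calls show why the correction matters: treating two fully overlapping copies of one study as independent would inflate Z from 2 to about 2.83.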
Rare-variant
Meta-SAIGE
PUBMED_LINK
DESCRIPTION
Meta-SAIGE performs scalable cohort-level rare-variant meta-analysis from study-level outputs, emphasizing accurate null calibration (including low-prevalence binary traits), computational efficiency via reuse of LD structure across phenotypes, and power close to pooled individual-level analysis with SAIGE-GENE+.
URL
KEYWORDS
rare variant, meta-analysis, SAIGE, summary statistics, type I error
TITLE
Scalable and accurate rare variant meta-analysis with Meta-SAIGE.
Main citation
Park E, Nam K, Jeong S, Keat K, ...&, Lee S. (2025) Scalable and accurate rare variant meta-analysis with Meta-SAIGE. Nat Genet, 57 (12) 3185-3192. doi:10.1038/s41588-025-02403-y. PMID 41266648
ABSTRACT
Meta-analysis enhances the power of rare variant association tests by combining summary statistics across several cohorts. However, existing methods often fail to control type I error for low-prevalence binary traits and are computationally intensive. Here we introduce Meta-SAIGE, a scalable method for rare variant meta-analysis that accurately estimates the null distribution to control type I error and reuses the linkage disequilibrium matrix across phenotypes to boost computational efficiency in phenome-wide analyses. Simulations using UK Biobank whole-exome sequencing data show that Meta-SAIGE effectively controls type I error and achieves power comparable to pooled individual-level analysis with SAIGE-GENE+. Applying Meta-SAIGE to 83 low-prevalence phenotypes in UK Biobank and All of Us whole-exome sequencing data identified 237 gene-trait associations. Notably, 80 of these associations were not significant in either dataset alone, underscoring the power of our meta-analysis.
DOI
10.1038/s41588-025-02403-y
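Gene-based rare-variant pipelines often test several variant masks, MAF cut-offs or test types per gene and then merge the resulting p-values. To our understanding, the SAIGE-GENE+ and STAAR families do this with the Cauchy combination (ACAT) test, which maps each p-value to a Cauchy variate, takes a weighted mean and maps it back; the result remains approximately valid under arbitrary dependence between the tests, especially for small p-values. Whether Meta-SAIGE applies exactly this step is not stated in the entry above, so treat the sketch as an illustration of the general technique; `cauchy_combine` is a hypothetical helper.

```python
from math import atan, pi, tan


def cauchy_combine(pvals, weights=None):
    """Cauchy combination (ACAT) of p-values; weights are normalised to sum to one."""
    if weights is None:
        weights = [1.0] * len(pvals)
    total = sum(weights)
    # tan((0.5 - p) * pi) is the upper-tail standard Cauchy quantile for p
    t = sum(w / total * tan((0.5 - p) * pi) for w, p in zip(weights, pvals))
    return 0.5 - atan(t) / pi


print(cauchy_combine([0.03]))
print(cauchy_combine([1e-6, 0.5, 0.9]))
```

A single very small p-value dominates the combination, which is the property that makes the test well suited to merging many correlated gene-level tests; p-values of exactly 0 or 1 would need clipping before use.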
MetaSKAT
PUBMED_LINK
DESCRIPTION
MetaSKAT is an R package for multiple-marker meta-analysis. It can carry out meta-analysis of SKAT, SKAT-O and burden tests with individual-level genotype data or gene-level summary statistics.
URL
TITLE
General framework for meta-analysis of rare variants in sequencing association studies.
Main citation
Lee S, Teslovich TM, Boehnke M, Lin X. (2013) General framework for meta-analysis of rare variants in sequencing association studies. Am J Hum Genet, 93 (1) 42-53. doi:10.1016/j.ajhg.2013.05.010. PMID 23768515
ABSTRACT
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels.
DOI
10.1016/j.ajhg.2013.05.010
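The core idea of this framework is to aggregate per-study score statistics and their between-variant covariance instead of per-variant regression coefficients. It can be sketched for the burden test as follows. This is a minimal sketch that assumes homogeneous effects across studies, so scores and covariance matrices are simply summed, and it uses a 1-df chi-square reference distribution. The function names are hypothetical, and the sketch omits the paper's heterogeneity-aware and variance-component versions.

```python
import math


def chi2_1df_sf(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def meta_burden_test(study_scores, study_covs, weights):
    """Burden-test meta-analysis from study-level summary statistics only.

    study_scores: per-study score vectors U_k (one entry per variant)
    study_covs:   per-study between-variant covariance matrices V_k (nested lists)
    weights:      per-variant burden weights w
    """
    m = len(weights)
    # Assuming homogeneous effects, scores and covariances add across studies.
    U = [sum(u[j] for u in study_scores) for j in range(m)]
    V = [[sum(v[i][j] for v in study_covs) for j in range(m)] for i in range(m)]
    # Burden statistic (w'U)^2 / (w'Vw), approximately chi-square(1) under the null.
    num = sum(w * u for w, u in zip(weights, U))
    den = sum(weights[i] * V[i][j] * weights[j] for i in range(m) for j in range(m))
    stat = num ** 2 / den
    return stat, chi2_1df_sf(stat)


# Two hypothetical studies, two variants, equal weights.
stat, p = meta_burden_test(
    study_scores=[[1.0, 2.0], [0.5, 1.5]],
    study_covs=[[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]]],
    weights=[1.0, 1.0],
)
```

Each study only has to fit its null model once and share U_k and V_k. No per-variant effect estimates are needed, and such estimates are often unstable for rare variants.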
MetaSTAAR
PUBMED_LINK
DESCRIPTION
MetaSTAAR is an R package for performing the Meta-analysis of variant-Set Test for Association using Annotation infoRmation (MetaSTAAR) procedure in whole-genome sequencing (WGS) studies. MetaSTAAR enables functionally informed rare variant meta-analysis of large WGS studies. It uses an efficient sparse-matrix approach to store summary statistics, which protects the data privacy of study participants and avoids sharing subject-level data. MetaSTAAR accounts for relatedness and population structure, supports both continuous and dichotomous traits, and boosts the power of rare variant meta-analysis by incorporating multiple variant functional annotations.
URL
TITLE
Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies.
Main citation
Li X, Quick C, Zhou H, Gaynor SM, ...&, Lin X. (2023) Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies. Nat Genet, 55 (1) 154-164. doi:10.1038/s41588-022-01225-6. PMID 36564505
ABSTRACT
Meta-analysis of whole genome sequencing/whole exome sequencing (WGS/WES) studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Existing rare variant meta-analysis approaches are not scalable to biobank-scale WGS data. Here we present MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale WGS/WES studies. MetaSTAAR accounts for relatedness and population structure, can analyze both quantitative and dichotomous traits and boosts the power of rare variant tests by incorporating multiple variant functional annotations. Through meta-analysis of four lipid traits in 30,138 ancestrally diverse samples from 14 studies of the Trans Omics for Precision Medicine (TOPMed) Program, we show that MetaSTAAR performs rare variant meta-analysis at scale and produces results comparable to using pooled data. Additionally, we identified several conditionally significant rare variant associations with lipid traits. We further demonstrate that MetaSTAAR is scalable to biobank-scale cohorts through meta-analysis of TOPMed WGS data and UK Biobank WES data of ~200,000 samples.
DOI
10.1038/s41588-022-01225-6
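MetaSTAAR builds on the STAAR framework. As we understand it, STAAR combines p-values from tests weighted by different functional annotations using the aggregated Cauchy association test (ACAT). A minimal sketch of that Cauchy combination step is below. It is not MetaSTAAR's code, the function name is hypothetical, and equal weights are assumed by default.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination (ACAT) method.

    p_values must lie strictly between 0 and 1; weights default to equal.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Map each p-value to a standard Cauchy quantile and take the weighted sum.
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # The weighted average is treated as standard Cauchy under the null.
    return 0.5 - math.atan(stat / total) / math.pi


# Hypothetical p-values for one variant set tested under three annotation weightings.
combined = cauchy_combination([1e-6, 0.5, 0.9])
```

A single very small p-value dominates the combined result. This is useful when only one annotation captures the signal, and the combination needs no model of the correlation between the annotation-weighted tests.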
RareMETAL
PUBMED_LINK
DESCRIPTION
RAREMETAL is a program that facilitates the meta-analysis of rare variants genotyped on arrays or by sequencing (described in Feng et al., 2014).
URL
KEYWORDS
rare variants
TITLE
RAREMETAL: fast and powerful meta-analysis for rare variants.
Main citation
Feng S, Liu D, Zhan X, Wing MK, ...&, Abecasis GR. (2014) RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics, 30 (19) 2828-9. doi:10.1093/bioinformatics/btu367. PMID 24894501
ABSTRACT
SUMMARY: RAREMETAL is a computationally efficient tool for meta-analysis of rare variants genotyped using sequencing or arrays. RAREMETAL facilitates analyses of individual studies, accommodates a variety of input file formats, handles related and unrelated individuals, executes both single variant and burden tests and performs conditional association analyses. AVAILABILITY AND IMPLEMENTATION: http://genome.sph.umich.edu/wiki/RAREMETAL for executables, source code, documentation and tutorial.
DOI
10.1093/bioinformatics/btu367
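The score-statistic approach to single-variant meta-analysis, and a conditional test given one other variant, can be sketched from summary statistics alone. This sketch assumes that each study supplies a score U and its variance V for each variant and, for conditioning, the covariance between the two variants' scores. The formulas are the generic score-based ones. The function names are hypothetical and do not reflect RAREMETAL's interface or file formats.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def meta_single_variant(scores, variances):
    """Sum per-study score statistics U_k and their variances V_k, then standardize."""
    u = sum(scores)
    v = sum(variances)
    z = u / math.sqrt(v)
    return z, normal_two_sided_p(z)


def conditional_score(u1, v11, u2, v22, v12):
    """Score test for variant 1 conditional on variant 2, from summary statistics.

    u1, u2: meta-analysis score statistics; v11, v22: their variances;
    v12: covariance between the two scores (from the LD/covariance summaries).
    """
    # Remove the part of U1 that is explained by the correlated U2.
    u_cond = u1 - v12 / v22 * u2
    v_cond = v11 - v12 ** 2 / v22
    z = u_cond / math.sqrt(v_cond)
    return z, normal_two_sided_p(z)
```

For example, `meta_single_variant([3.0, 1.0], [2.0, 2.0])` gives z = 2. With the hypothetical values `conditional_score(4.0, 4.0, 6.0, 9.0, 3.0)`, the conditional z drops from 2 to about 1.15 because part of variant 1's signal is shared with variant 2.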
SMMAT
PUBMED_LINK
FULL NAME
Variant Set Mixed Model Association Tests
DESCRIPTION
For rare variant analysis from sequencing association studies, GMMAT performs the variant Set Mixed Model Association Tests (SMMAT) as proposed in Chen et al. (2019), including the burden test, the sequence kernel association test (SKAT), SKAT-O and an efficient hybrid test of the burden test and SKAT, based on user-defined variant sets.
URL
TITLE
Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies.
Main citation
Chen H, Huffman JE, Brody JA, Wang C, ...&, Lin X. (2019) Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. Am J Hum Genet, 104 (2) 260-274. doi:10.1016/j.ajhg.2018.12.012. PMID 30639324
ABSTRACT
With advances in whole-genome sequencing (WGS) technology, more advanced statistical methods for testing genetic association with rare variants are being developed. Methods in which variants are grouped for analysis are also known as variant-set, gene-based, and aggregate unit tests. The burden test and sequence kernel association test (SKAT) are two widely used variant-set tests, which were originally developed for samples of unrelated individuals and later have been extended to family data with known pedigree structures. However, computationally efficient and powerful variant-set tests are needed to make analyses tractable in large-scale WGS studies with complex study samples. In this paper, we propose the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework. These tests can be applied to large-scale WGS studies involving samples with population structure and relatedness, such as in the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program. SMMATs share the same null model for different variant sets, and a virtue of this null model, which includes covariates only, is that it needs to be fit only once for all tests in each genome-wide analysis. Simulation studies show that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness. We also illustrate our tests in a real data example of analysis of plasma fibrinogen levels in the TOPMed program (n = 23,763), using the Analysis Commons, a cloud-based computing platform.
DOI
10.1016/j.ajhg.2018.12.012
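The covariate-only null model is fit once and then reused for every variant set. This can be illustrated with a burden test. The sketch is simplified: it assumes a continuous trait, unrelated individuals and an intercept-only linear null model in place of SMMAT's generalized linear mixed model. The function names are hypothetical, and this is not GMMAT's API.

```python
import math


def fit_null_model(y):
    """Covariate-only null model; here just an intercept (simplifying assumption)."""
    n = len(y)
    mean = sum(y) / n
    residuals = [yi - mean for yi in y]
    sigma2 = sum(r * r for r in residuals) / (n - 1)
    return residuals, sigma2


def burden_test(genotypes, weights, residuals, sigma2):
    """Burden score test for one variant set, reusing the fitted null model.

    genotypes: list of per-variant genotype vectors (one value per individual)
    """
    n = len(residuals)
    # Collapse the variant set into one weighted burden score per individual.
    burden = [sum(w * g[i] for w, g in zip(weights, genotypes)) for i in range(n)]
    b_mean = sum(burden) / n
    centered = [b - b_mean for b in burden]
    # Score U = b'(y - mean) and its null variance sigma^2 * sum((b_i - mean_b)^2).
    score = sum(c * r for c, r in zip(centered, residuals))
    variance = sigma2 * sum(c * c for c in centered)
    stat = score ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2.0))  # 1-df chi-square p-value


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
residuals, sigma2 = fit_null_model(y)  # fitted once for the whole analysis
variant_sets = {  # hypothetical variant sets
    "set_A": ([[0, 0, 0, 1, 1, 1]], [1.0]),
    "set_B": ([[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]], [1.0, 1.0]),
}
results = {
    name: burden_test(g, w, residuals, sigma2)
    for name, (g, w) in variant_sets.items()
}
```

Only the cheap score computation is repeated per variant set. In a genome-wide scan this is what makes a shared null model efficient.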