Tools Dimension reduction

Curated listings of dimension reduction tools, under the GWAS Tools tab.

Summary Table

NAME | Main citation | YEAR
EIGENSTRAT | Price AL et al., Nat Genet, 2006 | 2006
PLINK-MDS | Purcell S et al., Am J Hum Genet, 2007 | 2007
SuSiE PCA | Yuan D et al., iScience, 2023 | 2023
UMAP | McInnes L et al., arXiv, 2018 | 2018
t-SNE | 2008 | 2008

EIGENSTRAT

Tool
PUBMED_LINK
16862161
URL
https://github.com/DReichLab/EIG
KEYWORDS
PCA, Linear
TITLE
Principal components analysis corrects for stratification in genome-wide association studies.
Main citation
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, ... & Reich D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet, 38 (8) 904-9. doi:10.1038/ng1847. PMID 16862161
ABSTRACT
Population stratification--allele frequency differences between cases and controls due to systematic ancestry differences--can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.
DOI
10.1038/ng1847
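
To make the abstract above concrete, here is a minimal NumPy sketch of the general recipe it describes: infer axes of ancestry with PCA on normalized genotypes, then remove those axes from both the candidate marker and the phenotype before testing for association. This is an illustration under stated assumptions, not the EIG software's implementation. The function names, the allele-frequency normalization, the simulated two-population data, and the (N - K - 1) * r^2 statistic as written here are assumptions made for the example, and the real tool may differ in details.

```python
import numpy as np


def top_pcs(genotypes, k):
    """Top-k principal component scores for a samples x SNPs matrix of 0/1/2 counts."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                      # per-SNP allele frequency
    keep = (p > 0.0) & (p < 1.0)                  # drop monomorphic SNPs
    z = (g[:, keep] - 2.0 * p[keep]) / np.sqrt(p[keep] * (1.0 - p[keep]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def residualize(v, pcs):
    """Remove the part of v explained by the PCs (plus an intercept)."""
    x = np.column_stack([np.ones(len(v)), pcs])
    beta, *_ = np.linalg.lstsq(x, v, rcond=None)
    return v - x @ beta


def adjusted_stat(snp, phenotype, pcs):
    """Association statistic after adjusting marker and phenotype for ancestry.

    Assumed form for this sketch: (N - K - 1) * r^2 on the residuals.
    """
    n, k = pcs.shape
    g_adj = residualize(np.asarray(snp, dtype=float), pcs)
    y_adj = residualize(np.asarray(phenotype, dtype=float), pcs)
    r = np.corrcoef(g_adj, y_adj)[0, 1]
    return (n - k - 1) * r ** 2


def simulate_two_populations(n_per_pop=100, n_snps=1000, seed=0):
    """Hypothetical toy data: two populations with drifted allele frequencies."""
    rng = np.random.default_rng(seed)
    p1 = rng.uniform(0.2, 0.8, n_snps)
    p2 = np.clip(p1 + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
    pop = np.repeat([0, 1], n_per_pop)
    freqs = np.where(pop[:, None] == 0, p1, p2)
    return rng.binomial(2, freqs), pop


genotypes, pop = simulate_two_populations()
pcs = top_pcs(genotypes, k=2)

# A null marker and a phenotype that both differ by population (pure stratification).
rng = np.random.default_rng(1)
snp = rng.binomial(2, np.where(pop == 0, 0.2, 0.8))
phenotype = pop + rng.normal(0.0, 0.5, pop.size)

unadjusted = pop.size * np.corrcoef(snp, phenotype)[0, 1] ** 2
print("unadjusted:", unadjusted, "adjusted:", adjusted_stat(snp, phenotype, pcs))
```

In this toy setup the marker has no causal effect, so any unadjusted signal comes from ancestry alone. The adjusted statistic should be much smaller because the leading PC tracks population membership.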

PLINK-MDS (MDS)

Tool
PUBMED_LINK
17701901
FULL NAME
multidimensional scaling
URL
https://www.cog-genomics.org/plink/1.9/strat
KEYWORDS
MDS
TITLE
PLINK: a tool set for whole-genome association and population-based linkage analyses.
Main citation
Purcell S, Neale B, Todd-Brown K, Thomas L, ... & Sham PC. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet, 81 (3) 559-75. doi:10.1086/519795. PMID 17701901
ABSTRACT
Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
DOI
10.1086/519795
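
As an illustration of the idea in this entry (multidimensional scaling on identity-by-state information), the sketch below builds a pairwise IBS distance matrix from 0/1/2 genotype counts and embeds it with textbook classical (Torgerson) MDS. It is a minimal sketch, not PLINK's code. The distance definition (mean absolute allele-count difference divided by 2), the function names, and the toy data are assumptions for the example, and PLINK's own computation may differ.

```python
import numpy as np


def ibs_distance(genotypes):
    """Pairwise distance = 1 - IBS similarity, for a samples x SNPs 0/1/2 matrix.

    Per SNP, two samples share 2 - |g_i - g_j| alleles identical by state, so the
    distance used here is the mean of |g_i - g_j| / 2 across SNPs (an assumption).
    """
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :])   # n x n x m, fine for small inputs
    return diff.mean(axis=2) / 2.0


def classical_mds(dist, k=2):
    """Classical MDS: double-center squared distances, keep the top-k eigenvectors."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering
    vals, vecs = np.linalg.eigh(b)                 # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale


# Hypothetical toy data: two populations of 30 samples with drifted frequencies.
rng = np.random.default_rng(0)
p1 = rng.uniform(0.2, 0.8, 1000)
p2 = np.clip(p1 + rng.normal(0.0, 0.2, 1000), 0.05, 0.95)
pop = np.repeat([0, 1], 30)
genotypes = rng.binomial(2, np.where(pop[:, None] == 0, p1, p2))

coords = classical_mds(ibs_distance(genotypes), k=2)
print(coords[:3])
```

The resulting coordinates play the same role as principal components: they can be plotted to inspect population structure or used as covariates in an association model.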

SuSiE PCA

Tool
PUBMED_LINK
37953948
DESCRIPTION
SuSiE PCA stands for the Sum of Single Effects model for principal component analysis. We developed SuSiE PCA for efficient variable selection in PCA on sparse, high-dimensional data, and for quantifying the uncertainty of the features contributing to each latent component through posterior inclusion probabilities (PIPs).
URL
https://github.com/mancusolab/susiepca
KEYWORDS
PCA, SuSiE
TITLE
SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis.
Main citation
Yuan D, Mancuso N. (2023) SuSiE PCA: A scalable Bayesian variable selection technique for principal component analysis. iScience, 26 (11) 108181. doi:10.1016/j.isci.2023.108181. PMID 37953948
ABSTRACT
Latent factor models, like principal component analysis (PCA), provide a statistical framework to infer low-rank representation in various biological contexts. However, feature selection is challenging when this low-rank structure manifests from a sparse subspace. We introduce SuSiE PCA, a scalable sparse latent factor approach that evaluates uncertainty in contributing variables through posterior inclusion probabilities. We validate our model in extensive simulations and demonstrate that SuSiE PCA outperforms other approaches in signal detection and model robustness. We apply SuSiE PCA to multi-tissue expression quantitative trait loci (eQTLs) data from GTEx v8 and identify tissue-specific factors and their contributing eGenes. We further investigate its performance on the large-scale perturbation data and find that SuSiE PCA identifies modules with a higher enrichment of ribosome-related genes than sparse PCA (false discovery rate [FDR] = 9.2×10^-82 vs. 1.4×10^-33), while being ~18x faster. Overall, SuSiE PCA provides an efficient tool to identify relevant features in high-dimensional biological data.
DOI
10.1016/j.isci.2023.108181

UMAP

Tool
FULL NAME
Uniform Manifold Approximation and Projection
URL
https://github.com/lmcinnes/umap
KEYWORDS
UMAP
Main citation
McInnes L, Healy J, Melville J. (2018) UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.