Summary statistics

Catalog entries using this tag (links open the entry card on its page):

Entries

COWAS

TWAS Functional genomics Gene prioritization Tool Summary statistics
PUBMED_LINK
41381446
FULL NAME
Co-expression-wide association study
DESCRIPTION
Co-expression-wide association study (COWAS) extends TWAS/PWAS by testing pairs of genes or proteins whose genetically regulated co-expression or interaction is associated with a trait; includes implemented R software and trained imputation weights for summary-statistic follow-up.
URL
https://github.com/mykmal/cowas, https://doi.org/10.1038/s41467-025-66039-6
KEYWORDS
TWAS, PWAS, co-expression, gene-gene interaction, GWAS summary statistics
TITLE
Co-expression-wide association studies link genetically regulated interactions with complex traits.
Main citation
Malakhov MM, Pan W. (2025) Co-expression-wide association studies link genetically regulated interactions with complex traits. Nat Commun, 16 (1) 11061. doi:10.1038/s41467-025-66039-6. PMID 41381446
ABSTRACT
Transcriptome- and proteome-wide association studies (TWAS/PWAS) have proven successful in prioritizing genes and proteins whose genetically regulated expression modulates disease risk, but they ignore potential co-expression and interaction effects. To address this limitation, we introduce the co-expression-wide association study (COWAS) method, which can identify pairs of genes or proteins whose genetically regulated co-expression is associated with complex traits. COWAS first trains models to predict expression and co-expression from genetic variation, and then tests for association between imputed co-expression and the trait of interest while also accounting for direct effects from each exposure. We applied our method to plasma proteomic concentrations from the UK Biobank, identifying dozens of interacting protein pairs associated with cholesterol levels, Alzheimer's disease, and Parkinson's disease. Notably, our results demonstrate that co-expression between proteins may affect complex traits even if neither protein is detected to influence the trait when considered on its own. We also show how COWAS can help to disentangle direct and interaction effects, providing a richer picture of the molecular networks that mediate genetic effects on disease outcomes.
DOI
10.1038/s41467-025-66039-6

JointPRS

PRS Multi-ancestry Cross-ancestry Genetic correlation Tool Summary statistics
PUBMED_LINK
40268942
DESCRIPTION
Data-adaptive polygenic score framework that borrows strength across populations via genetic correlations using only GWAS summary statistics and LD references—supporting prediction with or without individual-level tuning data.
URL
https://github.com/LeqiXu/JointPRS, https://doi.org/10.1038/s41467-025-59243-x
KEYWORDS
PRS, multi-population, genetic correlation, summary statistics, cross-ancestry
TITLE
JointPRS: A data-adaptive framework for multi-population genetic risk prediction incorporating genetic correlation.
Main citation
Xu L, Zhou G, Jiang W, Zhang H, ...&, Zhao H. (2025) JointPRS: A data-adaptive framework for multi-population genetic risk prediction incorporating genetic correlation. Nat Commun, 16 (1) 3841. doi:10.1038/s41467-025-59243-x. PMID 40268942
ABSTRACT
Genetic risk prediction for non-European populations is hindered by limited Genome-Wide Association Study (GWAS) sample sizes and small tuning datasets. We propose JointPRS, a data-adaptive framework that leverages genetic correlations across multiple populations using GWAS summary statistics. It achieves accurate predictions without individual-level tuning data and remains effective in the presence of a small tuning set thanks to its data-adaptive approach. Through extensive simulations and real data applications to 22 quantitative and four binary traits in five continental populations evaluated using the UK Biobank (UKBB) and All of Us (AoU), JointPRS consistently outperforms six state-of-the-art methods across three data scenarios: no tuning data, same-cohort tuning and testing, and cross-cohort tuning and testing. Notably, in the Admixed American population, JointPRS improves lipid trait prediction in AoU by 6.46%-172.00% compared to the other existing methods.
DOI
10.1038/s41467-025-59243-x
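
As a reading aid for this entry: whatever method estimates the weights, a polygenic score is applied as a weighted sum of effect-allele dosages. The sketch below shows only that final scoring step. It is not JointPRS's estimation procedure, and the function name, variant IDs, and weights are hypothetical values chosen for illustration.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages for one individual.

    weights: dict mapping variant ID -> per-allele effect weight
    dosages: dict mapping variant ID -> effect-allele dosage (0 to 2)
    Variants absent from `dosages` are skipped rather than imputed.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical weights and genotypes, for illustration only.
weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.125}
person = {"rs_a": 2, "rs_b": 1}  # rs_c was not genotyped

print(polygenic_score(weights, person))  # 0.5*2 + (-0.25)*1 = 0.75
```

Skipping missing variants is a simplification; real pipelines usually align alleles and handle missing genotypes explicitly.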

PP-GWAS

GWAS Privacy-preserving GWAS Tool Summary statistics
PUBMED_LINK
41365878
DESCRIPTION
Privacy-preserving framework for multi-site GWAS on quantitative traits using a distributed linear mixed model and randomized encoding so servers never see raw genotypes or phenotypes—only obfuscated intermediates—while improving speed versus several cryptographic baselines.
URL
https://github.com/mdppml/PP-GWAS, https://doi.org/10.1038/s41467-025-66771-z
KEYWORDS
Privacy-preserving GWAS, multi-site, quantitative traits, federated analysis
TITLE
PP-GWAS: Privacy Preserving Multi-Site Genome-wide Association Studies.
Main citation
Swaminathan A, Hannemann A, Ünal AB, Pfeifer N, ...&, Akgün M. (2025) PP-GWAS: Privacy Preserving Multi-Site Genome-wide Association Studies. Nat Commun, 16 (1) 11030. doi:10.1038/s41467-025-66771-z. PMID 41365878
ABSTRACT
Genome-wide association studies help uncover genetic influences on complex traits and diseases. Importantly, multi-site data collaborations enhance the statistical power of these studies but pose challenges due to the sensitivity of genomic data. Existing privacy-preserving approaches to performing multi-site genome-wide association studies rely on computationally expensive cryptographic techniques, which limit applicability. To address this, we present PP-GWAS, a privacy-preserving algorithm that improves efficiency and scalability while maintaining data privacy. Our method leverages randomized encoding within a distributed framework to perform stacked ridge regression on a linear mixed model, enabling robust analysis of quantitative phenotypes. We show experimentally using real-world and synthetic data that our approach achieves twice the computational speed of comparable methods while reducing resource consumption.
DOI
10.1038/s41467-025-66771-z

scTWAS

TWAS Single cell scRNA-seq Tool Summary statistics
PUBMED_LINK
41820391
DESCRIPTION
Statistical framework for cell-type-resolved transcriptome-wide association using single-cell RNA-seq: models sparsity and technical noise via latent variables and moment-based estimation to improve genetically regulated expression prediction and gene–trait discovery.
URL
https://github.com/ZhaotongL/scTWAS, https://doi.org/10.1038/s41467-026-70374-7
KEYWORDS
TWAS, single-cell, cell-type-specific, latent variable, GReX
TITLE
scTWAS: a powerful statistical framework for single-cell transcriptome-wide association studies.
Main citation
Lin Z, Su C. (2026) scTWAS: a powerful statistical framework for single-cell transcriptome-wide association studies. Nat Commun, () . doi:10.1038/s41467-026-70374-7. PMID 41820391
ABSTRACT
Transcriptome-wide association studies (TWAS) have successfully identified genes associated with complex traits and diseases, but most have been performed using bulk gene expression data, which aggregate signals across heterogeneous cell types. Population-scale single-cell RNA sequencing data now make it possible to perform TWAS at the cell-type resolution, but present unique challenges due to strong noises, technical variations, and high sparsity. Here, we propose scTWAS, a statistical method to conduct cell-type-specific TWAS using single-cell data. Leveraging a latent-variable model and moment-based estimation to address the challenges of single-cell data, scTWAS consistently improves the prediction of genetically regulated gene expression across cell types in both blood and brain tissues. Compared to existing methods, scTWAS identifies substantially more gene-trait associations across 29 hematological traits and three immune-related diseases in immune cell types. An application to Alzheimer's disease also reveals cell-subtype-specific associations, including MS4A6A in the disease-associated microglial subtype and PPP1R37 in the inflammatory microglial subtype.
DOI
10.1038/s41467-026-70374-7

TGVIS

TWAS Gene prioritization Fine mapping Tool Summary statistics
PUBMED_LINK
40603866
FULL NAME
Tissue-Gene pairs, direct causal Variants, and Infinitesimal effects selector
DESCRIPTION
Multivariate TWAS approach that prioritizes causal gene–tissue pairs and candidate causal variants from GWAS summary data while explicitly controlling for genome-wide infinitesimal (polygenic) effects that can otherwise inflate false gene discoveries.
URL
https://github.com/harryyiheyang/TGVIS, https://doi.org/10.1038/s41467-025-61423-8
KEYWORDS
multivariate TWAS, infinitesimal model, causal gene-tissue, eQTL, sQTL
TITLE
Uncovering causal gene-tissue pairs and variants through a multivariate TWAS controlling for infinitesimal effects.
Main citation
Yang Y, Lorincz-Comi N, Zhu X. (2025) Uncovering causal gene-tissue pairs and variants through a multivariate TWAS controlling for infinitesimal effects. Nat Commun, 16 (1) 6098. doi:10.1038/s41467-025-61423-8. PMID 40603866
ABSTRACT
Transcriptome-wide association studies (TWAS) are commonly used to prioritize causal genes underlying associations found in genome-wide association studies (GWAS) and have been extended to identify causal genes through multivariate TWAS methods. However, recent studies have shown that widespread infinitesimal effects due to polygenicity can impair the performance of these methods. In this report, we introduce a multivariate TWAS method named tissue-gene pairs, direct causal variants, and infinitesimal effects selector (TGVIS) to identify tissue-specific causal genes and direct causal variants while accounting for infinitesimal effects. In simulations, TGVIS maintains an accurate prioritization of causal gene-tissue pairs and variants and demonstrates comparable or superior power to existing approaches, regardless of the presence of infinitesimal effects. In the real data analysis of GWAS summary data of 45 cardiometabolic traits and expression/splicing quantitative trait loci from 31 tissues, TGVIS is able to improve causal gene prioritization and identifies novel genes that were missed by conventional TWAS.
DOI
10.1038/s41467-025-61423-8

A Table of all published GWAS with metabolomics

Summary statistics
PUBMED_LINK
26160913
DESCRIPTION
This table was initially published in Kastenmüller et al., Genetics of human metabolism: an update. Hum. Mol. Genet. 2015 and has been updated as of 23 April 2024.
URL
http://www.metabolomix.com/list-of-all-published-gwas-with-metabolomics/
TITLE
Genetics of human metabolism: an update.
Main citation
Kastenmüller G, Raffler J, Gieger C, Suhre K. (2015) Genetics of human metabolism: an update. Hum Mol Genet, 24 (R1) R93-R101. doi:10.1093/hmg/ddv263. PMID 26160913
ABSTRACT
Genome-wide association studies with metabolomics (mGWAS) identify genetically influenced metabotypes (GIMs), their ensemble defining the heritable part of every human's metabolic individuality. Knowledge of genetic variation in metabolism has many applications of biomedical and pharmaceutical interests, including the functional understanding of genetic associations with clinical end points, design of strategies to correct dysregulations in metabolic disorders and the identification of genetic effect modifiers of metabolic disease biomarkers. Furthermore, it has been shown that GIMs provide testable hypotheses for functional genomics and metabolomics and for the identification of novel gene functions and metabolite identities. mGWAS with growing sample sizes and increasingly complex metabolic trait panels are being conducted, allowing for more comprehensive and systems-based downstream analyses. The generated large datasets of genetic associations can now be mined by the biomedical research community and provide valuable resources for hypothesis-driven studies. In this review, we provide a brief summary of the key aspects of mGWAS, followed by an update of recently published mGWAS. We then discuss new approaches of integrating and exploring mGWAS results and finish by presenting selected applications of GIMs in recent studies.
DOI
10.1093/hmg/ddv263

Ahola-Olli AV, et al-27989323

Summary statistics
PUBMED_LINK
27989323
TITLE
Genome-wide Association Study Identifies 27 Loci Influencing Concentrations of Circulating Cytokines and Growth Factors.
Main citation
Ahola-Olli AV, Würtz P, Havulinna AS, Aalto K, ...&, Raitakari OT. (2017) Genome-wide Association Study Identifies 27 Loci Influencing Concentrations of Circulating Cytokines and Growth Factors. Am J Hum Genet, 100 (1) 40-50. doi:10.1016/j.ajhg.2016.11.007. PMID 27989323
ABSTRACT
Circulating cytokines and growth factors are regulators of inflammation and have been implicated in autoimmune and metabolic diseases. In this genome-wide association study (GWAS) of up to 8,293 Finns we identified 27 genome-widely significant loci (p < 1.2 × 10⁻⁹) for one or more cytokines. Fifteen of the associated variants had expression quantitative trait loci in whole blood. We provide genetic instruments to clarify the causal roles of cytokine signaling and upstream inflammation in immune-related and other chronic diseases. We further link inflammatory markers with variants previously associated with autoimmune diseases such as Crohn disease, multiple sclerosis, and ulcerative colitis and hereby elucidate the molecular mechanisms underpinning these diseases and suggest potential drug targets.
DOI
10.1016/j.ajhg.2016.11.007

AIDA

Summary statistics
PUBMED_LINK
40112801
DESCRIPTION
Asian Immune Diversity Atlas
URL
https://cellxgene.cziscience.com/collections/ced320a1-29f3-47c1-a735-513c7084d508
TITLE
Asian diversity in human immune cells.
Main citation
Kock KH, Tan LM, Han KY, Ando Y, ...&, Prabhakar S. (2025) Asian diversity in human immune cells. Cell, 188 (8) 2288-2306.e24. doi:10.1016/j.cell.2025.02.017. PMID 40112801
ABSTRACT
The relationships of human diversity with biomedical phenotypes are pervasive yet remain understudied, particularly in a single-cell genomics context. Here, we present the Asian Immune Diversity Atlas (AIDA), a multi-national single-cell RNA sequencing (scRNA-seq) healthy reference atlas of human immune cells. AIDA comprises 1,265,624 circulating immune cells from 619 donors, spanning 7 population groups across 5 Asian countries, and 6 controls. Though population groups are frequently compared at the continental level, we found that sub-continental diversity, age, and sex pervasively impacted cellular and molecular properties of immune cells. These included differential abundance of cell neighborhoods as well as cell populations and genes relevant to disease risk, pathogenesis, and diagnostics. We discovered functional genetic variants influencing cell-type-specific gene expression, which were under-represented in non-Asian populations, and helped contextualize disease-associated variants. AIDA enables analyses of multi-ancestry disease datasets and facilitates the development of precision medicine efforts in Asia and beyond.
DOI
10.1016/j.cell.2025.02.017

Bian

Summary statistics
PUBMED_LINK
40112817
DESCRIPTION
scGaTE
URL
http://ccra.njmu.edu.cn/scgate/
TITLE
Single-cell eQTL mapping reveals cell-type-specific genes associated with the risk of gastric cancer.
Main citation
Bian L, Hu B, Li F, Gu Y, ...&, Jin G. (2025) Single-cell eQTL mapping reveals cell-type-specific genes associated with the risk of gastric cancer. Cell Genom, 5 (4) 100812. doi:10.1016/j.xgen.2025.100812. PMID 40112817
ABSTRACT
Most expression quantitative trait locus (eQTL) analyses have been conducted in heterogeneous gastric tissues, limiting understanding of cell-type-specific regulatory mechanisms. Here, we employed a pooled multiplexing strategy to profile 399,683 gastric cells from 203 Chinese individuals using single-cell RNA sequencing (scRNA-seq). We identified 19 distinct gastric cell types and performed eQTL analyses, uncovering 8,498 independent eQTLs, with a considerable fraction (81%, 6,909/8,498) exhibiting cell-type-specific effects. Integration of these eQTLs with genome-wide association studies for gastric cancer (GC) revealed four co-localization signals in specific cell types. Genetically predicted cell-type-specific gene expression identified 15 genes associated with GC risk, including the upregulation of MUC1 exclusively in parietal cells, linked to decreased GC risk. Our findings highlight substantial heterogeneity in the genetic regulation of gene expression across gastric cell types and provide critical cell-type-specific annotations of genetic variants associated with GC risk, offering new molecular insights underlying GC.
DOI
10.1016/j.xgen.2025.100812

Biobank Japan (BBJ) JENGER

Summary statistics
PUBMED_LINK
39363016
DESCRIPTION
Biobank Japan GWAS summary statistics via the JENGER browser (RIKEN).
URL
http://jenger.riken.jp/result
TITLE
Population-specific putative causal variants shape quantitative traits.
Main citation
Koyama S, Liu X, Koike Y, Hikino K, ...&, Terao C. (2024) Population-specific putative causal variants shape quantitative traits. Nat Genet, 56 (10) 2027-2035. doi:10.1038/s41588-024-01913-5. PMID 39363016
ABSTRACT
Human genetic variants are associated with many traits through largely unknown mechanisms. Here, combining approximately 260,000 Japanese study participants, a Japanese-specific genotype reference panel and statistical fine-mapping, we identified 4,423 significant loci across 63 quantitative traits, among which 601 were new, and 9,406 putatively causal variants. New associations included Japanese-specific coding, splicing and noncoding variants, exemplified by a damaging missense variant rs730881101 in TNNT2 associated with lower heart function and increased risk for heart failure (P = 1.4 × 10⁻¹⁵ and odds ratio = 4.5, 95% confidence interval = 3.1-6.5). Putative causal noncoding variants were supported by state-of-art in silico functional assays and had comparable effect sizes to coding variants. A plausible example of new mechanisms of causal variants is an enrichment of causal variants in 3' untranslated regions (UTRs), including the Japanese-specific rs13306436 in IL6 associated with pro-inflammatory traits and protection against tuberculosis. We experimentally showed that transcripts with rs13306436 are resistant to mRNA degradation by regnase-1, an RNA-binding protein. Our study provides a list of fine-mapped causal variants to be tested for functionality and underscores the importance of sequencing, genotyping and association efforts in diverse populations.
DOI
10.1038/s41588-024-01913-5
RELATED_BIOBANK
BioBank Japan
MAIN ANCESTRY
EAS

Biobank Japan (BBJ) Phewebjp

Summary statistics
PUBMED_LINK
39363016
DESCRIPTION
Japan-wide PheWeb instance for Biobank Japan GWAS summary statistics.
URL
https://pheweb.jp/
TITLE
Population-specific putative causal variants shape quantitative traits.
Main citation
Koyama S, Liu X, Koike Y, Hikino K, ...&, Terao C. (2024) Population-specific putative causal variants shape quantitative traits. Nat Genet, 56 (10) 2027-2035. doi:10.1038/s41588-024-01913-5. PMID 39363016
ABSTRACT
Human genetic variants are associated with many traits through largely unknown mechanisms. Here, combining approximately 260,000 Japanese study participants, a Japanese-specific genotype reference panel and statistical fine-mapping, we identified 4,423 significant loci across 63 quantitative traits, among which 601 were new, and 9,406 putatively causal variants. New associations included Japanese-specific coding, splicing and noncoding variants, exemplified by a damaging missense variant rs730881101 in TNNT2 associated with lower heart function and increased risk for heart failure (P = 1.4 × 10⁻¹⁵ and odds ratio = 4.5, 95% confidence interval = 3.1-6.5). Putative causal noncoding variants were supported by state-of-art in silico functional assays and had comparable effect sizes to coding variants. A plausible example of new mechanisms of causal variants is an enrichment of causal variants in 3' untranslated regions (UTRs), including the Japanese-specific rs13306436 in IL6 associated with pro-inflammatory traits and protection against tuberculosis. We experimentally showed that transcripts with rs13306436 are resistant to mRNA degradation by regnase-1, an RNA-binding protein. Our study provides a list of fine-mapped causal variants to be tested for functionality and underscores the importance of sequencing, genotyping and association efforts in diverse populations.
DOI
10.1038/s41588-024-01913-5
RELATED_BIOBANK
BioBank Japan
MAIN ANCESTRY
EAS

Biobank Russia

Summary statistics
PUBMED_LINK
39043636
DESCRIPTION
GWAS summary statistics from the Russian biobank resource (complex traits in Russian populations).
URL
https://biobank.almazovcentre.ru/#
TITLE
Complex trait susceptibilities and population diversity in a sample of 4,145 Russians.
Main citation
Usoltsev D, Kolosov N, Rotar O, Loboda A, ...&, Artomov M. (2024) Complex trait susceptibilities and population diversity in a sample of 4,145 Russians. Nat Commun, 15 (1) 6212. doi:10.1038/s41467-024-50304-1. PMID 39043636
ABSTRACT
The population of Russia consists of more than 150 local ethnicities. The ethnic diversity and geographic origins, which extend from eastern Europe to Asia, make the population uniquely positioned to investigate the shared properties of inherited disease risks between European and Asian ancestries. We present the analysis of genetic and phenotypic data from a cohort of 4,145 individuals collected in three metro areas in western Russia. We show the presence of multiple admixed genetic ancestry clusters spanning from primarily European to Asian and high identity-by-descent sharing with the Finnish population. As a result, there was notable enrichment of Finnish-specific variants in Russia. We illustrate the utility of Russian-descent cohorts for discovery of novel population-specific genetic associations, as well as replication of previously identified associations that were thought to be population-specific in other cohorts. Finally, we provide access to a database of allele frequencies and GWAS results for 464 phenotypes.
DOI
10.1038/s41467-024-50304-1
MAIN ANCESTRY
EUR

Bretherick AD, et al-32628676

Summary statistics
PUBMED_LINK
32628676
TITLE
Linking protein to phenotype with Mendelian Randomization detects 38 proteins with causal roles in human diseases and traits.
Main citation
Bretherick AD, Canela-Xandri O, Joshi PK, Clark DW, ...&, Haley C. (2020) Linking protein to phenotype with Mendelian Randomization detects 38 proteins with causal roles in human diseases and traits. PLoS Genet, 16 (7) e1008785. doi:10.1371/journal.pgen.1008785. PMID 32628676
ABSTRACT
To efficiently transform genetic associations into drug targets requires evidence that a particular gene, and its encoded protein, contribute causally to a disease. To achieve this, we employ a three-step proteome-by-phenome Mendelian Randomization (MR) approach. In step one, 154 protein quantitative trait loci (pQTLs) were identified and independently replicated. From these pQTLs, 64 replicated locally-acting variants were used as instrumental variables for proteome-by-phenome MR across 846 traits (step two). When its assumptions are met, proteome-by-phenome MR, is equivalent to simultaneously running many randomized controlled trials. Step 2 yielded 38 proteins that significantly predicted variation in traits and diseases in 509 instances. Step 3 revealed that amongst the 271 instances from GeneAtlas (UK Biobank), 77 showed little evidence of pleiotropy (HEIDI), and 92 evidence of colocalization (eCAVIAR). Results were wide ranging: including, for example, new evidence for a causal role of tyrosine-protein phosphatase non-receptor type substrate 1 (SHPS1; SIRPA) in schizophrenia, and a new finding that intestinal fatty acid binding protein (FABP2) abundance contributes to the pathogenesis of cardiovascular disease. We also demonstrated confirmatory evidence for the causal role of four further proteins (FGF5, IL6R, LPL, LTA) in cardiovascular disease risk.
DOI
10.1371/journal.pgen.1008785
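
The abstract above uses pQTLs as instrumental variables for Mendelian Randomization. For a single instrument, the textbook estimator is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The sketch below illustrates that generic estimator only, under the assumption of a first-order standard error that ignores uncertainty in the exposure effect. It is not the authors' pipeline, and the function name and input numbers are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate from summary statistics.

    beta_exposure: variant effect on the exposure (e.g. a protein level)
    beta_outcome:  variant effect on the outcome trait
    se_outcome:    standard error of beta_outcome
    Returns (estimate, standard error, two-sided p-value).
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known exactly.
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal null.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical summary statistics, for illustration only.
estimate, se, p_value = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
print(estimate, se, p_value)
```

The estimate is interpretable as causal only when the instrumental-variable assumptions hold, which is why the abstract reports follow-up checks for pleiotropy and colocalization.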

Carayol J, et al-29234017

Summary statistics
PUBMED_LINK
29234017
TITLE
Protein quantitative trait locus study in obesity during weight-loss identifies a leptin regulator.
Main citation
Carayol J, Chabert C, Di Cara A, Armenise C, ...&, Hager J. (2017) Protein quantitative trait locus study in obesity during weight-loss identifies a leptin regulator. Nat Commun, 8 (1) 2084. doi:10.1038/s41467-017-02182-z. PMID 29234017
ABSTRACT
Thousands of genetic variants have been associated with complex traits through genome-wide association studies. However, the functional variants or mechanistic consequences remain elusive. Intermediate traits such as gene expression or protein levels are good proxies of the metabolic state of an organism. Proteome analysis especially can provide new insights into the molecular mechanisms of complex traits like obesity. The role of genetic variation in determining protein level variation has not been assessed in obesity. To address this, we design a large-scale protein quantitative trait locus (pQTL) analysis based on a set of 1129 proteins from 494 obese subjects before and after a weight loss intervention. This reveals 55 BMI-associated cis-pQTLs and trans-pQTLs at baseline and 3 trans-pQTLs after the intervention. We provide evidence for distinct genetic mechanisms regulating BMI-associated proteins before and after weight loss. Finally, by functional analysis, we identify and validate FAM46A as a trans regulator for leptin.
DOI
10.1038/s41467-017-02182-z

Cardiovascular Disease Knowledge Portal

Summary statistics
PUBMED_LINK
37814896
DESCRIPTION
Broad/HMS cardiovascular disease knowledge portal with GWAS, gene, and variant views across CV traits.
URL
https://cvd.hugeamp.org/
TITLE
Cardiovascular Disease Knowledge Portal: A Community Resource for Cardiovascular Disease Research.
Main citation
Costanzo MC, Roselli C, Brandes M, Duby M, ...&, Burtt NP. (2023) Cardiovascular Disease Knowledge Portal: A Community Resource for Cardiovascular Disease Research. Circ Genom Precis Med, 16 (6) e004181. doi:10.1161/CIRCGEN.123.004181. PMID 37814896
DOI
10.1161/CIRCGEN.123.004181
MAIN ANCESTRY
Multi-ancestry

Carland C, et al-37550624

Summary statistics
PUBMED_LINK
37550624
TITLE
Proteomic analysis of 92 circulating proteins and their effects in cardiometabolic diseases.
Main citation
Carland C, Png G, Malarstig A, Kho PF, ...&, Assimes T. (2023) Proteomic analysis of 92 circulating proteins and their effects in cardiometabolic diseases. Clin Proteomics, 20 (1) 31. doi:10.1186/s12014-023-09421-0. PMID 37550624
ABSTRACT
BACKGROUND: Human plasma contains a wide variety of circulating proteins. These proteins can be important clinical biomarkers in disease and also possible drug targets. Large scale genomics studies of circulating proteins can identify genetic variants that lead to relative protein abundance. METHODS: We conducted a meta-analysis on genome-wide association studies of autosomal chromosomes in 22,997 individuals of primarily European ancestry across 12 cohorts to identify protein quantitative trait loci (pQTL) for 92 cardiometabolic associated plasma proteins. RESULTS: We identified 503 (337 cis and 166 trans) conditionally independent pQTLs, including several novel variants not reported in the literature. We conducted a sex-stratified analysis and found that 118 (23.5%) of pQTLs demonstrated heterogeneity between sexes. The direction of effect was preserved but there were differences in effect size and significance. Additionally, we annotate trans-pQTLs with nearest genes and report plausible biological relationships. Using Mendelian randomization, we identified causal associations for 18 proteins across 19 phenotypes, of which 10 have additional genetic colocalization evidence. We highlight proteins associated with a constellation of cardiometabolic traits including angiopoietin-related protein 7 (ANGPTL7) and Semaphorin 3F (SEMA3F). CONCLUSION: Through large-scale analysis of protein quantitative trait loci, we provide a comprehensive overview of common variants associated with plasma proteins. We highlight possible biological relationships which may serve as a basis for further investigation into possible causal roles in cardiometabolic diseases.
DOI
10.1186/s12014-023-09421-0
MAIN ANCESTRY
EUR

Caron B, et al-35264221

Summary statistics
PUBMED_LINK
35264221
TITLE
Integrative genetic and immune cell analysis of plasma proteins in healthy donors identifies novel associations involving primary immune deficiency genes.
Main citation
Caron B, Patin E, Rotival M, Charbit B, ...&, Milieu Intérieur Consortium. (2022) Integrative genetic and immune cell analysis of plasma proteins in healthy donors identifies novel associations involving primary immune deficiency genes. Genome Med, 14 (1) 28. doi:10.1186/s13073-022-01032-y. PMID 35264221
ABSTRACT
BACKGROUND: Blood plasma proteins play an important role in immune defense against pathogens, including cytokine signaling, the complement system, and the acute-phase response. Recent large-scale studies have reported genetic (i.e., protein quantitative trait loci, pQTLs) and non-genetic factors, such as age and sex, as major determinants to inter-individual variability in immune response variation. However, the contribution of blood-cell composition to plasma protein heterogeneity has not been fully characterized and may act as a mediating factor in association studies. METHODS: Here, we evaluated plasma protein levels from 400 unrelated healthy individuals of western European ancestry, who were stratified by sex and two decades of life (20-29 and 60-69 years), from the Milieu Intérieur cohort. We quantified 229 proteins by Luminex in a clinically certified laboratory and their levels of variation were analyzed together with 5.2 million single-nucleotide polymorphisms. With respect to non-genetic variables, we included 254 lifestyle and biochemical factors, as well as counts of seven circulating immune cell populations measured by hemogram and standardized flow cytometry. RESULTS: Collectively, we found 152 significant associations involving 49 proteins and 20 non-genetic variables. Consistent with previous studies, age and sex showed a global, pervasive impact on plasma protein heterogeneity, while body mass index and other health status variables were among the non-genetic factors with the highest number of associations. After controlling for these covariates, we identified 100 and 12 pQTLs acting in cis and trans, respectively, collectively associated with 87 plasma proteins and including 19 novel genetic associations. Genetic factors explained the largest fraction of the variability of plasma protein levels, as compared to non-genetic factors. 
In addition, blood-cell fractions, including leukocytes, lymphocytes, monocytes, neutrophils, eosinophils, basophils, and platelets, had a larger contribution to inter-individual variability than age and sex and appeared as confounders of specific genetic associations. Finally, we identified new genetic associations with plasma protein levels of five monogenic Mendelian disease genes including two primary immunodeficiency genes (Ficolin-3 and FAS). CONCLUSIONS: Our study identified novel genetic and non-genetic factors associated to plasma protein levels which may inform health status and disease management.
DOI
10.1186/s13073-022-01032-y

Cheng C, et al-39837327

Summary statistics
PUBMED_LINK
39837327
DESCRIPTION
Serum metabolome GWAS in Han Chinese; portal lists browse/download for 2,854 serum metabolites in 3,795 individuals on the Westlake Chinese Multi-omics GWAS Catalog (see publication for cohort and analysis details).
URL
https://omics.lab.westlake.edu.cn/data/metabolites/phenotypes, https://omics.lab.westlake.edu.cn/collect.html
TITLE
Genetic mapping of serum metabolome to chronic diseases among Han Chinese.
Main citation
Cheng C, Xu F, Pan XF, Wang C, ...&, Zheng JS. (2025) Genetic mapping of serum metabolome to chronic diseases among Han Chinese. Cell Genom, 5 (2) 100743. doi:10.1016/j.xgen.2024.100743. PMID 39837327
ABSTRACT
Serum metabolites are potential regulators for chronic diseases. To explore the genetic regulation of metabolites and their roles in chronic diseases, we quantified 2,759 serum metabolites and performed genome-wide association studies (GWASs) among Han Chinese individuals. We identified 184 study-wide significant (p < 1.81 × 10⁻¹¹) metabolite quantitative trait loci (metaboQTLs), 88.59% (163) of which were novel. Notably, we identified Asian-ancestry-specific metaboQTLs, including the SNP rs2296651 for taurocholic acid and taurochenodesoxycholic acid. Leveraging the GWAS for 37 clinical traits from East Asians, Mendelian randomization analyses identified 906 potential causal relationships between metabolites and clinical traits, including 27 for type 2 diabetes and 38 for coronary artery disease. Integrating genetic regulation of the transcriptome and proteome revealed putative regulators of several metabolites. In summary, we depict a landscape of the genetic architecture of the serum metabolome among Han Chinese and provide insights into the role of serum metabolites in chronic diseases.
DOI
10.1016/j.xgen.2024.100743
MAIN ANCESTRY
EAS

China Kadoorie Biobank (CKB)

Summary statistics
PUBMED_LINK
22158673
DESCRIPTION
PheWeb-style GWAS summary statistics for the China Kadoorie Biobank.
URL
https://pheweb.ckbiobank.org/
TITLE
China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up.
Main citation
Chen Z, Chen J, Collins R, Guo Y, ...&, China Kadoorie Biobank (CKB) collaborative group. (2011) China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int J Epidemiol, 40 (6) 1652-66. doi:10.1093/ije/dyr120. PMID 22158673
ABSTRACT
BACKGROUND: Large blood-based prospective studies can provide reliable assessment of the complex interplay of lifestyle, environmental and genetic factors as determinants of chronic disease. METHODS: The baseline survey of the China Kadoorie Biobank took place during 2004-08 in 10 geographically defined regions, with collection of questionnaire data, physical measurements and blood samples. Subsequently, a re-survey of 25,000 randomly selected participants was done (80% responded) using the same methods as in the baseline. All participants are being followed for cause-specific mortality and morbidity, and for any hospital admission through linkages with registries and health insurance (HI) databases. RESULTS: Overall, 512,891 adults aged 30-79 years were recruited, including 41% men, 56% from rural areas and mean age was 52 years. The prevalence of ever-regular smoking was 74% in men and 3% in women. The mean blood pressure was 132/79 mmHg in men and 130/77 mmHg in women. The mean body mass index (BMI) was 23.4 kg/m² in men and 23.8 kg/m² in women, with only 4% being obese (>30 kg/m²), and 3.2% being diabetic. Blood collection was successful in 99.98% and the mean delay from sample collection to processing was 10.6 h. For each of the main baseline variables, there is good reproducibility but large heterogeneity by age, sex and study area. By 1 January 2011, over 10,000 deaths had been recorded, with 91% of surviving participants already linked to HI databases. CONCLUSION: This established large biobank will be a rich and powerful resource for investigating genetic and non-genetic causes of many common chronic diseases in the Chinese population.
DOI
10.1093/ije/dyr120
RELATED_BIOBANK
China Kadoorie Biobank
MAIN ANCESTRY
EAS

CIMA

Summary statistics
PUBMED_LINK
41505528
DESCRIPTION
Chinese Immune Multi-Omics Atlas
URL
https://db.cngb.org/trueblood/cima
TITLE
Chinese Immune Multi-Omics Atlas.
Main citation
Yin J, Zheng Y, Huang Z, Zhou W, ...&, Liu C. (2026) Chinese Immune Multi-Omics Atlas. Science, 391 (6781) eadt3130. doi:10.1126/science.adt3130. PMID 41505528
ABSTRACT
Human peripheral blood exhibits molecular and cellular heterogeneity across populations, yet the underlying mechanisms remain unclear. We present the Chinese Immune Multi-Omics Atlas (CIMA), characterizing molecular variations linked to sex, age, and genetic variants through multi-omics analysis of more than 10 million circulating immune cells from 428 Chinese adults. CIMA established an enhancer-driven gene regulatory network comprising 237 robust regulons; identified 9600 eGenes and 52,361 caPeaks at cell type resolution; and revealed pleiotropic associations among immune-related disease risk loci, cis-expression quantitative trait loci (QTLs), and chromatin accessibility QTLs. Furthermore, the cell language model CIMA-CLM predicted chromatin accessibility and evaluated the effects of noncoding variants from chromatin sequences and gene expression. CIMA provides a comprehensive reference for immune-related disease research.
DOI
10.1126/science.adt3130

DeepFlow/Gomes B-38082205

Summary statistics
PUBMED_LINK
38082205
TITLE
Genetic architecture of cardiac dynamic flow volumes.
Main citation
Gomes B, Singh A, O'Sullivan JW, Schnurr TM, ...&, Ashley EA. (2024) Genetic architecture of cardiac dynamic flow volumes. Nat Genet, 56 (2) 245-257. doi:10.1038/s41588-023-01587-5. PMID 38082205
ABSTRACT
Cardiac blood flow is a critical determinant of human health. However, the definition of its genetic architecture is limited by the technical challenge of capturing dynamic flow volumes from cardiac imaging at scale. We present DeepFlow, a deep-learning system to extract cardiac flow and volumes from phase-contrast cardiac magnetic resonance imaging. A mixed-linear model applied to 37,653 individuals from the UK Biobank reveals genome-wide significant associations across cardiac dynamic flow volumes spanning from aortic forward velocity to aortic regurgitation fraction. Mendelian randomization reveals a causal role for aortic root size in aortic valve regurgitation. Among the most significant contributing variants, localizing genes (near ELN, PRDM6 and ADAMTS7) are implicated in connective tissue and blood pressure pathways. Here we show that DeepFlow cardiac flow phenotyping at scale, combined with genotyping data, reinforces the contribution of connective tissue genes, blood pressure and root size to aortic valve function.
DOI
10.1038/s41588-023-01587-5
MAIN ANCESTRY
EUR

Deming Y, et al-28247064

Summary statistics
PUBMED_LINK
28247064
TITLE
Genome-wide association study identifies four novel loci associated with Alzheimer's endophenotypes and disease modifiers.
Main citation
Deming Y, Li Z, Kapoor M, Harari O, ...&, Cruchaga C. (2017) Genome-wide association study identifies four novel loci associated with Alzheimer's endophenotypes and disease modifiers. Acta Neuropathol, 133 (5) 839-856. doi:10.1007/s00401-017-1685-y. PMID 28247064
ABSTRACT
More than 20 genetic loci have been associated with risk for Alzheimer's disease (AD), but reported genome-wide significant loci do not account for all the estimated heritability and provide little information about underlying biological mechanisms. Genetic studies using intermediate quantitative traits such as biomarkers, or endophenotypes, benefit from increased statistical power to identify variants that may not pass the stringent multiple test correction in case-control studies. Endophenotypes also contain additional information helpful for identifying variants and genes associated with other aspects of disease, such as rate of progression or onset, and provide context to interpret the results from genome-wide association studies (GWAS). We conducted GWAS of amyloid beta (Aβ42), tau, and phosphorylated tau (ptau181) levels in cerebrospinal fluid (CSF) from 3146 participants across nine studies to identify novel variants associated with AD. Five genome-wide significant loci (two novel) were associated with ptau181, including loci that have also been associated with AD risk or brain-related phenotypes. Two novel loci associated with Aβ42 near GLIS1 on 1p32.3 (β = -0.059, P = 2.08 × 10⁻⁸) and within SERPINB1 on 6p25 (β = -0.025, P = 1.72 × 10⁻⁸) were also associated with AD risk (GLIS1: OR = 1.105, P = 3.43 × 10⁻²), disease progression (GLIS1: β = 0.277, P = 1.92 × 10⁻²), and age at onset (SERPINB1: β = 0.043, P = 4.62 × 10⁻³). Bioinformatics indicate that the intronic SERPINB1 variant (rs316341) affects expression of SERPINB1 in various tissues, including the hippocampus, suggesting that SERPINB1 influences AD through an Aβ-associated mechanism. Analyses of known AD risk loci suggest CLU and FERMT2 may influence CSF Aβ42 (P = 0.001 and P = 0.009, respectively) and the INPP5D locus may affect ptau181 levels (P = 0.009); larger studies are necessary to verify these results. Together the findings from this study can be used to inform future AD studies.
DOI
10.1007/s00401-017-1685-y

Deng YT, et al-39579765

Summary statistics
PUBMED_LINK
39579765
URL
https://proteome-phenome-atlas.com/
TITLE
Atlas of the plasma proteome in health and disease in 53,026 adults.
Main citation
Deng YT, You J, He Y, Zhang Y, ...&, Yu JT. (2025) Atlas of the plasma proteome in health and disease in 53,026 adults. Cell, 188 (1) 253-271.e7. doi:10.1016/j.cell.2024.10.045. PMID 39579765
ABSTRACT
Large-scale proteomics studies can refine our understanding of health and disease and enable precision medicine. Here, we provide a detailed atlas of 2,920 plasma proteins linking to diseases (406 prevalent and 660 incident) and 986 health-related traits in 53,026 individuals (median follow-up: 14.8 years) from the UK Biobank, representing the most comprehensive proteome profiles to date. This atlas revealed 168,100 protein-disease associations and 554,488 protein-trait associations. Over 650 proteins were shared among at least 50 diseases, and over 1,000 showed sex and age heterogeneity. Furthermore, proteins demonstrated promising potential in disease discrimination (area under the curve [AUC] > 0.80 in 183 diseases). Finally, integrating protein quantitative trait locus data determined 474 causal proteins, providing 37 drug-repurposing opportunities and 26 promising targets with favorable safety profiles. These results provide an open-access comprehensive proteome-phenome resource (https://proteome-phenome-atlas.com/) to help elucidate the biological mechanisms of diseases and accelerate the development of disease biomarkers, prediction models, and therapeutic targets.
DOI
10.1016/j.cell.2024.10.045
MAIN ANCESTRY
EUR

Dhindsa RS, et al-37794183

Summary statistics
PUBMED_LINK
37794183
TITLE
Rare variant associations with plasma protein levels in the UK Biobank.
Main citation
Dhindsa RS, Burren OS, Sun BB, Prins BP, ...&, Petrovski S. (2023) Rare variant associations with plasma protein levels in the UK Biobank. Nature, 622 (7982) 339-347. doi:10.1038/s41586-023-06547-x. PMID 37794183
ABSTRACT
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433 rare genotype-protein associations, of which 81% were undetected in a previous genome-wide association study of the same cohort. We then looked at aggregate signals using gene-level collapsing analysis, which revealed 1,962 gene-protein associations. Of the 691 gene-level signals from protein-truncating variants, 99.4% were associated with decreased protein levels. STAB1 and STAB2, encoding scavenger receptors involved in plasma protein clearance, emerged as pleiotropic loci, with 77 and 41 protein associations, respectively. We demonstrate the utility of our publicly accessible resource through several applications. These include detailing an allelic series in NLRC4, identifying potential biomarkers for a fatty liver disease-associated variant in HSD17B13 and bolstering phenome-wide association studies by integrating protein quantitative trait loci with protein-truncating variants in collapsing analyses. Finally, we uncover distinct proteomic consequences of clonal haematopoiesis (CH), including an association between TET2-CH and increased FLT3 levels. Our results highlight a considerable role for rare variation in plasma protein abundance and the value of proteogenomics in therapeutic discovery.
DOI
10.1038/s41586-023-06547-x
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

DIAGRAM

Summary statistics
PUBMED_LINK
22885922
DESCRIPTION
Type 2 diabetes GWAS meta-analysis summary statistics from the DIAGRAM consortium.
URL
https://www.diagram-consortium.org/downloads.html
TITLE
Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes.
Main citation
Morris AP, Voight BF, Teslovich TM, Ferreira T, ...&, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium. (2012) Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet, 44 (9) 981-90. doi:10.1038/ng.2383. PMID 22885922
ABSTRACT
To extend understanding of the genetic architecture and molecular basis of type 2 diabetes (T2D), we conducted a meta-analysis of genetic variants on the Metabochip, including 34,840 cases and 114,981 controls, overwhelmingly of European descent. We identified ten previously unreported T2D susceptibility loci, including two showing sex-differentiated association. Genome-wide analyses of these data are consistent with a long tail of additional common variant loci explaining much of the variation in susceptibility to T2D. Exploration of the enlarged set of susceptibility loci implicates several processes, including CREBBP-related transcription, adipocytokine signaling and cell cycle regulation, in diabetes pathogenesis.
DOI
10.1038/ng.2383
MAIN ANCESTRY
EUR

eGTEx

Summary statistics
PUBMED_LINK
36510025
DESCRIPTION
Enhancing GTEx
URL
https://gtexportal.org/home/downloads/egtex/methylation
TITLE
DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits.
Main citation
Oliva M, Demanelis K, Lu Y, Chernoff M, ...&, Pierce BL. (2023) DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat Genet, 55 (1) 112-122. doi:10.1038/s41588-022-01248-z. PMID 36510025
ABSTRACT
Studies of DNA methylation (DNAm) in solid human tissues are relatively scarce; tissue-specific characterization of DNAm is needed to understand its role in gene regulation and its relevance to complex traits. We generated array-based DNAm profiles for 987 human samples from the Genotype-Tissue Expression (GTEx) project, representing 9 tissue types and 424 subjects. We characterized methylome and transcriptome correlations (eQTMs), genetic regulation in cis (mQTLs and eQTLs) across tissues and e/mQTLs links to complex traits. We identified mQTLs for 286,152 CpG sites, many of which (>5%) show tissue specificity, and mQTL colocalizations with 2,254 distinct GWAS hits across 83 traits. For 91% of these loci, a candidate gene link was identified by integration of functional maps, including eQTMs, and/or eQTL colocalization, but only 33% of loci involved an eQTL and mQTL present in the same tissue type. With this DNAm-focused integrative analysis, we contribute to the understanding of molecular regulatory mechanisms in human tissues and their impact on complex traits.
DOI
10.1038/s41588-022-01248-z

Eldjarn GH, et al-37794188

Summary statistics
PUBMED_LINK
37794188
TITLE
Large-scale plasma proteomics comparisons through genetics and disease associations.
Main citation
Eldjarn GH, Ferkingstad E, Lund SH, Helgason H, ...&, Stefansson K. (2023) Large-scale plasma proteomics comparisons through genetics and disease associations. Nature, 622 (7982) 348-358. doi:10.1038/s41586-023-06563-x. PMID 37794188
ABSTRACT
High-throughput proteomics platforms measuring thousands of proteins in plasma combined with genomic and phenotypic information have the power to bridge the gap between the genome and diseases. Here we performed association studies of Olink Explore 3072 data generated by the UK Biobank Pharma Proteomics Project on plasma samples from more than 50,000 UK Biobank participants with phenotypic and genotypic data, stratifying on British or Irish, African and South Asian ancestries. We compared the results with those of a SomaScan v4 study on plasma from 36,000 Icelandic people, for 1,514 of whom Olink data were also available. We found modest correlation between the two platforms. Although cis protein quantitative trait loci were detected for a similar absolute number of assays on the two platforms (2,101 on Olink versus 2,120 on SomaScan), the proportion of assays with such supporting evidence for assay performance was higher on the Olink platform (72% versus 43%). A considerable number of proteins had genomic associations that differed between the platforms. We provide examples where differences between platforms may influence conclusions drawn from the integration of protein levels with the study of diseases. We demonstrate how leveraging the diverse ancestries of participants in the UK Biobank helps to detect novel associations and refine genomic location. Our results show the value of the information provided by the two most commonly used high-throughput proteomics platforms and demonstrate the differences between them that at times provide useful complementarity.
DOI
10.1038/s41586-023-06563-x
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

Elliott LT-30305740

Summary statistics
PUBMED_LINK
30305740
TITLE
Genome-wide association studies of brain imaging phenotypes in UK Biobank.
Main citation
Elliott LT, Sharp K, Alfaro-Almagro F, Shi S, ...&, Smith SM. (2018) Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature, 562 (7726) 210-216. doi:10.1038/s41586-018-0571-7. PMID 30305740
ABSTRACT
The genetic architecture of brain structure and function is largely unknown. To investigate this, we carried out genome-wide association studies of 3,144 functional and structural brain imaging phenotypes from UK Biobank (discovery dataset 8,428 subjects). Here we show that many of these phenotypes are heritable. We identify 148 clusters of associations between single nucleotide polymorphisms and imaging phenotypes that replicate at P < 0.05, when we would expect 21 to replicate by chance. Notable significant, interpretable associations include: iron transport and storage genes, related to magnetic susceptibility of subcortical brain tissue; extracellular matrix and epidermal growth factor genes, associated with white matter micro-structure and lesions; genes that regulate mid-line axon development, associated with organization of the pontine crossing tract; and overall 17 genes involved in development, pathway signalling and plasticity. Our results provide insights into the genetic architecture of the brain that are relevant to neurological and psychiatric disorders, brain development and ageing.
DOI
10.1038/s41586-018-0571-7
MAIN ANCESTRY
EUR

Emilsson V, et al-30072576

Summary statistics
PUBMED_LINK
30072576
TITLE
Co-regulatory networks of human serum proteins link genetics to disease.
Main citation
Emilsson V, Ilkov M, Lamb JR, Finkel N, ...&, Gudnason V. (2018) Co-regulatory networks of human serum proteins link genetics to disease. Science, 361 (6404) 769-773. doi:10.1126/science.aaq1327. PMID 30072576
ABSTRACT
Proteins circulating in the blood are critical for age-related disease processes; however, the serum proteome has remained largely unexplored. To this end, 4137 proteins covering most predicted extracellular proteins were measured in the serum of 5457 Icelanders over 65 years of age. Pairwise correlation between proteins as they varied across individuals revealed 27 different network modules of serum proteins, many of which were associated with cardiovascular and metabolic disease states, as well as overall survival. The protein modules were controlled by cis- and trans-acting genetic variants, which in many cases were also associated with complex disease. This revealed co-regulated groups of circulating proteins that incorporated regulatory control between tissues and demonstrated close relationships to past, current, and future disease states.
DOI
10.1126/science.aaq1327

Enroth S, et al-25147954

Summary statistics
PUBMED_LINK
25147954
TITLE
Strong effects of genetic and lifestyle factors on biomarker variation and use of personalized cutoffs.
Main citation
Enroth S, Johansson A, Enroth SB, Gyllensten U. (2014) Strong effects of genetic and lifestyle factors on biomarker variation and use of personalized cutoffs. Nat Commun, 5, 4684. doi:10.1038/ncomms5684. PMID 25147954
ABSTRACT
Ideal biomarkers used for disease diagnosis should display deviating levels in affected individuals only and be robust to factors unrelated to the disease. Here we show the impact of genetic, clinical and lifestyle factors on circulating levels of 92 protein biomarkers for cancer and inflammation, using a population-based cohort of 1,005 individuals. For 75% of the biomarkers, the levels are significantly heritable and genome-wide association studies identify 16 novel loci and replicate 2 previously known loci with strong effects on one or several of the biomarkers with P-values down to 4.4 × 10⁻⁵⁸. Integrative analysis attributes as much as 56.3% of the observed variance to non-disease factors. We propose that information on the biomarker-specific profile of major genetic, clinical and lifestyle factors should be used to establish personalized clinical cutoffs, and that this would increase the sensitivity of using biomarkers for prediction of clinical end points.
DOI
10.1038/ncomms5684

eQTLGen Phase I

Summary statistics
PUBMED_LINK
34475573
URL
https://www.eqtlgen.org/
TITLE
Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression.
Main citation
Võsa U, Claringbould A, Westra HJ, Bonder MJ, ...&, Franke L. (2021) Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet, 53 (9) 1300-1310. doi:10.1038/s41588-021-00913-z. PMID 34475573
ABSTRACT
Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.
DOI
10.1038/s41588-021-00913-z

eQTLGen Phase II

Summary statistics
DESCRIPTION
Expanded blood eQTL meta-analysis and genome-wide summary statistics across cohorts; consortium coordination, cookbook, and downloads via the Phase II portal.
URL
https://www.eqtlgen.org/
TITLE
eQTLGen Phase II (blood eQTL consortium resource).
Main citation
eQTLGen Consortium. eQTLGen Phase II (blood eQTL consortium resource).

Ferkingstad E, et al-34857953

Summary statistics
PUBMED_LINK
34857953
TITLE
Large-scale integration of the plasma proteome with genetics and disease.
Main citation
Ferkingstad E, Sulem P, Atlason BA, Sveinbjornsson G, ...&, Stefansson K. (2021) Large-scale integration of the plasma proteome with genetics and disease. Nat Genet, 53 (12) 1712-1721. doi:10.1038/s41588-021-00978-w. PMID 34857953
ABSTRACT
The plasma proteome can help bridge the gap between the genome and diseases. Here we describe genome-wide association studies (GWASs) of plasma protein levels measured with 4,907 aptamers in 35,559 Icelanders. We found 18,084 associations between sequence variants and levels of proteins in plasma (protein quantitative trait loci; pQTL), of which 19% were with rare variants (minor allele frequency (MAF) < 1%). We tested plasma protein levels for association with 373 diseases and other traits and identified 257,490 associations. We integrated pQTL and genetic associations with diseases and other traits and found that 12% of 45,334 lead associations in the GWAS Catalog are with variants in high linkage disequilibrium with pQTL. We identified 938 genes encoding potential drug targets with variants that influence levels of possible biomarkers. Combining proteomics, genomics and transcriptomics, we provide a valuable resource that can be used to improve understanding of disease pathogenesis and to assist with drug discovery and development.
DOI
10.1038/s41588-021-00978-w

FinnGen R10 (December 18 2023)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R10 (18 Dec 2023) GWAS summary statistics; flagship FinnGen resource described in Kurki et al., Nature 2023.
URL
https://r10.finngen.fi/
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

FinnGen R10-UKBB meta-analysis

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
Meta-analysis of FinnGen R10 with UK Biobank GWAS summary statistics (FinnGen distribution).
URL
https://public-metaresults-fg-ukbb.finngen.fi
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
UK Biobank ,FinnGen
MAIN ANCESTRY
EUR

FinnGen R11 (June 24 2024)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R11 (24 Jun 2024) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.
URL
https://r11.finngen.fi/
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

FinnGen R12 (November 4 2024)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R12 (4 Nov 2024) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.
URL
https://r12.finngen.fi/
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

FinnGen R12-UKBB meta-analysis

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
Meta-analysis of FinnGen R12 with UK Biobank GWAS summary statistics (FinnGen distribution).
URL
https://metaresults-ukbb.finngen.fi/
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
UK Biobank, FinnGen
MAIN ANCESTRY
EUR

FinnGen R4 (November 30 2020)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R4 (30 Nov 2020) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.
URL
https://r4.finngen.fi/about
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

FinnGen R5 (May 11 2021)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R5 (11 May 2021) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.
URL
https://r5.finngen.fi/about
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

FinnGen R6 (January 24 2022)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R6 (24 Jan 2022) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.
URL
https://r6.finngen.fi/about
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

FinnGen R7 (June 1 2022)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R7 (1 Jun 2022) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.
URL
https://r7.finngen.fi/about
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

FinnGen R8 (December 1 2022)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R8 (1 Dec 2022) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.
URL
https://r8.finngen.fi/about
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

FinnGen R9 (May 11 2023)

Summary statistics
PUBMED_LINK
36653562
DESCRIPTION
FinnGen data freeze R9 (11 May 2023) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.
URL
https://r9.finngen.fi/about
TITLE
FinnGen provides genetic insights from a well-phenotyped isolated population.
Main citation
Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562
ABSTRACT
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
DOI
10.1038/s41586-022-05473-8
RELATED_BIOBANK
FinnGen
MAIN ANCESTRY
EUR

Folkersen L, et al-28369058

Summary statistics
PUBMED_LINK
28369058
TITLE
Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease.
Main citation
Folkersen L, Fauman E, Sabater-Lleal M, Strawbridge RJ, ...&, Mälarstig A. (2017) Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease. PLoS Genet, 13 (4) e1006706. doi:10.1371/journal.pgen.1006706. PMID 28369058
ABSTRACT
Recent advances in highly multiplexed immunoassays have allowed systematic large-scale measurement of hundreds of plasma proteins in large cohort studies. In combination with genotyping, such studies offer the prospect to 1) identify mechanisms involved with regulation of protein expression in plasma, and 2) determine whether the plasma proteins are likely to be causally implicated in disease. We report here the results of genome-wide association (GWA) studies of 83 proteins considered relevant to cardiovascular disease (CVD), measured in 3,394 individuals with multiple CVD risk factors. We identified 79 genome-wide significant (p<5e-8) association signals, 55 of which replicated at P<0.0007 in separate validation studies (n = 2,639 individuals). Using automated text mining, manual curation, and network-based methods incorporating information on expression quantitative trait loci (eQTL), we propose plausible causal mechanisms for 25 trans-acting loci, including a potential post-translational regulation of stem cell factor by matrix metalloproteinase 9 and receptor-ligand pairs such as RANK-RANK ligand. Using public GWA study data, we further evaluate all 79 loci for their causal effect on coronary artery disease, and highlight several potentially causal associations. Overall, a majority of the plasma proteins studied showed evidence of regulation at the genetic level. Our results enable future studies of the causal architecture of human disease, which in turn should aid discovery of new drug targets.
DOI
10.1371/journal.pgen.1006706

Folkersen L, et al-33067605

Summary statistics
PUBMED_LINK
33067605
TITLE
Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals.
Main citation
Folkersen L, Gustafsson S, Wang Q, Hansen DH, ...&, Mälarstig A. (2020) Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat Metab, 2 (10) 1135-1148. doi:10.1038/s42255-020-00287-2. PMID 33067605
ABSTRACT
Circulating proteins are vital in human health and disease and are frequently used as biomarkers for clinical decision-making or as targets for pharmacological intervention. Here, we map and replicate protein quantitative trait loci (pQTL) for 90 cardiovascular proteins in over 30,000 individuals, resulting in 451 pQTLs for 85 proteins. For each protein, we further perform pathway mapping to obtain trans-pQTL gene and regulatory designations. We substantiate these regulatory findings with orthogonal evidence for trans-pQTLs using mouse knockdown experiments (ABCA1 and TRIB1) and clinical trial results (chemokine receptors CCR2 and CCR5), with consistent regulation. Finally, we evaluate known drug targets, and suggest new target candidates or repositioning opportunities using Mendelian randomization. This identifies 11 proteins with causal evidence of involvement in human disease that have not previously been targeted, including EGF, IL-16, PAPPA, SPON1, F3, ADM, CASP-8, CHI3L1, CXCL16, GDF15 and MMP-12. Taken together, these findings demonstrate the utility of large-scale mapping of the genetics of the proteome and provide a resource for future precision studies of circulating proteins in human health.
DOI
10.1038/s42255-020-00287-2

Fu

Summary statistics
PUBMED_LINK
41386230
TITLE
Single-cell eQTL mapping reveals cell-type-specific genetic regulation in lung cancer.
Main citation
Fu Y, Wang Y, Jin C, Zhang C, ...&, Ma H. (2026) Single-cell eQTL mapping reveals cell-type-specific genetic regulation in lung cancer. Cell Genom, 6 (3) 101100. doi:10.1016/j.xgen.2025.101100. PMID 41386230
ABSTRACT
Genome-wide association studies (GWASs) have identified over 50 lung cancer risk loci; however, the precise cellular context of these genetic mechanisms remains unclear due to limitations in bulk tissue expression quantitative trait locus (eQTL) analyses. Here, we present the largest single-cell eQTL (sc-eQTL) atlas of human lung tissue to date, profiling 222 donors using multiplexed single-cell RNA sequencing (scRNA-seq). We identified 4,341 independent eQTLs across 17 cell types, with over 60% of sc-eQTLs and 51% of eGenes being cell-type specific, and fewer than 52% were detectable in paired bulk datasets. Integration with GWASs for non-small cell lung cancer highlighted epithelial and immune cells as key contributors to genetic susceptibility, identifying 28 candidate genes within known risk loci and 24 in novel regions. Notably, 47% of established non-small cell lung cancer (NSCLC) susceptibility loci exhibited cell-type-specific pleiotropic genetic regulation. This study provides a valuable resource of lung sc-eQTLs and illuminates how genetic variation modulates gene expression in a cell-type-specific fashion, contributing to lung cancer susceptibility.
DOI
10.1016/j.xgen.2025.101100

Fu J-38811844

Summary statistics
PUBMED_LINK
38811844
TITLE
Cross-ancestry genome-wide association studies of brain imaging phenotypes.
Main citation
Fu J, Zhang Q, Wang J, Wang M, ...&, CHIMGEN Consortium. (2024) Cross-ancestry genome-wide association studies of brain imaging phenotypes. Nat Genet, 56 (6) 1110-1120. doi:10.1038/s41588-024-01766-y. PMID 38811844
ABSTRACT
Genome-wide association studies of brain imaging phenotypes are mainly performed in European populations, but other populations are severely under-represented. Here, we conducted Chinese-alone and cross-ancestry genome-wide association studies of 3,414 brain imaging phenotypes in 7,058 Chinese Han and 33,224 white British participants. We identified 38 new associations in Chinese-alone analyses and 486 additional new associations in cross-ancestry meta-analyses at P < 1.46 × 10⁻¹¹ for discovery and P < 0.05 for replication. We pooled significant autosomal associations identified by single- or cross-ancestry analyses into 6,443 independent associations, which showed uneven distribution in the genome and the phenotype subgroups. We further divided them into 44 associations with different effect sizes and 3,557 associations with similar effect sizes between ancestries. Loci of these associations were shared with 15 brain-related non-imaging traits including cognition and neuropsychiatric disorders. Our results provide a valuable catalog of genetic associations for brain imaging phenotypes in more diverse populations.
DOI
10.1038/s41588-024-01766-y
MAIN ANCESTRY
EAS, EUR

GENOA

Summary statistics
PUBMED_LINK
37169753
URL
http://mqtldb.godmc.org.uk/
TITLE
meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans.
Main citation
Shang L, Zhao W, Wang YZ, Li Z, ...&, Zhou X. (2023) meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans. Nat Commun, 14 (1) 2711. doi:10.1038/s41467-023-37961-4. PMID 37169753
ABSTRACT
Identifying genetic variants that are associated with variation in DNA methylation, an analysis commonly referred to as methylation quantitative trait locus (meQTL) mapping, is an important first step towards understanding the genetic architecture underlying epigenetic variation. Most existing meQTL mapping studies have focused on individuals of European ancestry and are underrepresented in other populations, with a particular absence of large studies in populations with African ancestry. We fill this critical knowledge gap by performing a large-scale cis-meQTL mapping study in 961 African Americans from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. We identify a total of 4,565,687 cis-acting meQTLs in 320,965 meCpGs. We find that 45% of meCpGs harbor multiple independent meQTLs, suggesting potential polygenic genetic architecture underlying methylation variation. A large percentage of the cis-meQTLs also colocalize with cis-expression QTLs (eQTLs) in the same population. Importantly, the identified cis-meQTLs explain a substantial proportion (median = 24.6%) of methylation variation. In addition, the cis-meQTL associated CpG sites mediate a substantial proportion (median = 24.9%) of SNP effects underlying gene expression. Overall, our results represent an important step toward revealing the co-regulation of methylation and gene expression, facilitating the functional interpretation of epigenetic and gene regulation underlying common diseases in African Americans.
DOI
10.1038/s41467-023-37961-4

GIANT (Genetic Investigation of ANthropometric Traits)

Summary statistics
PUBMED_LINK
20881960
DESCRIPTION
Anthropometric trait GWAS meta-analysis summary statistics from the GIANT consortium.
URL
https://portals.broadinstitute.org/collaboration/giant/index.php/Main_Page
TITLE
Hundreds of variants clustered in genomic loci and biological pathways affect human height.
Main citation
Lango Allen H, Estrada K, Lettre G, Berndt SI, ...&, Hirschhorn JN. (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature, 467 (7317) 832-8. doi:10.1038/nature09410. PMID 20881960
ABSTRACT
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
DOI
10.1038/nature09410
MAIN ANCESTRY
Multi-ancestry

Gilly A, et al-37778719

Summary statistics
PUBMED_LINK
37778719
TITLE
Genome-wide meta-analysis of 92 cardiometabolic protein serum levels.
Main citation
Gilly A, Park YC, Tsafantakis E, Karaleftheri M, ...&, Zeggini E. (2023) Genome-wide meta-analysis of 92 cardiometabolic protein serum levels. Mol Metab, 78, 101810. doi:10.1016/j.molmet.2023.101810. PMID 37778719
ABSTRACT
OBJECTIVES: Global cardiometabolic disease prevalence has grown rapidly over the years, making it the leading cause of death worldwide. Proteins are crucial components in biological pathways dysregulated in disease states. Identifying genetic components that influence circulating protein levels may lead to the discovery of biomarkers for early stages of disease or offer opportunities as therapeutic targets. METHODS: Here, we carry out a genome-wide association study (GWAS) utilising whole genome sequencing data in 3,005 individuals from the HELIC founder populations cohort, across 92 proteins of cardiometabolic relevance. RESULTS: We report 322 protein quantitative trait loci (pQTL) signals across 92 proteins, of which 76 are located in or near the coding gene (cis-pQTL). We link those association signals with changes in protein expression and cardiometabolic disease risk using colocalisation and Mendelian randomisation (MR) analyses. CONCLUSIONS: The majority of previously unknown signals we describe point to proteins or protein interactions involved in inflammation and immune response, providing genetic evidence for the contributing role of inflammation in cardiometabolic disease processes.
DOI
10.1016/j.molmet.2023.101810
MAIN ANCESTRY
EUR

GLGC (Global Lipids Genetics Consortium)

Summary statistics
PUBMED_LINK
24097068
DESCRIPTION
Blood lipid trait GWAS meta-analysis summary statistics from the GLGC.
URL
http://csg.sph.umich.edu/willer/public/glgc-lipids2021/
TITLE
Discovery and refinement of loci associated with lipid levels.
Main citation
Willer CJ, Schmidt EM, Sengupta S, Peloso GM, ...&, Global Lipids Genetics Consortium. (2013) Discovery and refinement of loci associated with lipid levels. Nat Genet, 45 (11) 1274-1283. doi:10.1038/ng.2797. PMID 24097068
ABSTRACT
Levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides and total cholesterol are heritable, modifiable risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,577 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5 × 10^-8, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index. Our results demonstrate the value of using genetic data from individuals of diverse ancestry and provide insights into the biological mechanisms regulating blood lipids to guide future genetic, biological and therapeutic research.
DOI
10.1038/ng.2797
MAIN ANCESTRY
Multi-ancestry

Global Biobank

Summary statistics
PUBMED_LINK
36777996
DESCRIPTION
Global Biobank Meta-analysis Initiative (GBMI) harmonized GWAS across many biobanks.
URL
http://results.globalbiobankmeta.org/
TITLE
Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease.
Main citation
Zhou W, Kanai M, Wu KH, Rasheed H, ...&, Neale BM. (2022) Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease. Cell Genom, 2 (10) 100192. doi:10.1016/j.xgen.2022.100192. PMID 36777996
ABSTRACT
Biobanks facilitate genome-wide association studies (GWASs), which have mapped genomic loci across a range of human diseases and traits. However, most biobanks are primarily composed of individuals of European ancestry. We introduce the Global Biobank Meta-analysis Initiative (GBMI), a collaborative network of 23 biobanks from 4 continents representing more than 2.2 million consented individuals with genetic data linked to electronic health records. GBMI meta-analyzes summary statistics from GWASs generated using harmonized genotypes and phenotypes from member biobanks for 14 exemplar diseases and endpoints. This strategy validates that GWASs conducted in diverse biobanks can be integrated despite heterogeneity in case definitions, recruitment strategies, and baseline characteristics. This collaborative effort improves GWAS power for diseases, benefits understudied diseases, and improves risk prediction while also enabling the nomination of disease genes and drug candidates by incorporating gene and protein expression data and providing insight into the underlying biology of human diseases and traits.
DOI
10.1016/j.xgen.2022.100192
MAIN ANCESTRY
ALL

GTEx

Summary statistics
PUBMED_LINK
32913098
DESCRIPTION
V11: updates the GTEx V10 data to use the GENCODE 47 annotation. It contains no new samples or donors compared to V10.
URL
https://gtexportal.org/home/
TITLE
The GTEx Consortium atlas of genetic regulatory effects across human tissues.
Main citation
GTEx Consortium. (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369 (6509) 1318-1330. doi:10.1126/science.aaz1776. PMID 32913098
ABSTRACT
The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
DOI
10.1126/science.aaz1776

GTEx

Summary statistics
PUBMED_LINK
32913098
DESCRIPTION
V8
TITLE
The GTEx Consortium atlas of genetic regulatory effects across human tissues.
Main citation
GTEx Consortium. (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369 (6509) 1318-1330. doi:10.1126/science.aaz1776. PMID 32913098
ABSTRACT
The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
DOI
10.1126/science.aaz1776

GTEx

Summary statistics
PUBMED_LINK
35922509
DESCRIPTION
V9 long-read RNA-seq data
TITLE
Transcriptome variation in human tissues revealed by long-read sequencing.
Main citation
Glinos DA, Garborcauskas G, Hoffman P, Ehsan N, ...&, Cummings BB. (2022) Transcriptome variation in human tissues revealed by long-read sequencing. Nature, 608 (7922) 353-359. doi:10.1038/s41586-022-05035-y. PMID 35922509
ABSTRACT
Regulation of transcript structure generates transcript diversity and plays an important role in human disease. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.
DOI
10.1038/s41586-022-05035-y

GTEx

Summary statistics
PUBMED_LINK
23715323
DESCRIPTION
Project overview
TITLE
The Genotype-Tissue Expression (GTEx) project.
Main citation
GTEx Consortium. (2013) The Genotype-Tissue Expression (GTEx) project. Nat Genet, 45 (6) 580-5. doi:10.1038/ng.2653. PMID 23715323
ABSTRACT
Genome-wide association studies have identified thousands of loci for common diseases, but, for the majority of these, the mechanisms underlying disease susceptibility remain unknown. Most associated variants are not correlated with protein-coding changes, suggesting that polymorphisms in regulatory regions probably contribute to many disease phenotypes. Here we describe the Genotype-Tissue Expression (GTEx) project, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.
DOI
10.1038/ng.2653

GTEx

Summary statistics
PUBMED_LINK
35549429
DESCRIPTION
V9 snRNA-Seq
TITLE
Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function.
Main citation
Eraslan G, Drokhlyansky E, Anand S, Fiskin E, ...&, Regev A. (2022) Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science, 376 (6594) eabl4290. doi:10.1126/science.abl4290. PMID 35549429
ABSTRACT
Understanding gene function and regulation in homeostasis and disease requires knowledge of the cellular and tissue contexts in which genes are expressed. Here, we applied four single-nucleus RNA sequencing methods to eight diverse, archived, frozen tissue types from 16 donors and 25 samples, generating a cross-tissue atlas of 209,126 nuclei profiles, which we integrated across tissues, donors, and laboratory methods with a conditional variational autoencoder. Using the resulting cross-tissue atlas, we highlight shared and tissue-specific features of tissue-resident cell populations; identify cell types that might contribute to neuromuscular, metabolic, and immune components of monogenic diseases and the biological processes involved in their pathology; and determine cell types and gene modules that might underlie disease mechanisms for complex traits analyzed by genome-wide association studies.
DOI
10.1126/science.abl4290

Gudjonsson A, et al-35078996

Summary statistics
PUBMED_LINK
35078996
TITLE
A genome-wide association study of serum proteins reveals shared loci with common diseases.
Main citation
Gudjonsson A, Gudmundsdottir V, Axelsson GT, Gudmundsson EF, ...&, Gudnason V. (2022) A genome-wide association study of serum proteins reveals shared loci with common diseases. Nat Commun, 13 (1) 480. doi:10.1038/s41467-021-27850-z. PMID 35078996
ABSTRACT
With the growing number of genetic association studies, the genotype-phenotype atlas has become increasingly more complex, yet the functional consequences of most disease associated alleles is not understood. The measurement of protein level variation in solid tissues and biofluids integrated with genetic variants offers a path to deeper functional insights. Here we present a large-scale proteogenomic study in 5,368 individuals, revealing 4,035 independent associations between genetic variants and 2,091 serum proteins, of which 36% are previously unreported. The majority of both cis- and trans-acting genetic signals are unique for a single protein, although our results also highlight numerous highly pleiotropic genetic effects on protein levels and demonstrate that a protein's genetic association profile reflects certain characteristics of the protein, including its location in protein networks, tissue specificity and intolerance to loss of function mutations. Integrating protein measurements with deep phenotyping of the cohort, we observe substantial enrichment of phenotype associations for serum proteins regulated by established GWAS loci, and offer new insights into the interplay between genetics, serum protein levels and complex disease.
DOI
10.1038/s41467-021-27850-z

GWAS catalog

Summary statistics
PUBMED_LINK
36350656
DESCRIPTION
NHGRI–EBI GWAS Catalog — curated SNP–trait associations and deposition hub for full summary statistics.
URL
https://www.ebi.ac.uk/gwas/
TITLE
The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource.
Main citation
Sollis E, Mosaku A, Abid A, Buniello A, ...&, Harris LW. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res, 51 (D1) D977-D985. doi:10.1093/nar/gkac1010. PMID 36350656
ABSTRACT
The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.
DOI
10.1093/nar/gkac1010
MAIN ANCESTRY
Multi-ancestry

Haas ME-34957434

Summary statistics
PUBMED_LINK
34957434
TITLE
Machine learning enables new insights into genetic contributions to liver fat accumulation.
Main citation
Haas ME, Pirruccello JP, Friedman SN, Wang M, ...&, Khera AV. (2021) Machine learning enables new insights into genetic contributions to liver fat accumulation. Cell Genom, 1 (3). doi:10.1016/j.xgen.2021.100066. PMID 34957434
ABSTRACT
Excess liver fat, called hepatic steatosis, is a leading risk factor for end-stage liver disease and cardiometabolic diseases but often remains undiagnosed in clinical practice because of the need for direct imaging assessments. We developed an abdominal MRI-based machine-learning algorithm to accurately estimate liver fat (correlation coefficients, 0.97-0.99) from a truth dataset of 4,511 middle-aged UK Biobank participants, enabling quantification in 32,192 additional individuals. 17% of participants had predicted liver fat levels indicative of steatosis, and liver fat could not have been reliably estimated based on clinical factors such as BMI. A genome-wide association study of common genetic variants and liver fat replicated three known associations and identified five newly associated variants in or near the MTARC1, ADH1B, TRIB1, GPAM, and MAST3 genes (p < 3 × 10^-8). A polygenic score integrating these eight genetic variants was strongly associated with future risk of chronic liver disease (hazard ratio > 1.32 per SD score, p < 9 × 10^-17). Rare inactivating variants in the APOB or MTTP genes were identified in 0.8% of individuals with steatosis and conferred more than 6-fold risk (p < 2 × 10^-5), highlighting a molecular subtype of hepatic steatosis characterized by defective secretion of apolipoprotein B-containing lipoproteins. We demonstrate that our imaging-based machine-learning model accurately estimates liver fat and may be useful in epidemiological and genetic studies of hepatic steatosis.
DOI
10.1016/j.xgen.2021.100066
MAIN ANCESTRY
EUR

Hannon

Summary statistics
PUBMED_LINK
26619357
URL
https://epigenetics.essex.ac.uk/mQTL/
TITLE
Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci.
Main citation
Hannon E, Spiers H, Viana J, Pidsley R, ...&, Mill J. (2016) Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat Neurosci, 19 (1) 48-54. doi:10.1038/nn.4182. PMID 26619357
ABSTRACT
We characterized DNA methylation quantitative trait loci (mQTLs) in a large collection (n = 166) of human fetal brain samples spanning 56-166 d post-conception, identifying >16,000 fetal brain mQTLs. Fetal brain mQTLs were primarily cis-acting, enriched in regulatory chromatin domains and transcription factor binding sites, and showed substantial overlap with genetic variants that were also associated with gene expression in the brain. Using tissue from three distinct regions of the adult brain (prefrontal cortex, striatum and cerebellum), we found that most fetal brain mQTLs were developmentally stable, although a subset was characterized by fetal-specific effects. Fetal brain mQTLs were enriched amongst risk loci identified in a recent large-scale genome-wide association study (GWAS) of schizophrenia, a severe psychiatric disorder with a hypothesized neurodevelopmental component. Finally, we found that mQTLs can be used to refine GWAS loci through the identification of discrete sites of variable fetal brain methylation associated with schizophrenia risk variants.
DOI
10.1038/nn.4182

Hansson O, et al-36504281

Summary statistics
PUBMED_LINK
36504281
TITLE
The genetic regulation of protein expression in cerebrospinal fluid.
Main citation
Hansson O, Kumar A, Janelidze S, Stomrud E, ...&, Mattsson-Carlgren N. (2023) The genetic regulation of protein expression in cerebrospinal fluid. EMBO Mol Med, 15 (1) e16359. doi:10.15252/emmm.202216359. PMID 36504281
ABSTRACT
Studies of the genetic regulation of cerebrospinal fluid (CSF) proteins may reveal pathways for treatment of neurological diseases. 398 proteins in CSF were measured in 1,591 participants from the BioFINDER study. Protein quantitative trait loci (pQTL) were identified as associations between genetic variants and proteins, with 176 pQTLs for 145 CSF proteins (P < 1.25 × 10^-10, 117 cis-pQTLs and 59 trans-pQTLs). Ventricular volume (measured with brain magnetic resonance imaging) was a confounder for several pQTLs. pQTLs for CSF and plasma proteins were overall correlated, but CSF-specific pQTLs were also observed. Mendelian randomization analyses suggested causal roles for several proteins, for example, ApoE, CD33, and GRN in Alzheimer's disease, MMP-10 in preclinical Alzheimer's disease, SIGLEC9 in amyotrophic lateral sclerosis, and CD38, GPNMB, and ADAM15 in Parkinson's disease. CSF levels of GRN, MMP-10, and GPNMB were altered in Alzheimer's disease, preclinical Alzheimer's disease, and Parkinson's disease, respectively. These findings point to pathways to be explored for novel therapies. The novel finding that ventricular volume confounded pQTLs has implications for design of future studies of the genetic regulation of the CSF proteome.
DOI
10.15252/emmm.202216359

Hatton

Summary statistics
PUBMED_LINK
38548728
DESCRIPTION
cis DNAm QTLs in three European (n = 3701) and two East Asian (n = 2099) cohorts
URL
https://yanglab.westlake.edu.cn/software/smr/#mQTLsummarydata
TITLE
Genetic control of DNA methylation is largely shared across European and East Asian populations.
Main citation
Hatton AA, Cheng FF, Lin T, Shen RJ, ...&, McRae AF. (2024) Genetic control of DNA methylation is largely shared across European and East Asian populations. Nat Commun, 15 (1) 2713. doi:10.1038/s41467-024-47005-0. PMID 38548728
ABSTRACT
DNA methylation is an ideal trait to study the extent of the shared genetic control across ancestries, effectively providing hundreds of thousands of model molecular traits with large QTL effect sizes. We investigate cis DNAm QTLs in three European (n = 3701) and two East Asian (n = 2099) cohorts to quantify the similarities and differences in the genetic architecture across populations. We observe 80,394 associated mQTLs (62.2% of DNAm probes with significant mQTL) to be significant in both ancestries, while 28,925 mQTLs (22.4%) are identified in only a single ancestry. mQTL effect sizes are highly conserved across populations, with differences in mQTL discovery likely due to differences in allele frequency of associated variants and differing linkage disequilibrium between causal variants and assayed SNPs. This study highlights the overall similarity of genetic control across ancestries and the value of ancestral diversity in increasing the power to detect associations and enhancing fine mapping resolution.
DOI
10.1038/s41467-024-47005-0

Hillary RF, et al-31320639

Summary statistics
PUBMED_LINK
31320639
TITLE
Genome and epigenome wide studies of neurological protein biomarkers in the Lothian Birth Cohort 1936.
Main citation
Hillary RF, McCartney DL, Harris SE, Stevenson AJ, ...&, Marioni RE. (2019) Genome and epigenome wide studies of neurological protein biomarkers in the Lothian Birth Cohort 1936. Nat Commun, 10 (1) 3160. doi:10.1038/s41467-019-11177-x. PMID 31320639
ABSTRACT
Although plasma proteins may serve as markers of neurological disease risk, the molecular mechanisms responsible for inter-individual variation in plasma protein levels are poorly understood. Therefore, we conduct genome- and epigenome-wide association studies on the levels of 92 neurological proteins to identify genetic and epigenetic loci associated with their plasma concentrations (n = 750 healthy older adults). We identify 41 independent genome-wide significant (P < 5.4 × 10^-10) loci for 33 proteins and 26 epigenome-wide significant (P < 3.9 × 10^-10) sites associated with the levels of 9 proteins. Using this information, we identify biological pathways in which putative neurological biomarkers are implicated (neurological, immunological and extracellular matrix metabolic pathways). We also observe causal relationships (by Mendelian randomisation analysis) between changes in gene expression (DRAXIN, MDGA1 and KYNU), or DNA methylation profiles (MATN3, MDGA1 and NEP), and altered plasma protein levels. Together, this may help inform causal relationships between biomarkers and neurological diseases.
DOI
10.1038/s41467-019-11177-x

Hillary RF, et al-32641083

Summary statistics
PUBMED_LINK
32641083
TITLE
Multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults.
Main citation
Hillary RF, Trejo-Banos D, Kousathanas A, McCartney DL, ...&, Marioni RE. (2020) Multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults. Genome Med, 12 (1) 60. doi:10.1186/s13073-020-00754-1. PMID 32641083
ABSTRACT
BACKGROUND: The molecular factors which control circulating levels of inflammatory proteins are not well understood. Furthermore, association studies between molecular probes and human traits are often performed by linear model-based methods which may fail to account for complex structure and interrelationships within molecular datasets. METHODS: In this study, we perform genome- and epigenome-wide association studies (GWAS/EWAS) on the levels of 70 plasma-derived inflammatory protein biomarkers in healthy older adults (Lothian Birth Cohort 1936; n = 876; Olink® inflammation panel). We employ a Bayesian framework (BayesR+) which can account for issues pertaining to data structure and unknown confounding variables (with sensitivity analyses using ordinary least squares- (OLS) and mixed model-based approaches). RESULTS: We identified 13 SNPs associated with 13 proteins (n = 1 SNP each) concordant across OLS and Bayesian methods. We identified 3 CpG sites spread across 3 proteins (n = 1 CpG each) that were concordant across OLS, mixed-model and Bayesian analyses. Tagged genetic variants accounted for up to 45% of variance in protein levels (for MCP2, 36% of variance alone attributable to 1 polymorphism). Methylation data accounted for up to 46% of variation in protein levels (for CXCL10). Up to 66% of variation in protein levels (for VEGFA) was explained using genetic and epigenetic data combined. We demonstrated putative causal relationships between CD6 and IL18R1 with inflammatory bowel disease and between IL12B and Crohn's disease. CONCLUSIONS: Our data may aid understanding of the molecular regulation of the circulating inflammatory proteome as well as causal relationships between inflammatory mediators and disease.
DOI
10.1186/s13073-020-00754-1

Huang YJ-38762475

Summary statistics
PUBMED_LINK
38762475
DESCRIPTION
ABD, carotid artery ultrasonography (CAU), BMD, ECG, and thyroid ultrasonography (TU): 28 ABD features, 29 CAU features, 85 BMD features, and 10 ECG features
TITLE
AI-enhanced integration of genetic and medical imaging data for risk assessment of Type 2 diabetes.
Main citation
Huang YJ, Chen CH, Yang HC. (2024) AI-enhanced integration of genetic and medical imaging data for risk assessment of Type 2 diabetes. Nat Commun, 15 (1) 4230. doi:10.1038/s41467-024-48618-1. PMID 38762475
ABSTRACT
Type 2 diabetes (T2D) presents a formidable global health challenge, highlighted by its escalating prevalence, underscoring the critical need for precision health strategies and early detection initiatives. Leveraging artificial intelligence, particularly eXtreme Gradient Boosting (XGBoost), we devise robust risk assessment models for T2D. Drawing upon comprehensive genetic and medical imaging datasets from 68,911 individuals in the Taiwan Biobank, our models integrate Polygenic Risk Scores (PRS), Multi-image Risk Scores (MRS), and demographic variables, such as age, sex, and T2D family history. Here, we show that our model achieves an Area Under the Receiver Operating Curve (AUC) of 0.94, effectively identifying high-risk T2D subgroups. A streamlined model featuring eight key variables also maintains a high AUC of 0.939. This high accuracy for T2D risk assessment promises to catalyze early detection and preventive strategies. Moreover, we introduce an accessible online risk assessment tool for T2D, facilitating broader applicability and dissemination of our findings.
DOI
10.1038/s41467-024-48618-1
MAIN ANCESTRY
EAS

Ikram M.-28627999

Summary statistics
PUBMED_LINK
28627999
TITLE
Heritability and genome-wide associations studies of cerebral blood flow in the general population.
Main citation
Ikram MA, Zonneveld HI, Roshchupkin G, Smith AV, ...&, Adams HH. (2018) Heritability and genome-wide associations studies of cerebral blood flow in the general population. J Cereb Blood Flow Metab, 38 (9) 1598-1608. doi:10.1177/0271678X17715861. PMID 28627999
ABSTRACT
Cerebral blood flow is an important process for brain functioning and its dysregulation is implicated in multiple neurological disorders. While environmental risk factors have been identified, it remains unclear to what extent the flow is regulated by genetics. Here we performed heritability and genome-wide association analyses of cerebral blood flow in a population-based cohort study. We included 4472 persons free of cortical infarcts who underwent genotyping and phase-contrast magnetic resonance flow imaging (mean age 64.8 ± 10.8 years). The flow rate, cross-sectional area of the vessel, and flow velocity through the vessel were measured in the basilar artery and bilateral carotids. We found that the flow rate of the basilar artery is most heritable (h^2 (SE) = 24.1 (9.8), p-value = 0.0056), and this increased over age. The association studies revealed two significant loci for the right carotid artery area (rs12546630, p-value = 2.0 × 10^-8) and velocity (rs2971609, p-value = 1.4 × 10^-8), with the latter showing a concordant effect in an independent sample (N = 1350, p-value = 0.057, meta-analyzed p-value = 2.5 × 10^-9). These loci were also associated with other cerebral blood flow parameters below genome-wide significance, and rs2971609 lies in a known migraine locus. These findings establish that cerebral blood flow is under genetic control with potential relevance for neurological diseases.
DOI
10.1177/0271678X17715861
MAIN ANCESTRY
EUR

Ishigaki

Summary statistics
PUBMED_LINK
28553958
TITLE
Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis.
Main citation
Ishigaki K, Kochi Y, Suzuki A, Tsuchida Y, ...&, Yamamoto K. (2017) Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nat Genet, 49 (7) 1120-1125. doi:10.1038/ng.3885. PMID 28553958
ABSTRACT
Recent evidence suggests that a substantial portion of complex disease risk alleles modify gene expression in a cell-specific manner. To identify candidate causal genes and biological pathways of immune-related complex diseases, we conducted expression quantitative trait loci (eQTL) analysis on five subsets of immune cells (CD4+ T cells, CD8+ T cells, B cells, natural killer (NK) cells and monocytes) and unfractionated peripheral blood from 105 healthy Japanese volunteers. We developed a three-step analytical pipeline comprising (i) prediction of individual gene expression using our eQTL database and public epigenomic data, (ii) gene-level association analysis and (iii) prediction of cell-specific pathway activity by integrating the direction of eQTL effects. By applying this pipeline to rheumatoid arthritis data sets, we identified candidate causal genes and a cytokine pathway (upregulation of tumor necrosis factor (TNF) in CD4+ T cells). Our approach is an efficient way to characterize the polygenic contributions and potential biological mechanisms of complex diseases.
DOI
10.1038/ng.3885

Japan Omics Browser

Summary statistics
PUBMED_LINK
40335902
DESCRIPTION
Japan Omics Browser (JOB) for browsing omics and GWAS-style association results in Japanese cohorts.
URL
https://japan-omics.jp/
TITLE
JOB: Japan Omics Browser provides integrative visualization of multi-omics data.
Main citation
Takahashi Y, Wang QS, Hasegawa T, Namkoong H, ...&, Japan COVID-19 Task Force. (2025) JOB: Japan Omics Browser provides integrative visualization of multi-omics data. BMC Genomics, 26 (1) 451. doi:10.1186/s12864-025-11639-1. PMID 40335902
ABSTRACT
We present the Japan Omics Browser (JOB), which enables integrative analysis of human omics at different layers. JOB offers visualization of per-variant regulatory effects in the human blood at mRNA and protein level distinctively, quantified from statistical fine-mapping of mRNA-expression quantitative loci (eQTL) and protein QTLs (pQTLs) in 1,405 Japanese, together with fine-mapping results of 94 complex traits in UK Biobank. In addition, JOB shows per-tissue regulatory effect prediction score (EMS), trained via multi-task learning. Furthermore, validation scores from Massively Parallel Reporter Assay (MPRA) in two cell types are available for over 10,000 variants. JOB is publicly available at https://japan-omics.jp/ .
DOI
10.1186/s12864-025-11639-1
RELATED_BIOBANK
The Japan COVID-19 Task Force study
MAIN ANCESTRY
EAS

JCTF

Summary statistics
PUBMED_LINK
39317738
DESCRIPTION
Japan COVID-19 Task Force
TITLE
Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal distinct regulation patterns and disease relevance.
Main citation
Wang QS, Hasegawa T, Namkoong H, Saiki R, ...&, Japan COVID-19 Task Force. (2024) Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal distinct regulation patterns and disease relevance. Nat Genet, 56 (10) 2054-2067. doi:10.1038/s41588-024-01896-3. PMID 39317738
ABSTRACT
Studying the genetic regulation of protein expression (through protein quantitative trait loci (pQTLs)) offers a deeper understanding of regulatory variants uncharacterized by mRNA expression regulation (expression QTLs (eQTLs)) studies. Here we report cis-eQTL and cis-pQTL statistical fine-mapping from 1,405 genotyped samples with blood mRNA and 2,932 plasma samples of protein expression, as part of the Japan COVID-19 Task Force (JCTF). Fine-mapped eQTLs (n = 3,464) were enriched for 932 variants validated with a massively parallel reporter assay. Fine-mapped pQTLs (n = 582) were enriched for missense variations on structured and extracellular domains, although the possibility of epitope-binding artifacts remains. Trans-eQTL and trans-pQTL analysis highlighted associations of class I HLA allele variation with KIR genes. We contrast the multi-tissue origin of plasma protein with blood mRNA, contributing to the limited colocalization level, distinct regulatory mechanisms and trait relevance of eQTLs and pQTLs. We report a negative correlation between ABO mRNA and protein expression because of linkage disequilibrium between distinct nearby eQTLs and pQTLs.
DOI
10.1038/s41588-024-01896-3

Johansson Å, et al-23487758

Summary statistics
PUBMED_LINK
23487758
TITLE
Identification of genetic variants influencing the human plasma proteome.
Main citation
Johansson Å, Enroth S, Palmblad M, Deelder AM, ...&, Gyllensten U. (2013) Identification of genetic variants influencing the human plasma proteome. Proc Natl Acad Sci U S A, 110 (12) 4673-8. doi:10.1073/pnas.1217238110. PMID 23487758
ABSTRACT
Genetic variants influencing the transcriptome have been extensively studied. However, the impact of the genetic factors on the human proteome is largely unexplored, mainly due to lack of suitable high-throughput methods. Here we present unique and comprehensive identification of genetic variants affecting the human plasma protein profile by combining high-throughput and high-resolution mass spectrometry (MS) with genome-wide SNP data. We identified and quantified the abundance of 1,056 tryptic-digested peptides, representing 163 proteins in the plasma of 1,060 individuals from two population-based cohorts. The abundance level of almost one-fifth (19%) of the peptides was found to be heritable, with heritability ranging from 0.08 to 0.43. The levels of 60 peptides from 25 proteins, 15% of the proteins studied, were influenced by cis-acting SNPs. We identified and replicated individual cis-acting SNPs (combined P value ranging from 3.1 × 10⁻⁵² to 2.9 × 10⁻¹²) influencing 11 peptides from 5 individual proteins. These SNPs represent both regulatory SNPs and nonsynonymous changes defining well-studied disease alleles such as the ε4 allele of apolipoprotein E (APOE), which has been shown to increase risk of Alzheimer's disease. Our results show that high-throughput mass spectrometry represents a promising method for large-scale characterization of the human proteome, allowing for both quantification and sequencing of individual proteins. Abundance and peptide composition of a protein plays an important role in the etiology, diagnosis, and treatment of a number of diseases. A better understanding of the genetic impact on the plasma proteome is therefore important for evaluating potential biomarkers and therapeutic agents for common diseases.
DOI
10.1073/pnas.1217238110

Karkar S-33664500

Summary statistics
PUBMED_LINK
33664500
TITLE
Genome-wide haplotype association study in imaging genetics using whole-brain sulcal openings of 16,304 UK Biobank subjects.
Main citation
Karkar S, Dandine-Roulland C, Mangin JF, Le Guen Y, ...&, Frouin V. (2021) Genome-wide haplotype association study in imaging genetics using whole-brain sulcal openings of 16,304 UK Biobank subjects. Eur J Hum Genet, 29 (9) 1424-1437. doi:10.1038/s41431-021-00827-8. PMID 33664500
ABSTRACT
Neuroimaging-genetics cohorts gather two types of data: brain imaging and genetic data. They allow the discovery of associations between genetic variants and brain imaging features. They are invaluable resources to study the influence of genetics and environment in the brain features variance observed in normal and pathological populations. This study presents a genome-wide haplotype analysis for 123 brain sulcus opening value (a measure of sulcal width) across the whole brain that include 16,304 subjects from UK Biobank. Using genetic maps, we defined 119,548 blocks of low recombination rate distributed along the 22 autosomal chromosomes and analyzed 1,051,316 haplotypes. To test associations between haplotypes and complex traits, we designed three statistical approaches. Two of them use a model that includes all the haplotypes for a single block, while the last approach considers each haplotype independently. All the statistics produced were assessed as rigorously as possible. Thanks to the rich imaging dataset at hand, we used resampling techniques to assess False Positive Rate for each statistical approach in a genome-wide and brain-wide context. The results on real data show that genome-wide haplotype analyses are more sensitive than single-SNP approach and account for local complex Linkage Disequilibrium (LD) structure, which makes genome-wide haplotype analysis an interesting and statistically sound alternative to the single-SNP counterpart.
DOI
10.1038/s41431-021-00827-8
MAIN ANCESTRY
EUR

Katz DH, et al-34814699

Summary statistics
PUBMED_LINK
34814699
TITLE
Whole Genome Sequence Analysis of the Plasma Proteome in Black Adults Provides Novel Insights Into Cardiovascular Disease.
Main citation
Katz DH, Tahir UA, Bick AG, Pampana A, ...&, National Heart, Lung, and Blood Institute TOPMed (Trans-Omics for Precision Medicine) Consortium. (2022) Whole Genome Sequence Analysis of the Plasma Proteome in Black Adults Provides Novel Insights Into Cardiovascular Disease. Circulation, 145 (5) 357-370. doi:10.1161/CIRCULATIONAHA.121.055117. PMID 34814699
ABSTRACT
BACKGROUND: Plasma proteins are critical mediators of cardiovascular processes and are the targets of many drugs. Previous efforts to characterize the genetic architecture of the plasma proteome have been limited by a focus on individuals of European descent and leveraged genotyping arrays and imputation. Here we describe whole genome sequence analysis of the plasma proteome in individuals with greater African ancestry, increasing our power to identify novel genetic determinants. METHODS: Proteomic profiling of 1301 proteins was performed in 1852 Black adults from the Jackson Heart Study using aptamer-based proteomics (SomaScan). Whole genome sequencing association analysis was ascertained for all variants with minor allele count ≥5. Results were validated using an alternative, antibody-based, proteomic platform (Olink) as well as replicated in the Multi-Ethnic Study of Atherosclerosis and the HERITAGE Family Study (Health, Risk Factors, Exercise Training and Genetics). RESULTS: We identify 569 genetic associations between 479 proteins and 438 unique genetic regions at a Bonferroni-adjusted significance level of 3.8×10⁻¹¹. These associations include 114 novel locus-protein relationships and an additional 217 novel sentinel variant-protein relationships. Novel cardiovascular findings include new protein associations at the APOE gene locus including ZAP70 (sentinel single nucleotide polymorphism [SNP] rs7412-T, β=0.61±0.05, P=3.27×10⁻³⁰) and MMP-3 (β=-0.60±0.05, P=1.67×10⁻³²), as well as a completely novel pleiotropic locus at the HPX gene, associated with 9 proteins. Further, the associations suggest new mechanisms of genetically mediated cardiovascular disease linked to African ancestry; we identify a novel association between variants linked to APOL1-associated chronic kidney and heart disease and the protein CKAP2 (rs73885319-G, β=0.34±0.04, P=1.34×10⁻¹⁷) as well as an association between ATTR amyloidosis and RBP4 levels in community-dwelling individuals without heart failure. CONCLUSIONS: Taken together, these results provide evidence for the functional importance of variants in non-European populations, and suggest new biological mechanisms for ancestry-specific determinants of lipids, coagulation, and myocardial function.
DOI
10.1161/CIRCULATIONAHA.121.055117

Katz DH, et al-35984888

Summary statistics
PUBMED_LINK
35984888
TITLE
Proteomic profiling platforms head to head: Leveraging genetics and clinical traits to compare aptamer- and antibody-based methods.
Main citation
Katz DH, Robbins JM, Deng S, Tahir UA, ...&, Gerszten RE. (2022) Proteomic profiling platforms head to head: Leveraging genetics and clinical traits to compare aptamer- and antibody-based methods. Sci Adv, 8 (33) eabm5164. doi:10.1126/sciadv.abm5164. PMID 35984888
ABSTRACT
High-throughput proteomic profiling using antibody or aptamer-based affinity reagents is used increasingly in human studies. However, direct analyses to address the relative strengths and weaknesses of these platforms are lacking. We assessed findings from the SomaScan1.3K (N = 1301 reagents), the SomaScan5K platform (N = 4979 reagents), and the Olink Explore (N = 1472 reagents) profiling techniques in 568 adults from the Jackson Heart Study and 219 participants in the HERITAGE Family Study across four performance domains: precision, accuracy, analytic breadth, and phenotypic associations leveraging detailed clinical phenotyping and genetic data. Across these studies, we show evidence supporting more reliable protein target specificity and a higher number of phenotypic associations for the Olink platform, while the Soma platforms benefit from greater measurement precision and analytic breadth across the proteome.
DOI
10.1126/sciadv.abm5164

Kauwe JS, et al-25340798

Summary statistics
PUBMED_LINK
25340798
TITLE
Genome-wide association study of CSF levels of 59 Alzheimer's disease candidate proteins: significant associations with proteins involved in amyloid processing and inflammation.
Main citation
Kauwe JS, Bailey MH, Ridge PG, Perry R, ...&, Goate AM. (2014) Genome-wide association study of CSF levels of 59 Alzheimer's disease candidate proteins: significant associations with proteins involved in amyloid processing and inflammation. PLoS Genet, 10 (10) e1004758. doi:10.1371/journal.pgen.1004758. PMID 25340798
ABSTRACT
Cerebrospinal fluid (CSF) 42 amino acid species of amyloid beta (Aβ42) and tau levels are strongly correlated with the presence of Alzheimer's disease (AD) neuropathology including amyloid plaques and neurodegeneration and have been successfully used as endophenotypes for genetic studies of AD. Additional CSF analytes may also serve as useful endophenotypes that capture other aspects of AD pathophysiology. Here we have conducted a genome-wide association study of CSF levels of 59 AD-related analytes. All analytes were measured using the Rules Based Medicine Human DiscoveryMAP Panel, which includes analytes relevant to several disease-related processes. Data from two independently collected and measured datasets, the Knight Alzheimer's Disease Research Center (ADRC) and Alzheimer's Disease Neuroimaging Initiative (ADNI), were analyzed separately, and combined results were obtained using meta-analysis. We identified genetic associations with CSF levels of 5 proteins (Angiotensin-converting enzyme (ACE), Chemokine (C-C motif) ligand 2 (CCL2), Chemokine (C-C motif) ligand 4 (CCL4), Interleukin 6 receptor (IL6R) and Matrix metalloproteinase-3 (MMP3)) with study-wide significant p-values (p<1.46×10⁻¹⁰) and significant, consistent evidence for association in both the Knight ADRC and the ADNI samples. These proteins are involved in amyloid processing and pro-inflammatory signaling. SNPs associated with ACE, IL6R and MMP3 protein levels are located within the coding regions of the corresponding structural gene. The SNPs associated with CSF levels of CCL4 and CCL2 are located in known chemokine binding proteins. The genetic associations reported here are novel and suggest mechanisms for genetic control of CSF and plasma levels of these disease-related proteins. Significant SNPs in ACE and MMP3 also showed association with AD risk. Our findings suggest that these proteins/pathways may be valuable therapeutic targets for AD. Robust associations in cognitively normal individuals suggest that these SNPs also influence regulation of these proteins more generally and may therefore be relevant to other diseases.
DOI
10.1371/journal.pgen.1004758

Khurshid S-36944631

Summary statistics
PUBMED_LINK
36944631
TITLE
Clinical and genetic associations of deep learning-derived cardiac magnetic resonance-based left ventricular mass.
Main citation
Khurshid S, Lazarte J, Pirruccello JP, Weng LC, ...&, Lubitz SA. (2023) Clinical and genetic associations of deep learning-derived cardiac magnetic resonance-based left ventricular mass. Nat Commun, 14 (1) 1558. doi:10.1038/s41467-023-37173-w. PMID 36944631
ABSTRACT
Left ventricular mass is a risk marker for cardiovascular events, and may indicate an underlying cardiomyopathy. Cardiac magnetic resonance is the gold-standard for left ventricular mass estimation, but is challenging to obtain at scale. Here, we use deep learning to enable genome-wide association study of cardiac magnetic resonance-derived left ventricular mass indexed to body surface area within 43,230 UK Biobank participants. We identify 12 genome-wide associations (1 known at TTN and 11 novel for left ventricular mass), implicating genes previously associated with cardiac contractility and cardiomyopathy. Cardiac magnetic resonance-derived indexed left ventricular mass is associated with incident dilated and hypertrophic cardiomyopathies, and implantable cardioverter-defibrillator implant. An indexed left ventricular mass polygenic risk score ≥90th percentile is also associated with incident implantable cardioverter-defibrillator implant in separate UK Biobank (hazard ratio 1.22, 95% CI 1.05-1.44) and Mass General Brigham (hazard ratio 1.75, 95% CI 1.12-2.74) samples. Here, we perform a genome-wide association study of cardiac magnetic resonance-derived indexed left ventricular mass to identify 11 novel variants and demonstrate that cardiac magnetic resonance-derived and genetically predicted indexed left ventricular mass are associated with incident cardiomyopathy.
DOI
10.1038/s41467-023-37173-w
MAIN ANCESTRY
EUR

Kim S, et al-23894628

Summary statistics
PUBMED_LINK
23894628
TITLE
Influence of genetic variation on plasma protein levels in older adults using a multi-analyte panel.
Main citation
Kim S, Swaminathan S, Inlow M, Risacher SL, ...&, Alzheimer’s Disease Neuroimaging Initiative (ADNI). (2013) Influence of genetic variation on plasma protein levels in older adults using a multi-analyte panel. PLoS One, 8 (7) e70269. doi:10.1371/journal.pone.0070269. PMID 23894628
ABSTRACT
Proteins, widely studied as potential biomarkers, play important roles in numerous physiological functions and diseases. Genetic variation may modulate corresponding protein levels and point to the role of these variants in disease pathophysiology. Effects of individual single nucleotide polymorphisms (SNPs) within a gene were analyzed for corresponding plasma protein levels using genome-wide association study (GWAS) genotype data and proteomic panel data with 132 quality-controlled analytes from 521 Caucasian participants in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Linear regression analysis detected 112 significant (Bonferroni threshold p=2.44×10⁻⁵) associations between 27 analytes and 112 SNPs. 107 out of these 112 associations were tested in the Indiana Memory and Aging Study (IMAS) cohort for replication and 50 associations were replicated at uncorrected p<0.05 in the same direction of effect as those in the ADNI. We identified multiple novel associations including the association of rs7517126 with plasma complement factor H-related protein 1 (CFHR1) level at p<1.46×10⁻⁶⁰, accounting for 40 percent of total variation of the protein level. We serendipitously found the association of rs6677604 with the same protein at p<9.29×10⁻¹¹². Although these two SNPs were not in the strong linkage disequilibrium, 61 percent of total variation of CFHR1 was accounted for by rs6677604 without additional variation by rs7517126 when both SNPs were tested together. 78 other SNP-protein associations in the ADNI sample exceeded genome-wide significance (5×10⁻⁸). Our results confirmed previously identified gene-protein associations for interleukin-6 receptor, chemokine CC-4, angiotensin-converting enzyme, and angiotensinogen, although the direction of effect was reversed in some cases. This study is among the first analyses of gene-protein product relationships integrating multiplex-panel proteomics and targeted genes extracted from a GWAS array. With intensive searches taking place for proteomic biomarkers for many diseases, the role of genetic variation takes on new importance and should be considered in interpretation of proteomic results.
DOI
10.1371/journal.pone.0070269

Kirchler M-35640976 (transferGWAS)

Summary statistics
PUBMED_LINK
35640976
DESCRIPTION
transferGWAS is a method for performing genome-wide association studies on whole images.
URL
https://github.com/mkirchler/transferGWAS/
TITLE
transferGWAS: GWAS of images using deep transfer learning.
Main citation
Kirchler M, Konigorski S, Norden M, Meltendorf C, ...&, Lippert C. (2022) transferGWAS: GWAS of images using deep transfer learning. Bioinformatics, 38 (14) 3621-3628. doi:10.1093/bioinformatics/btac369. PMID 35640976
ABSTRACT
MOTIVATION: Medical images can provide rich information about diseases and their biology. However, investigating their association with genetic variation requires non-standard methods. We propose transferGWAS, a novel approach to perform genome-wide association studies directly on full medical images. First, we learn semantically meaningful representations of the images based on a transfer learning task, during which a deep neural network is trained on independent but similar data. Then, we perform genetic association tests with these representations. RESULTS: We validate the type I error rates and power of transferGWAS in simulation studies of synthetic images. Then we apply transferGWAS in a genome-wide association study of retinal fundus images from the UK Biobank. This first-of-a-kind GWAS of full imaging data yielded 60 genomic regions associated with retinal fundus images, of which 7 are novel candidate loci for eye-related traits and diseases. AVAILABILITY AND IMPLEMENTATION: Our method is implemented in Python and available at https://github.com/mkirchler/transferGWAS/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
DOI
10.1093/bioinformatics/btac369

Koprulu M, et al-36823471

Summary statistics
PUBMED_LINK
36823471
TITLE
Proteogenomic links to human metabolic diseases.
Main citation
Koprulu M, Carrasco-Zanini J, Wheeler E, Lockhart S, ...&, Langenberg C. (2023) Proteogenomic links to human metabolic diseases. Nat Metab, 5 (3) 516-528. doi:10.1038/s42255-023-00753-7. PMID 36823471
ABSTRACT
Studying the plasma proteome as the intermediate layer between the genome and the phenome has the potential to identify new disease processes. Here, we conducted a cis-focused proteogenomic analysis of 2,923 plasma proteins measured in 1,180 individuals using antibody-based assays. We (1) identify 256 unreported protein quantitative trait loci (pQTL); (2) demonstrate shared genetic regulation of 224 cis-pQTLs with 575 specific health outcomes, revealing examples for notable metabolic diseases (such as gastrin-releasing peptide as a potential therapeutic target for type 2 diabetes); (3) improve causal gene assignment at 40% (n = 192) of overlapping risk loci; and (4) observe convergence of phenotypic consequences of cis-pQTLs and rare loss-of-function gene burden for 12 proteins, such as TIMD4 for lipoprotein metabolism. Our findings demonstrate the value of integrating complementary proteomic technologies with genomics even at moderate scale to identify new mediators of metabolic diseases with the potential for therapeutic interventions.
DOI
10.1038/s42255-023-00753-7

KoreanChip

Summary statistics
PUBMED_LINK
30718733
DESCRIPTION
GWAS summary statistics based on the Korea Biobank Array (KoreanChip / KoGES).
URL
https://www.koreanchip.org/downloads
TITLE
The Korea Biobank Array: Design and Identification of Coding Variants Associated with Blood Biochemical Traits.
Main citation
Moon S, Kim YJ, Han S, Hwang MY, ...&, Kim BJ. (2019) The Korea Biobank Array: Design and Identification of Coding Variants Associated with Blood Biochemical Traits. Sci Rep, 9 (1) 1382. doi:10.1038/s41598-018-37832-9. PMID 30718733
ABSTRACT
We introduce the design and implementation of a new array, the Korea Biobank Array (referred to as KoreanChip), optimized for the Korean population and demonstrate findings from GWAS of blood biochemical traits. KoreanChip comprised >833,000 markers including >247,000 rare-frequency or functional variants estimated from >2,500 sequencing data in Koreans. Of the 833 K markers, 208 K functional markers were directly genotyped. Particularly, >89 K markers were presented in East Asians. KoreanChip achieved higher imputation performance owing to the excellent genomic coverage of 95.38% for common and 73.65% for low-frequency variants. From GWAS (Genome-wide association study) using 6,949 individuals, 28 associations were successfully recapitulated. Moreover, 9 missense variants were newly identified, of which we identified new associations between a common population-specific missense variant, rs671 (p.Glu457Lys) of ALDH2, and two traits including aspartate aminotransferase (P = 5.20 × 10⁻¹³) and alanine aminotransferase (P = 4.98 × 10⁻⁸). Furthermore, two novel missense variants of GPT with rare frequency in East Asians but extreme rarity in other populations were associated with alanine aminotransferase (rs200088103; p.Arg133Trp, P = 2.02 × 10⁻⁹ and rs748547625; p.Arg143Cys, P = 1.41 × 10⁻⁶). These variants were successfully replicated in 6,000 individuals (P = 5.30 × 10⁻⁸ and P = 1.24 × 10⁻⁶). GWAS results suggest the promising utility of KoreanChip with a substantial number of damaging variants to identify new population-specific disease-associated rare/functional variants.
DOI
10.1038/s41598-018-37832-9
MAIN ANCESTRY
EAS

Krishna C, et al-39085222

Summary statistics
PUBMED_LINK
39085222
TITLE
The influence of HLA genetic variation on plasma protein expression.
Main citation
Krishna C, Chiou J, Sakaue S, Kang JB, ...&, Hu X. (2024) The influence of HLA genetic variation on plasma protein expression. Nat Commun, 15 (1) 6469. doi:10.1038/s41467-024-50583-8. PMID 39085222
ABSTRACT
Genetic variation in the human leukocyte antigen (HLA) loci is associated with risk of immune-mediated diseases, but the molecular effects of HLA polymorphism are unclear. Here we examined the effects of HLA genetic variation on the expression of 2940 plasma proteins across 45,330 Europeans in the UK Biobank, with replication analyses across multiple ancestry groups. We detected 504 proteins affected by HLA variants (HLA-pQTL), including widespread trans effects by autoimmune disease risk alleles. More than 80% of the HLA-pQTL fine-mapped to amino acid positions in the peptide binding groove. HLA-I and II affected proteins expressed in similar cell types but in different pathways of both adaptive and innate immunity. Finally, we investigated potential HLA-pQTL effects on disease by integrating HLA-pQTL with fine-mapped HLA-disease signals in the UK Biobank. Our data reveal the diverse effects of HLA genetic variation and aid the interpretation of associations between HLA alleles and immune-mediated diseases.
DOI
10.1038/s41467-024-50583-8
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

Littlejohns TJ-32457287

Summary statistics
PUBMED_LINK
32457287
DESCRIPTION
UK Biobank imaging enhancement: brain, cardiac and abdominal magnetic resonance imaging, dual-energy X-ray absorptiometry and carotid ultrasound.
TITLE
The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions.
Main citation
Littlejohns TJ, Holliday J, Gibson LM, Garratt S, ...&, Allen NE. (2020) The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nat Commun, 11 (1) 2624. doi:10.1038/s41467-020-15948-9. PMID 32457287
ABSTRACT
UK Biobank is a population-based cohort of half a million participants aged 40-69 years recruited between 2006 and 2010. In 2014, UK Biobank started the world's largest multi-modal imaging study, with the aim of re-inviting 100,000 participants to undergo brain, cardiac and abdominal magnetic resonance imaging, dual-energy X-ray absorptiometry and carotid ultrasound. The combination of large-scale multi-modal imaging with extensive phenotypic and genetic data offers an unprecedented resource for scientists to conduct health-related research. This article provides an in-depth overview of the imaging enhancement, including the data collected, how it is managed and processed, and future directions.
DOI
10.1038/s41467-020-15948-9
MAIN ANCESTRY
EUR

Liu F-23028347

Summary statistics
PUBMED_LINK
23028347
TITLE
A genome-wide association study identifies five loci influencing facial morphology in Europeans.
Main citation
Liu F, van der Lijn F, Schurmann C, Zhu G, ...&, Kayser M. (2012) A genome-wide association study identifies five loci influencing facial morphology in Europeans. PLoS Genet, 8 (9) e1002932. doi:10.1371/journal.pgen.1002932. PMID 23028347
ABSTRACT
Inter-individual variation in facial shape is one of the most noticeable phenotypes in humans, and it is clearly under genetic regulation; however, almost nothing is known about the genetic basis of normal human facial morphology. We therefore conducted a genome-wide association study for facial shape phenotypes in multiple discovery and replication cohorts, considering almost ten thousand individuals of European descent from several countries. Phenotyping of facial shape features was based on landmark data obtained from three-dimensional head magnetic resonance images (MRIs) and two-dimensional portrait images. We identified five independent genetic loci associated with different facial phenotypes, suggesting the involvement of five candidate genes--PRDM16, PAX3, TP63, C5orf50, and COL17A1--in the determination of the human face. Three of them have been implicated previously in vertebrate craniofacial development and disease, and the remaining two genes potentially represent novel players in the molecular networks governing facial development. Our finding at PAX3 influencing the position of the nasion replicates a recent GWAS of facial features. In addition to the reported GWA findings, we established links between common DNA variants previously associated with NSCL/P at 2p21, 8q24, 13q31, and 17q22 and normal facial-shape variations based on a candidate gene approach. Overall our study implies that DNA variants in genes essential for craniofacial development contribute with relatively small effect size to the spectrum of normal variation in human facial morphology. This observation has important consequences for future studies aiming to identify more genes involved in the human facial morphology, as well as for potential applications of DNA prediction of facial shape such as in future forensic applications.
DOI
10.1371/journal.pgen.1002932
MAIN ANCESTRY
EUR

Liu M-38038215

Summary statistics
PUBMED_LINK
38038215
TITLE
Chromosome 10q24.32 Variants Associate With Brain Arterial Diameters in Diverse Populations: A Genome-Wide Association Study.
Main citation
Liu M, Khasiyev F, Sariya S, Spagnolo-Allende A, ...&, Gutierrez J. (2023) Chromosome 10q24.32 Variants Associate With Brain Arterial Diameters in Diverse Populations: A Genome-Wide Association Study. J Am Heart Assoc, 12 (23) e030935. doi:10.1161/JAHA.123.030935. PMID 38038215
ABSTRACT
BACKGROUND: Brain arterial diameters (BADs) are novel imaging biomarkers of cerebrovascular disease, cognitive decline, and dementia. Traditional vascular risk factors have been associated with BADs, but whether there may be genetic determinants of BADs is unknown. METHODS AND RESULTS: The authors studied 4150 participants from 6 geographically diverse population-based cohorts (40% European, 14% African, 22% Hispanic, 24% Asian ancestries). Brain arterial diameters for 13 segments were measured and averaged to obtain a global measure of BADs as well as the posterior and anterior circulations. A genome-wide association study revealed 14 variants at one locus associated with global BAD at genome-wide significance (P<5×10⁻⁸) (top single-nucleotide polymorphism, rs7921574; β=0.06 [P=1.54×10⁻⁸]). This locus mapped to an intron of CNNM2. A trans-ancestry genome-wide association study meta-analysis identified 2 more loci at NT5C2 (rs10748839; P=2.54×10⁻⁸) and AS3MT (rs10786721; P=4.97×10⁻⁸), associated with global BAD. In addition, 2 single-nucleotide polymorphisms colocalized with expression of CNNM2 (rs7897654; β=0.12 [P=6.17×10⁻⁷]) and AL356608.1 (rs10786719; β=-0.17 [P=6.60×10⁻⁶]) in brain tissue. For the posterior BAD, 2 variants at one locus mapped to an intron of TCF25 were identified (top single-nucleotide polymorphism, rs35994878; β=0.11 [P=2.94×10⁻⁸]). For the anterior BAD, one locus at ADAP1 was identified in trans-ancestry genome-wide association analysis (rs34217249; P=3.11×10⁻⁸). CONCLUSIONS: The current study reveals 3 novel risk loci (CNNM2, NT5C2, and AS3MT) associated with BADs. These findings may help elucidate the mechanism by which BADs may influence cerebrovascular health.
DOI
10.1161/JAHA.123.030935
MAIN ANCESTRY
Cross-ancestry

Liu Y-34128465

Summary statistics
PUBMED_LINK
34128465
TITLE
Genetic architecture of 11 organ traits derived from abdominal MRI using deep learning.
Main citation
Liu Y, Basty N, Whitcher B, Bell JD, ...&, Cule M. (2021) Genetic architecture of 11 organ traits derived from abdominal MRI using deep learning. Elife, 10. doi:10.7554/eLife.65554. PMID 34128465
ABSTRACT
Cardiometabolic diseases are an increasing global health burden. While socioeconomic, environmental, behavioural, and genetic risk factors have been identified, a better understanding of the underlying mechanisms is required to develop more effective interventions. Magnetic resonance imaging (MRI) has been used to assess organ health, but biobank-scale studies are still in their infancy. Using over 38,000 abdominal MRI scans in the UK Biobank, we used deep learning to quantify volume, fat, and iron in seven organs and tissues, and demonstrate that imaging-derived phenotypes reflect health status. We show that these traits have a substantial heritable component (8-44%) and identify 93 independent genome-wide significant associations, including four associations with liver traits that have not previously been reported. Our work demonstrates the tractability of deep learning to systematically quantify health parameters from high-throughput MRI across a range of organs and tissues, and use the largest-ever study of its kind to generate new insights into the genetic architecture of these traits.
DOI
10.7554/eLife.65554
MAIN ANCESTRY
EUR

Macdonald-Dunlop

Summary statistics
PREPRINT_DOI
10.1101/2021.08.03.21261494
SERVER
medrxiv
Main citation
Macdonald-Dunlop, E. et al. Mapping genetic determinants of 184 circulating proteins in 26,494 individuals to connect proteins and diseases. medRxiv (2021) doi:10.1101/2021.08.03.21261494.
MAIN ANCESTRY
EUR

MANE PheWeb

Summary statistics
PUBMED_LINK
39389017
DESCRIPTION
MANE PheWeb — Chinese maternal cohort GWAS summary statistics browser.
URL
https://db.cngb.org/MANE.PheWeb/
TITLE
Genetic analyses of 104 phenotypes in 20,900 Chinese pregnant women reveal pregnancy-specific discoveries.
Main citation
Xiao H, Li L, Yang M, Zhang X, ...&, Jin X. (2024) Genetic analyses of 104 phenotypes in 20,900 Chinese pregnant women reveal pregnancy-specific discoveries. Cell Genom, 4 (10) 100633. doi:10.1016/j.xgen.2024.100633. PMID 39389017
ABSTRACT
Monitoring biochemical phenotypes during pregnancy is vital for maternal and fetal health, allowing early detection and management of pregnancy-related conditions to ensure safety for both. Here, we conducted a genetic analysis of 104 pregnancy phenotypes in 20,900 Chinese women. The genome-wide association study (GWAS) identified a total of 410 trait-locus associations, with 71.71% reported previously. Among the 116 novel hits for 45 phenotypes, 83 were successfully replicated. Among them, 31 were defined as potentially pregnancy-specific associations, including creatine and HELLPAR and neutrophils and ESR1, with subsequent analysis revealing enrichments in estrogen-related pathways and female reproductive tissues. The partitioning heritability underscored the significant roles of fetal blood, embryoid bodies, and female reproductive organs in pregnancy hematology and birth outcomes. Pathway analysis confirmed the intricate interplay of hormone and immune regulation, metabolism, and cell cycle during pregnancy. This study contributes to the understanding of genetic influences on pregnancy phenotypes and their implications for maternal health.
DOI
10.1016/j.xgen.2024.100633
MAIN ANCESTRY
EAS

Megastroke

Summary statistics
PUBMED_LINK
29531354
DESCRIPTION
MEGASTROKE multi-ancestry stroke GWAS meta-analysis summary statistics and portal.
URL
https://www.megastroke.org/index.html
TITLE
Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes.
Main citation
Malik R, Chauhan G, Traylor M, Sargurupremraj M, ...&, Dichgans M. (2018) Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat Genet, 50 (4) 524-537. doi:10.1038/s41588-018-0058-3. PMID 29531354
ABSTRACT
Stroke has multiple etiologies, but the underlying genes and pathways are largely unknown. We conducted a multiancestry genome-wide-association meta-analysis in 521,612 individuals (67,162 cases and 454,450 controls) and discovered 22 new stroke risk loci, bringing the total to 32. We further found shared genetic variation with related vascular traits, including blood pressure, cardiac traits, and venous thromboembolism, at individual loci (n = 18), and using genetic risk scores and linkage-disequilibrium-score regression. Several loci exhibited distinct association and pleiotropy patterns for etiological stroke subtypes. Eleven new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets. Stroke risk loci were significantly enriched in drug targets for antithrombotic therapy.
DOI
10.1038/s41588-018-0058-3
MAIN ANCESTRY
Multi-ancestry

Melzer D, et al-18464913

Summary statistics
PUBMED_LINK
18464913
TITLE
A genome-wide association study identifies protein quantitative trait loci (pQTLs).
Main citation
Melzer D, Perry JR, Hernandez D, Corsi AM, ...&, Ferrucci L. (2008) A genome-wide association study identifies protein quantitative trait loci (pQTLs). PLoS Genet, 4 (5) e1000072. doi:10.1371/journal.pgen.1000072. PMID 18464913
ABSTRACT
There is considerable evidence that human genetic variation influences gene expression. Genome-wide studies have revealed that mRNA levels are associated with genetic variation in or close to the gene coding for those mRNA transcripts - cis effects, and elsewhere in the genome - trans effects. The role of genetic variation in determining protein levels has not been systematically assessed. Using a genome-wide association approach we show that common genetic variation influences levels of clinically relevant proteins in human serum and plasma. We evaluated the role of 496,032 polymorphisms on levels of 42 proteins measured in 1200 fasting individuals from the population based InCHIANTI study. Proteins included insulin, several interleukins, adipokines, chemokines, and liver function markers that are implicated in many common diseases including metabolic, inflammatory, and infectious conditions. We identified eight cis effects, including variants in or near the IL6R (p = 1.8x10(-57)), CCL4L1 (p = 3.9x10(-21)), IL18 (p = 6.8x10(-13)), LPA (p = 4.4x10(-10)), GGT1 (p = 1.5x10(-7)), SHBG (p = 3.1x10(-7)), CRP (p = 6.4x10(-6)) and IL1RN (p = 7.3x10(-6)) genes, all associated with their respective protein products with effect sizes ranging from 0.19 to 0.69 standard deviations per allele. Mechanisms implicated include altered rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different sized proteins (LPA), variation in gene copy number (CCL4L1) and altered transcription (GGT1). We identified one novel trans effect that was an association between ABO blood group and tumour necrosis factor alpha (TNF-alpha) levels (p = 6.8x10(-40)), but this finding was not present when TNF-alpha was measured using a different assay, or in a second study, suggesting an assay-specific association. Our results show that protein levels share some of the features of the genetics of gene expression. These include the presence of strong genetic effects in cis locations. The identification of protein quantitative trait loci (pQTLs) may be a powerful complementary method of improving our understanding of disease pathways.
DOI
10.1371/journal.pgen.1000072

Min

Summary statistics
PUBMED_LINK
34493871
DESCRIPTION
Cis and trans meta-analysis results from genome-wide scans of 420,509 DNA methylation sites
URL
http://mqtldb.godmc.org.uk/
TITLE
Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation.
Main citation
Min JL, Hemani G, Hannon E, Dekkers KF, ...&, Relton CL. (2021) Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet, 53 (9) 1311-1321. doi:10.1038/s41588-021-00923-x. PMID 34493871
ABSTRACT
Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15-17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype-phenotype map than previously anticipated.
DOI
10.1038/s41588-021-00923-x

MVP-Finngen-UKBB meta-analysis

Summary statistics
PUBMED_LINK
39974076
DESCRIPTION
Cross-biobank GWAS meta-analysis across MVP, FinnGen, and UK Biobank (phenome-wide association resource).
URL
https://mvp-ukbb.finngen.fi/
TITLE
Prevalence and disease risks for male and female sex chromosome trisomies: a registry-based phenome-wide association study in 1.5 million participants of MVP, FinnGen, and UK Biobank.
Main citation
Davis SM, Liu A, Teerlink CC, Lapato DM, ...&, Hauger RL. (2025) Prevalence and disease risks for male and female sex chromosome trisomies: a registry-based phenome-wide association study in 1.5 million participants of MVP, FinnGen, and UK Biobank. medRxiv, () . doi:10.1101/2025.01.31.25321488. PMID 39974076
ABSTRACT
Sex chromosome trisomies (SCT) are the most common whole chromosome aneuploidy in humans. Yet, our understanding of the prevalence and associated health outcomes is largely driven by observational studies of clinically diagnosed cases, resulting in a disproportionate focus on 47,XXY and associated hypogonadism. We analyzed microarray intensity data of sex chromosomes for 1.5 million individuals enrolled in three large cohorts-Million Veteran Program, FinnGen, and UK Biobank-to identify individuals with 47,XXY, 47,XYY, and 47,XXX. We examined disease conditions associated with SCTs by performing phenome-wide association studies (PheWAS) using electronic health records (EHR) data for each cohort, followed by meta-analysis across cohorts. Association results are presented for each SCT and also stratified by presence or absence of a documented clinical diagnosis for 47,XXY. We identified 2,769 individuals with SCT (47,XXY: 1,319; 47,XYY: 1,108; 47,XXX: 342), most of whom had no documented clinical diagnosis (47,XXY: 73.8%; 47,XYY: 98.6%; 47,XXX: 93.6%). The identified phenotypic associations with SCT spanned all PheWAS disease categories except neoplasms. Many associations are shared among three SCT subtypes, particularly for vascular diseases (e.g., chronic venous insufficiency (OR [95% CI] for 47,XXY 4.7 [3.9,5.8]; 47,XYY 5.6 [4.5,7.0]; 47,XXX 4.6 [2.7,7.6]), venous thromboembolism (47,XXY 4.6 [3.7-5.6]; 47,XYY 4.1 [3.3-5.0]; 47,XXX 8.1 [4.2-15.4]), and glaucoma (47,XXY 2.5 [2.1-2.9]; 47,XYY 2.4 [2.0-2.8]; 47,XXX 2.3 [1.4-3.5])). A third sex chromosome confers an increased risk for systemic comorbidities, even if the SCT is not documented. SCT phenotypes largely overlap, suggesting one or more X/Y homolog genes may underlie pathophysiology and comorbidities across SCTs.
DOI
10.1101/2025.01.31.25321488
RELATED_BIOBANK
UK Biobank ,FinnGen ,Million Veteran Program
MAIN ANCESTRY
EUR

Ning C-38036550

Summary statistics
PUBMED_LINK
38036550
TITLE
Genome-wide association analysis of left ventricular imaging-derived phenotypes identifies 72 risk loci and yields genetic insights into hypertrophic cardiomyopathy.
Main citation
Ning C, Fan L, Jin M, Wang W, ...&, Miao X. (2023) Genome-wide association analysis of left ventricular imaging-derived phenotypes identifies 72 risk loci and yields genetic insights into hypertrophic cardiomyopathy. Nat Commun, 14 (1) 7900. doi:10.1038/s41467-023-43771-5. PMID 38036550
ABSTRACT
Left ventricular regional wall thickness (LVRWT) is an independent predictor of morbidity and mortality in cardiovascular diseases (CVDs). To identify specific genetic influences on individual LVRWT, we established a novel deep learning algorithm to calculate 12 LVRWTs accurately in 42,194 individuals from the UK Biobank with cardiac magnetic resonance (CMR) imaging. Genome-wide association studies of CMR-derived 12 LVRWTs identified 72 significant genetic loci associated with at least one LVRWT phenotype (P < 5 × 10(-8)), which were revealed to actively participate in heart development and contraction pathways. Significant causal relationships were observed between the LVRWT traits and hypertrophic cardiomyopathy (HCM) using genetic correlation and Mendelian randomization analyses (P < 0.01). The polygenic risk score of inferoseptal LVRWT at end systole exhibited a notable association with incident HCM, facilitating the identification of high-risk individuals. The findings yield insights into the genetic determinants of LVRWT phenotypes and shed light on the biological basis for HCM etiology.
DOI
10.1038/s41467-023-43771-5
MAIN ANCESTRY
EUR

NSPT

Summary statistics
PUBMED_LINK
38641644
DESCRIPTION
Methylation quantitative trait loci (mQTLs) CpGs in the whole blood of 3,523 Han Chinese from the National Survey of Physical Traits (NSPT) cohort
URL
https://www.biosino.org/sinomqtl/
TITLE
Analysis of blood methylation quantitative trait loci in East Asians reveals ancestry-specific impacts on complex traits.
Main citation
Peng Q, Liu X, Li W, Jing H, ...&, Wang S. (2024) Analysis of blood methylation quantitative trait loci in East Asians reveals ancestry-specific impacts on complex traits. Nat Genet, 56 (5) 846-860. doi:10.1038/s41588-023-01494-9. PMID 38641644
ABSTRACT
Methylation quantitative trait loci (mQTLs) are essential for understanding the role of DNA methylation changes in genetic predisposition, yet they have not been fully characterized in East Asians (EAs). Here we identified mQTLs in whole blood from 3,523 Chinese individuals and replicated them in additional 1,858 Chinese individuals from two cohorts. Over 9% of mQTLs displayed specificity to EAs, facilitating the fine-mapping of EA-specific genetic associations, as shown for variants associated with height. Trans-mQTL hotspots revealed biological pathways contributing to EA-specific genetic associations, including an ERG-mediated 233 trans-mCpG network, implicated in hematopoietic cell differentiation, which likely reflects binding efficiency modulation of the ERG protein complex. More than 90% of mQTLs were shared between different blood cell lineages, with a smaller fraction of lineage-specific mQTLs displaying preferential hypomethylation in the respective lineages. Our study provides new insights into the mQTL landscape across genetic ancestries and their downstream effects on cellular processes and diseases/traits.
DOI
10.1038/s41588-023-01494-9

OmicsPred portal

Summary statistics
PUBMED_LINK
36991119
URL
https://www.omicspred.org/
TITLE
An atlas of genetic scores to predict multi-omic traits.
Main citation
Xu Y, Ritchie SC, Liang Y, Timmers PRHJ, ...&, Inouye M. (2023) An atlas of genetic scores to predict multi-omic traits. Nature, 616 (7955) 123-131. doi:10.1038/s41586-023-05844-9. PMID 36991119
ABSTRACT
The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. But multi-omic traits can be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omics. Here we examine a large cohort (the INTERVAL study; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK-STAT signalling and coronary atherosclerosis. Finally, we develop a portal ( https://www.omicspred.org/ ) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
DOI
10.1038/s41586-023-05844-9

OneK1k

Summary statistics
PUBMED_LINK
35389779
URL
https://onek1k.org/
TITLE
Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease.
Main citation
Yazar S, Alquicira-Hernandez J, Wing K, Senabouth A, ...&, Powell JE. (2022) Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science, 376 (6589) eabf3041. doi:10.1126/science.abf3041. PMID 35389779
ABSTRACT
The human immune system displays substantial variation between individuals, leading to differences in susceptibility to autoimmune disease. We present single-cell RNA sequencing (scRNA-seq) data from 1,267,758 peripheral blood mononuclear cells from 982 healthy human subjects. For 14 cell types, we identified 26,597 independent cis-expression quantitative trait loci (eQTLs) and 990 trans-eQTLs, with most showing cell type-specific effects on gene expression. We subsequently show how eQTLs have dynamic allelic effects in B cells that are transitioning from naïve to memory states and demonstrate how commonly segregating alleles lead to interindividual variation in immune function. Finally, using a Mendelian randomization approach, we identify the causal route by which 305 risk loci contribute to autoimmune disease at the cellular level. This work brings together genetic epidemiology with scRNA-seq to uncover drivers of interindividual variation in the immune system.
DOI
10.1126/science.abf3041

OpenGWAS

Summary statistics
DESCRIPTION
MRC IEU OpenGWAS database — harmonized GWAS summary statistics and API for MR and related analyses.
URL
https://gwas.mrcieu.ac.uk/
PREPRINT_DOI
10.1101/2020.08.10.244293
SERVER
biorxiv
Main citation
Elsworth, B., Lyon, M., Alexander, T., Liu, Y., Matthews, P., Hallett, J., ... & Hemani, G. (2020). The MRC IEU OpenGWAS data infrastructure. BioRxiv, 2020-08.
MAIN ANCESTRY
Multi-ancestry

Ota

Summary statistics
PUBMED_LINK
33930287
TITLE
Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases.
Main citation
Ota M, Nagafuchi Y, Hatano H, Ishigaki K, ...&, Fujio K. (2021) Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases. Cell, 184 (11) 3006-3021.e17. doi:10.1016/j.cell.2021.03.056. PMID 33930287
ABSTRACT
Genetic studies have revealed many variant loci that are associated with immune-mediated diseases. To elucidate the disease pathogenesis, it is essential to understand the function of these variants, especially under disease-associated conditions. Here, we performed a large-scale immune cell gene-expression analysis, together with whole-genome sequence analysis. Our dataset consists of 28 distinct immune cell subsets from 337 patients diagnosed with 10 categories of immune-mediated diseases and 79 healthy volunteers. Our dataset captured distinctive gene-expression profiles across immune cell types and diseases. Expression quantitative trait loci (eQTL) analysis revealed dynamic variations of eQTL effects in the context of immunological conditions, as well as cell types. These cell-type-specific and context-dependent eQTLs showed significant enrichment in immune disease-associated genetic variants, and they implicated the disease-relevant cell types, genes, and environment. This atlas deepens our understanding of the immunogenetic functions of disease-associated variants under in vivo disease conditions.
DOI
10.1016/j.cell.2021.03.056

Pan-UKB

Summary statistics
PUBMED_LINK
40968291
DESCRIPTION
Pan-UK Biobank — multi-ancestry GWAS in UK Biobank across thousands of phenotypes.
URL
https://pan.ukbb.broadinstitute.org/
TITLE
Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects.
Main citation
Karczewski KJ, Gupta R, Kanai M, Lu W, ...&, Martin AR. (2025) Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects. Nat Genet, 57 (10) 2408-2417. doi:10.1038/s41588-025-02335-7. PMID 40968291
ABSTRACT
Large biobanks, such as the UK Biobank (UKB), enable massive phenome by genome-wide association studies that elucidate genetic etiology of complex traits. However, people from diverse genetic ancestry groups are often excluded from association analyses due to concerns about population structure introducing false positive associations. Here we generate mixed model associations and meta-analyses across genetic ancestry groups, inclusive of a larger fraction of the UK Biobank than previous efforts, to produce freely available summary statistics for 7,266 traits. We build a quality control and analysis framework informed by genetic architecture. Overall, we identify 14,676 significant loci (P < 5 × 10(-8)) in the meta-analysis that were not found in the EUR genetic ancestry group alone, including new associations, for example between CAMK2D and triglycerides. We also highlight associations from ancestry-enriched variation, including a known pleiotropic missense variant in G6PD associated with several biomarker traits. We release these results publicly alongside frequently asked questions that describe caveats for interpretation of results, enhancing available resources for interpretation of risk variants across diverse populations.
DOI
10.1038/s41588-025-02335-7
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

Parisinos C-32247823

Summary statistics
PUBMED_LINK
32247823
TITLE
Genome-wide and Mendelian randomisation studies of liver MRI yield insights into the pathogenesis of steatohepatitis.
Main citation
Parisinos CA, Wilman HR, Thomas EL, Kelly M, ...&, Yaghootkar H. (2020) Genome-wide and Mendelian randomisation studies of liver MRI yield insights into the pathogenesis of steatohepatitis. J Hepatol, 73 (2) 241-251. doi:10.1016/j.jhep.2020.03.032. PMID 32247823
ABSTRACT
BACKGROUND & AIMS: MRI-based corrected T1 (cT1) is a non-invasive method to grade the severity of steatohepatitis and liver fibrosis. We aimed to identify genetic variants influencing liver cT1 and use genetics to understand mechanisms underlying liver fibroinflammatory disease and its link with other metabolic traits and diseases. METHODS: First, we performed a genome-wide association study (GWAS) in 14,440 Europeans, with liver cT1 measures, from the UK Biobank. Second, we explored the effects of the cT1 variants on liver blood tests, and a range of metabolic traits and diseases. Third, we used Mendelian randomisation to test the causal effects of 24 predominantly metabolic traits on liver cT1 measures. RESULTS: We identified 6 independent genetic variants associated with liver cT1 that reached the GWAS significance threshold (p <5×10(-8)). Four of the variants (rs759359281 in SLC30A10, rs13107325 in SLC39A8, rs58542926 in TM6SF2, rs738409 in PNPLA3) were also associated with elevated aminotransferases and had variable effects on liver fat and other metabolic traits. Insulin resistance, type 2 diabetes, non-alcoholic fatty liver and body mass index were causally associated with elevated cT1, whilst favourable adiposity (instrumented by variants associated with higher adiposity but lower risk of cardiometabolic disease and lower liver fat) was found to be protective. CONCLUSION: The association between 2 metal ion transporters and cT1 indicates an important new mechanism in steatohepatitis. Future studies are needed to determine whether interventions targeting the identified transporters might prevent liver disease in at-risk individuals. LAY SUMMARY: We estimated levels of liver inflammation and scarring based on magnetic resonance imaging of 14,440 UK Biobank participants. We performed a genetic study and identified variations in 6 genes associated with levels of liver inflammation and scarring. Participants with variations in 4 of these genes also had higher levels of markers of liver cell injury in blood samples, further validating their role in liver health. Two identified genes are involved in the transport of metal ions in our body. Further investigation of these variations may lead to better detection, assessment, and/or treatment of liver inflammation and scarring.
DOI
10.1016/j.jhep.2020.03.032
MAIN ANCESTRY
EUR

Persyn E-32358547

Summary statistics
PUBMED_LINK
32358547
TITLE
Genome-wide association study of MRI markers of cerebral small vessel disease in 42,310 participants.
Main citation
Persyn E, Hanscombe KB, Howson JMM, Lewis CM, ...&, Markus HS. (2020) Genome-wide association study of MRI markers of cerebral small vessel disease in 42,310 participants. Nat Commun, 11 (1) 2175. doi:10.1038/s41467-020-15932-3. PMID 32358547
ABSTRACT
Cerebral small vessel disease is a major cause of stroke and dementia, but its genetic basis is incompletely understood. We perform a genetic study of three MRI markers of the disease in UK Biobank imaging data and other sources: white matter hyperintensities (N = 42,310), fractional anisotropy (N = 17,663) and mean diffusivity (N = 17,467). Our aim is to better understand the disease pathophysiology. Across the three traits, we identify 31 loci, of which 21 were previously unreported. We perform a transcriptome-wide association study to identify associations with gene expression in relevant tissues, identifying 66 associated genes across the three traits. This genetic study provides insights into the understanding of the biological mechanisms underlying small vessel disease.
DOI
10.1038/s41467-020-15932-3
MAIN ANCESTRY
EUR

PGC (Psychiatric Genomics Consortium)

Summary statistics
PUBMED_LINK
25056061
DESCRIPTION
Psychiatric Genomics Consortium meta-analysis summary statistics for psychiatric disorders.
URL
https://www.med.unc.edu/pgc/download-results/
TITLE
Biological insights from 108 schizophrenia-associated genetic loci.
Main citation
Schizophrenia Working Group of the Psychiatric Genomics Consortium. (2014) Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511 (7510) 421-7. doi:10.1038/nature13595. PMID 25056061
ABSTRACT
Schizophrenia is a highly heritable disorder. Genetic risk is conferred by a large number of alleles, including common alleles of small effect that might be detected by genome-wide association studies. Here we report a multi-stage schizophrenia genome-wide association study of up to 36,989 cases and 113,075 controls. We identify 128 independent associations spanning 108 conservatively defined loci that meet genome-wide significance, 83 of which have not been previously reported. Associations were enriched among genes expressed in brain, providing biological plausibility for the findings. Many findings have the potential to provide entirely new insights into aetiology, but associations at DRD2 and several genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses. Independent of genes expressed in brain, associations were enriched among genes expressed in tissues that have important roles in immunity, providing support for the speculated link between the immune system and schizophrenia.
DOI
10.1038/nature13595
MAIN ANCESTRY
Multi-ancestry

pGWAS server

Summary statistics
PUBMED_LINK
28240269
DESCRIPTION
In our study, we performed a genome-wide association study with protein levels (pGWAS). Using a highly multiplexed, aptamer-based, affinity proteomics platform (SOMAscan™), we quantified levels of 1,124 proteins in blood plasma samples from 1,000 German individuals (KORA cohort) and 338 Arab or Asian individuals (QMDiab cohort). We identified 539 independent, genome-wide significant SNP-to-protein associations, which can be investigated using this webserver.
URL
https://metabolomics.helmholtz-muenchen.de/pgwas/
TITLE
Connecting genetic risk to disease end points through the human blood plasma proteome.
Main citation
Suhre K, Arnold M, Bhagwat AM, Cotton RJ, ...&, Graumann J. (2017) Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun, 8 () 14357. doi:10.1038/ncomms14357. PMID 28240269
ABSTRACT
Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.
DOI
10.1038/ncomms14357

Pietzner M, et al-34648354

Summary statistics
PUBMED_LINK
34648354
TITLE
Mapping the proteo-genomic convergence of human diseases.
Main citation
Pietzner M, Wheeler E, Carrasco-Zanini J, Cortes A, ...&, Langenberg C. (2021) Mapping the proteo-genomic convergence of human diseases. Science, 374 (6569) eabj1541. doi:10.1126/science.abj1541. PMID 34648354
ABSTRACT
Characterization of the genetic regulation of proteins is essential for understanding disease etiology and developing therapies. We identified 10,674 genetic associations for 3892 plasma proteins to create a cis-anchored gene-protein-disease map of 1859 connections that highlights strong cross-disease biological convergence. This proteo-genomic map provides a framework to connect etiologically related diseases, to provide biological context for new or emerging disorders, and to integrate different biological domains to establish mechanisms for known gene-disease links. Our results identify proteo-genomic connections within and between diseases and establish the value of cis-protein variants for annotation of likely causal disease genes at loci identified in genome-wide association studies, thereby addressing a major barrier to experimental validation and clinical translation of genetic discoveries.
DOI
10.1126/science.abj1541

Pirruccello JP-32382064

Summary statistics
PUBMED_LINK
32382064
TITLE
Analysis of cardiac magnetic resonance imaging in 36,000 individuals yields genetic insights into dilated cardiomyopathy.
Main citation
Pirruccello JP, Bick A, Wang M, Chaffin M, ...&, Aragam KG. (2020) Analysis of cardiac magnetic resonance imaging in 36,000 individuals yields genetic insights into dilated cardiomyopathy. Nat Commun, 11 (1) 2254. doi:10.1038/s41467-020-15823-7. PMID 32382064
ABSTRACT
Dilated cardiomyopathy (DCM) is an important cause of heart failure and the leading indication for heart transplantation. Many rare genetic variants have been associated with DCM, but common variant studies of the disease have yielded few associated loci. As structural changes in the heart are a defining feature of DCM, we report a genome-wide association study of cardiac magnetic resonance imaging (MRI)-derived left ventricular measurements in 36,041 UK Biobank participants, with replication in 2184 participants from the Multi-Ethnic Study of Atherosclerosis. We identify 45 previously unreported loci associated with cardiac structure and function, many near well-established genes for Mendelian cardiomyopathies. A polygenic score of MRI-derived left ventricular end systolic volume strongly associates with incident DCM in the general population. Even among carriers of TTN truncating mutations, this polygenic score influences the size and function of the human heart. These results further implicate common genetic polymorphisms in the pathogenesis of DCM.
DOI
10.1038/s41467-020-15823-7
MAIN ANCESTRY
EUR

PLATLAS

Summary statistics
PUBMED_LINK
40313291
FULL NAME
PLeiotropic ATLAS
DESCRIPTION
PLATLAS — pleiotropy atlas with GWAS summary statistics across >1000 phenotypes (multi-biobank).
URL
https://platlas.cels.anl.gov/
TITLE
Genome-Wide Assessment of Pleiotropy Across >1000 Traits from Global Biobanks.
Main citation
Levin MG, Koyama S, Woerner J, Zhang DY, ...&, Natarajan P. (2025) Genome-Wide Assessment of Pleiotropy Across >1000 Traits from Global Biobanks. medRxiv, () . doi:10.1101/2025.04.18.25326074. PMID 40313291
ABSTRACT
Large-scale genetic association studies have identified thousands of trait-associated risk loci, establishing the polygenic basis for common complex traits and diseases. Although prior studies suggest that many trait-associated loci are pleiotropic, the extent to which this pleiotropy reflects shared causal variants or confounding by linkage disequilibrium remains poorly characterized. To define a set of candidate loci with potentially pleiotropic associations, we performed genome-wide association study (GWAS) meta-analyses of up to 1,167 clinically relevant traits and diseases across 1,789,365 diverse individuals genetically similar to Admixed American (AMR, N_Max = 60,756), African (AFR, N_Max = 128,361), East Asian (EAS, N_Max = 307,465), European (EUR, N_Max = 1,283,907), and South Asian (SAS, N_Max = 8,876) reference populations from the VA Million Veteran Program (MVP), UK Biobank (UKB), FinnGen, Biobank Japan (BBJ), Tohoku Medical Megabank (ToMMo), and Korean Genome and Epidemiology Study (KoGES). We identified 27,193 genome-wide significant locus-trait pairs (1MB region with P_GWAMA < 5 × 10⁻⁸) in within-population analysis and 29,139 in multi-population analysis (P_MR-MEGA < 5 × 10⁻⁸). Among these, 11.5% (n = 3,149) of locus-trait pairs in population-wise and 6.4% (n = 1,875) in multi-population analyses did not reach genome-wide significance in previously published GWAS. In aggregate, the genome-wide significant loci fell within 2,624 non-overlapping autosomal genomic windows on average ~600kb in size. Each locus contained genome-wide significant signals for a median of 6 traits (IQR 2 to 18), including 2,110 (80%) pleiotropic loci associated with >1 trait. Multi-trait colocalization identified 1,902 (72%) loci with high-confidence (posterior probability > 0.9) evidence of a shared causal variant across two or more traits.
Variants in pleiotropic loci were significantly enriched for a broad spectrum of functional annotations compared to non-pleiotropic counterparts. Polygenic scores (PGS) developed from these data generally improved prediction compared to existing PGS, and were broadly associated with both primary and pleiotropic phenotypes. These results provide a contemporary map of genetic pleiotropy across the spectrum of human traits/diseases and diverse genetic backgrounds.
DOI
10.1101/2025.04.18.25326074
MAIN ANCESTRY
ALL

Png G, et al-34857772

Summary statistics
PUBMED_LINK
34857772
TITLE
Mapping the serum proteome to neurological diseases using whole genome sequencing.
Main citation
Png G, Barysenka A, Repetto L, Navarro P, ...&, Zeggini E. (2021) Mapping the serum proteome to neurological diseases using whole genome sequencing. Nat Commun, 12 (1) 7042. doi:10.1038/s41467-021-27387-1. PMID 34857772
ABSTRACT
Despite the increasing global burden of neurological disorders, there is a lack of effective diagnostic and therapeutic biomarkers. Proteins are often dysregulated in disease and have a strong genetic component. Here, we carry out a protein quantitative trait locus analysis of 184 neurologically-relevant proteins, using whole genome sequencing data from two isolated population-based cohorts (N = 2893). In doing so, we elucidate the genetic landscape of the circulating proteome and its connection to neurological disorders. We detect 214 independently-associated variants for 107 proteins, the majority of which (76%) are cis-acting, including 114 variants that have not been previously identified. Using two-sample Mendelian randomisation, we identify causal associations between serum CD33 and Alzheimer's disease, GPNMB and Parkinson's disease, and MSR1 and schizophrenia, describing their clinical potential and highlighting drug repurposing opportunities.
DOI
10.1038/s41467-021-27387-1

Proteome PheWAS browser

Summary statistics
PUBMED_LINK
32895551
TITLE
Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases.
Main citation
Zheng J, Haberland V, Baird D, Walker V, ...&, Gaunt TR. (2020) Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nat Genet, 52 (10) 1122-1131. doi:10.1038/s41588-020-0682-6. PMID 32895551
ABSTRACT
The human proteome is a major source of therapeutic targets. Recent genetic association analyses of the plasma proteome enable systematic evaluation of the causal consequences of variation in plasma protein levels. Here we estimated the effects of 1,002 proteins on 225 phenotypes using two-sample Mendelian randomization (MR) and colocalization. Of 413 associations supported by evidence from MR, 130 (31.5%) were not supported by results of colocalization analyses, suggesting that genetic confounding due to linkage disequilibrium is widespread in naïve phenome-wide association studies of proteins. Combining MR and colocalization evidence in cis-only analyses, we identified 111 putatively causal effects between 65 proteins and 52 disease-related phenotypes ( https://www.epigraphdb.org/pqtl/ ). Evaluation of data from historic drug development programs showed that target-indication pairs with MR and colocalization support were more likely to be approved, evidencing the value of this approach in identifying and prioritizing potential therapeutic targets.
DOI
10.1038/s41588-020-0682-6

PsychENCODE

Summary statistics
PUBMED_LINK
26605881
DESCRIPTION
Established in 2015 by the National Institute of Mental Health, the PsychENCODE Consortium brings together multidisciplinary teams to study the molecular basis of neuropsychiatric diseases. Genetic influences on brain function are remarkably complex, characterized by a highly polygenic risk architecture and often located in the non-coding regions of the genome. PsychENCODE members generate large-scale gene expression and regulatory data from human postmortem brain tissues in major psychiatric disorders across multiple developmental stages. The goal is to map and functionally validate disease‐associated genetic variants, regulatory elements, genes and cell types. Phase II of the project focused on single-cell and spatial data, culminating in a collection of 14 papers published on May 24, 2024 (9 in Science, 3 in Science Advances, 1 in Scientific Reports, and 1 in Molecular Psychiatry). Phase I of the project was published in 2018 in a collection of 11 papers in Science, Science Translational Medicine, and Science Advances.
URL
https://www.psychencode.org/home
TITLE
The PsychENCODE project.
Main citation
PsychENCODE Consortium, Akbarian S, Liu C, Knowles JA, ...&, Sestan N. (2015) The PsychENCODE project. Nat Neurosci, 18 (12) 1707-12. doi:10.1038/nn.4156. PMID 26605881
ABSTRACT
Recent research on disparate psychiatric disorders has implicated rare variants in genes involved in global gene regulation and chromatin modification, as well as many common variants located primarily in regulatory regions of the genome. Understanding precisely how these variants contribute to disease will require a deeper appreciation for the mechanisms of gene regulation in the developing and adult human brain. The PsychENCODE project aims to produce a public resource of multidimensional genomic data using tissue- and cell type–specific samples from approximately 1,000 phenotypically well-characterized, high-quality healthy and disease-affected human post-mortem brains, as well as functionally characterize disease-associated regulatory elements and variants in model systems. We are beginning with a focus on autism spectrum disorder, bipolar disorder and schizophrenia, and expect that this knowledge will apply to a wide variety of psychiatric disorders. This paper outlines the motivation and design of PsychENCODE.
DOI
10.1038/nn.4156

PsychENCODE Phase I

Summary statistics
PUBMED_LINK
30545857
DESCRIPTION
Phase I of the project was published on Dec 14, 2018 in a collection of 11 papers in Science, Science Translational Medicine, and Science Advances.
URL
https://www.psychencode.org/home
TITLE
Comprehensive functional genomic resource and integrative model for the human brain.
Main citation
Wang D, Liu S, Warrell J, Won H, ...&, Gerstein MB. (2018) Comprehensive functional genomic resource and integrative model for the human brain. Science, 362 (6420) . doi:10.1126/science.aat8464. PMID 30545857
ABSTRACT
Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.
DOI
10.1126/science.aat8464

PsychENCODE Phase II

Summary statistics
PUBMED_LINK
38781368
DESCRIPTION
A large-scale, cross-population resource of gene, isoform, and splicing regulation in the developing human brain
URL
https://www.psychencode.org/home
TITLE
Cross-ancestry atlas of gene, isoform, and splicing regulation in the developing human brain.
Main citation
Wen C, Margolis M, Dai R, Zhang P, ...&, PsychENCODE Consortium. (2024) Cross-ancestry atlas of gene, isoform, and splicing regulation in the developing human brain. Science, 384 (6698) eadh0829. doi:10.1126/science.adh0829. PMID 38781368
ABSTRACT
Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.
DOI
10.1126/science.adh0829

PsychENCODE Phase II

Summary statistics
PUBMED_LINK
38781369
DESCRIPTION
Phase II of the project focused on single-cell and spatial data, culminating in a collection of 14 papers published on May 24, 2024 (9 in Science, 3 in Science Advances, 1 in Scientific Reports, and 1 in Molecular Psychiatry).
URL
https://www.psychencode.org/home
TITLE
Single-cell genomics and regulatory networks for 388 human brains.
Main citation
Emani PS, Liu JJ, Clarke D, Jensen M, ...&, PsychENCODE Consortium. (2024) Single-cell genomics and regulatory networks for 388 human brains. Science, 384 (6698) eadi5199. doi:10.1126/science.adi5199. PMID 38781369
ABSTRACT
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
DOI
10.1126/science.adi5199

Qi

Summary statistics
PUBMED_LINK
35982161
DESCRIPTION
THISTLE; 2,865 brain cortex samples from 2,443 unrelated individuals of European ancestry with genome-wide SNP data
URL
https://yanglab.westlake.edu.cn/data/brainmeta/cis_sqtl/
TITLE
Genetic control of RNA splicing and its distinct role in complex trait variation.
Main citation
Qi T, Wu Y, Fang H, Zhang F, ...&, Yang J. (2022) Genetic control of RNA splicing and its distinct role in complex trait variation. Nat Genet, 54 (9) 1355-1363. doi:10.1038/s41588-022-01154-4. PMID 35982161
ABSTRACT
Most genetic variants identified from genome-wide association studies (GWAS) in humans are noncoding, indicating their role in gene regulation. Previous studies have shown considerable links of GWAS signals to expression quantitative trait loci (eQTLs) but the links to other genetic regulatory mechanisms, such as splicing QTLs (sQTLs), are underexplored. Here, we introduce an sQTL mapping method, testing for heterogeneity between isoform-eQTL effects (THISTLE), with improved power over competing methods. Applying THISTLE together with a complementary sQTL mapping strategy to brain transcriptomic (n = 2,865) and genotype data, we identified 12,794 genes with cis-sQTLs at P < 5 × 10⁻⁸, approximately 61% of which were distinct from eQTLs. Integrating the sQTL data into GWAS for 12 brain-related complex traits (including diseases), we identified 244 genes associated with the traits through cis-sQTLs, approximately 61% of which could not be discovered using the corresponding eQTL data. Our study demonstrates the distinct role of most sQTLs in the genetic regulation of transcription and complex trait variation.
DOI
10.1038/s41588-022-01154-4

Rakowski A

Summary statistics
PREPRINT_DOI
10.1101/2024.06.11.24308721
SERVER
medrxiv
Main citation
Rakowski, A., Monti, R. & Lippert, C. TransferGWAS of T1-weighted brain MRI data from the UK Biobank. medRxiv 2024.06.11.24308721 (2024) doi:10.1101/2024.06.11.24308721.
MAIN ANCESTRY
EUR

Review-Suhre K, et al-32860016

Summary statistics
PUBMED_LINK
32860016
DESCRIPTION
A Table of all published GWAS with proteomics
URL
http://www.metabolomix.com/a-table-of-all-published-gwas-with-proteomics/
TITLE
Genetics meets proteomics: perspectives for large population-based studies.
Main citation
Suhre K, McCarthy MI, Schwenk JM. (2021) Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet, 22 (1) 19-37. doi:10.1038/s41576-020-0268-2. PMID 32860016
ABSTRACT
Proteomic analysis of cells, tissues and body fluids has generated valuable insights into the complex processes influencing human biology. Proteins represent intermediate phenotypes for disease and provide insight into how genetic and non-genetic risk factors are mechanistically linked to clinical outcomes. Associations between protein levels and DNA sequence variants that colocalize with risk alleles for common diseases can expose disease-associated pathways, revealing novel drug targets and translational biomarkers. However, genome-wide, population-scale analyses of proteomic data are only now emerging. Here, we review current findings from studies of the plasma proteome and discuss their potential for advancing biomedical translation through the interpretation of genome-wide association analyses. We highlight the challenges faced by currently available technologies and provide perspectives relevant to their future application in large-scale biobank studies.
DOI
10.1038/s41576-020-0268-2

Ruffieux H, et al-32492067

Summary statistics
PUBMED_LINK
32492067
TITLE
A fully joint Bayesian quantitative trait locus mapping of human protein abundance in plasma.
Main citation
Ruffieux H, Carayol J, Popescu R, Harper ME, ...&, Valsesia A. (2020) A fully joint Bayesian quantitative trait locus mapping of human protein abundance in plasma. PLoS Comput Biol, 16 (6) e1007882. doi:10.1371/journal.pcbi.1007882. PMID 32492067
ABSTRACT
Molecular quantitative trait locus (QTL) analyses are increasingly popular to explore the genetic architecture of complex traits, but existing studies do not leverage shared regulatory patterns and suffer from a large multiplicity burden, which hampers the detection of weak signals such as trans associations. Here, we present a fully multivariate proteomic QTL (pQTL) analysis performed with our recently proposed Bayesian method LOCUS on data from two clinical cohorts, with plasma protein levels quantified by mass-spectrometry and aptamer-based assays. Our two-stage study identifies 136 pQTL associations in the first cohort, of which >80% replicate in the second independent cohort and have significant enrichment with functional genomic elements and disease risk loci. Moreover, 78% of the pQTLs whose protein abundance was quantified by both proteomic techniques are confirmed across assays. Our thorough comparisons with standard univariate QTL mapping on (1) these data and (2) synthetic data emulating the real data show how LOCUS borrows strength across correlated protein levels and markers on a genome-wide scale to effectively increase statistical power. Notably, 15% of the pQTLs uncovered by LOCUS would be missed by the univariate approach, including several trans and pleiotropic hits with successful independent validation. Finally, the analysis of extensive clinical data from the two cohorts indicates that the genetically-driven proteins identified by LOCUS are enriched in associations with low-grade inflammation, insulin resistance and dyslipidemia and might therefore act as endophenotypes for metabolic diseases. While considerations on the clinical role of the pQTLs are beyond the scope of our work, these findings generate useful hypotheses to be explored in future research; all results are accessible online from our searchable database. 
Thanks to its efficient variational Bayes implementation, LOCUS can analyze jointly thousands of traits and millions of markers. Its applicability goes beyond pQTL studies, opening new perspectives for large-scale genome-wide association and QTL analyses. Diet, Obesity and Genes (DiOGenes) trial registration number: NCT00390637.
DOI
10.1371/journal.pcbi.1007882

SABR

Summary statistics
PUBMED_LINK
40500424
DESCRIPTION
South African Blood Regulatory
URL
https://zenodo.org/records/15334125
TITLE
A map of blood regulatory variation in South Africans enables GWAS interpretation.
Main citation
Castel SE, Tluway FD, Emde AK, Smyth N, ...&, Ramsay M. (2025) A map of blood regulatory variation in South Africans enables GWAS interpretation. Nat Genet, 57 (7) 1628-1637. doi:10.1038/s41588-025-02223-0. PMID 40500424
ABSTRACT
Functional genomics resources are critical for interpreting human genetic studies, but currently they are predominantly from European-ancestry individuals. Here we present the South African Blood Regulatory (SABR) resource, a map of blood regulatory variation that includes three South Eastern Bantu-speaking groups. Using paired whole-genome and blood transcriptome data from over 600 individuals, we map the genetic architecture of 40 blood cell traits derived from deconvolution analysis, as well as expression, splice and cell-type interaction quantitative trait loci. We comprehensively compare SABR to the Genotype Tissue Expression Project and characterize thousands of regulatory variants only observed in African-ancestry individuals. Finally, we demonstrate the increased utility of SABR for interpreting African-ancestry association studies by identifying putatively causal genes and molecular mechanisms through colocalization analysis of blood-relevant traits from the Pan-UK Biobank. Importantly, we make full SABR summary statistics publicly available to support the African genomics community.
DOI
10.1038/s41588-025-02223-0

Said

Summary statistics
PREPRINT_DOI
10.1101/2023.11.13.23298365
SERVER
medrxiv
Main citation
Said, S. et al. Ancestry diversity in the genetic determinants of the human plasma proteome and associated new drug targets. medRxiv (2023) doi:10.1101/2023.11.13.23298365.
RELATED_BIOBANK
China Kadoorie Biobank
MAIN ANCESTRY
EAS

Sasayama D, et al-28031287

Summary statistics
PUBMED_LINK
28031287
TITLE
Genome-wide quantitative trait loci mapping of the human cerebrospinal fluid proteome.
Main citation
Sasayama D, Hattori K, Ogawa S, Yokota Y, ...&, Kunugi H. (2017) Genome-wide quantitative trait loci mapping of the human cerebrospinal fluid proteome. Hum Mol Genet, 26 (1) 44-51. doi:10.1093/hmg/ddw366. PMID 28031287
ABSTRACT
Cerebrospinal fluid (CSF) is virtually the only accessible source of proteins derived from the central nervous system (CNS) of living humans and possibly reflects the pathophysiology of a variety of neuropsychiatric diseases. However, little is known regarding the genetic basis of variation in protein levels of human CSF. We examined CSF levels of 1,126 proteins in 133 subjects and performed a genome-wide association analysis of 514,227 single nucleotide polymorphisms (SNPs) to detect protein quantitative trait loci (pQTLs). To be conservative, Spearman's correlation was used to identify an association between genotypes of SNPs and protein levels. A total of 421 cis and 25 trans SNP-protein pairs were significantly correlated at a false discovery rate (FDR) of less than 0.01 (nominal P < 7.66 × 10⁻⁹). Cis-only analysis revealed an additional 580 SNP-protein pairs with FDR < 0.01 (nominal P < 2.13 × 10⁻⁵). pQTL SNPs were more likely, compared to non-pQTL SNPs, to be disease/trait-associated variants identified by previous genome-wide association studies. The present findings suggest that genetic variations play an important role in the regulation of protein expression in the CNS. The obtained database may serve as a valuable resource to understand the genetic bases for CNS protein expression pattern in humans.
DOI
10.1093/hmg/ddw366

sc-eQTLGen

Summary statistics
PUBMED_LINK
32149610
URL
https://www.eqtlgen.org/sc/
TITLE
The single-cell eQTLGen consortium.
Main citation
van der Wijst M, de Vries DH, Groot HE, Trynka G, ...&, Franke L. (2020) The single-cell eQTLGen consortium. Elife, 9 () . doi:10.7554/eLife.52155. PMID 32149610
ABSTRACT
In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes, many of which are obscured when using bulk methods. Rapid increase in throughput and reduction in cost per cell now allow this technology to be applied to large-scale population genetics studies. To fully leverage these emerging data resources, we have founded the single-cell eQTLGen consortium (sc-eQTLGen), aimed at pinpointing the cellular contexts in which disease-causing genetic variants affect gene expression. Here, we outline the goals, approach and potential utility of the sc-eQTLGen consortium. We also provide a set of study design considerations for future single-cell eQTL studies.
DOI
10.7554/eLife.52155

SCALLOP

Summary statistics
DESCRIPTION
The SCALLOP consortium (Systematic and Combined AnaLysis of Olink Proteins) is a collaborative framework for discovery and follow-up of genetic associations with proteins on the Olink Proteomics platform. To date, 35 PIs from 28 research institutions have joined the effort, which now comprises summary level data for more than 70,000 patients and controls from 45 cohort studies. SCALLOP welcomes new members.
URL
http://www.scallop-consortium.com/
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

Shah M-37604819

Summary statistics
PUBMED_LINK
37604819
TITLE
Environmental and genetic predictors of human cardiovascular ageing.
Main citation
Shah M, de A Inácio MH, Lu C, Schiratti PR, ...&, O'Regan DP. (2023) Environmental and genetic predictors of human cardiovascular ageing. Nat Commun, 14 (1) 4941. doi:10.1038/s41467-023-40566-6. PMID 37604819
ABSTRACT
Cardiovascular ageing is a process that begins early in life and leads to a progressive change in structure and decline in function due to accumulated damage across diverse cell types, tissues and organs contributing to multi-morbidity. Damaging biophysical, metabolic and immunological factors exceed endogenous repair mechanisms resulting in a pro-fibrotic state, cellular senescence and end-organ damage, however the genetic architecture of cardiovascular ageing is not known. Here we use machine learning approaches to quantify cardiovascular age from image-derived traits of vascular function, cardiac motion and myocardial fibrosis, as well as conduction traits from electrocardiograms, in 39,559 participants of UK Biobank. Cardiovascular ageing is found to be significantly associated with common or rare variants in genes regulating sarcomere homeostasis, myocardial immunomodulation, and tissue responses to biophysical stress. Ageing is accelerated by cardiometabolic risk factors and we also identify prescribed medications that are potential modifiers of ageing. Through large-scale modelling of ageing across multiple traits our results reveal insights into the mechanisms driving premature cardiovascular ageing and reveal potential molecular targets to attenuate age-related processes.
DOI
10.1038/s41467-023-40566-6
MAIN ANCESTRY
EUR

Smith SM-33875891

Summary statistics
PUBMED_LINK
33875891
DESCRIPTION
Oxford Brain Imaging Genetics (BIG40)
URL
https://open.win.ox.ac.uk/ukbiobank/big40/pheweb33k/
TITLE
An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank.
Main citation
Smith SM, Douaud G, Chen W, Hanayik T, ...&, Elliott LT. (2021) An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nat Neurosci, 24 (5) 737-745. doi:10.1038/s41593-021-00826-4. PMID 33875891
ABSTRACT
UK Biobank is a major prospective epidemiological study, including multimodal brain imaging, genetics and ongoing health outcomes. Previously, we published genome-wide associations of 3,144 brain imaging-derived phenotypes, with a discovery sample of 8,428 individuals. Here we present a new open resource of genome-wide association study summary statistics, using the 2020 data release, almost tripling the discovery sample size. We now include the X chromosome and new classes of imaging-derived phenotypes (subcortical volumes and tissue contrast). Previously, we found 148 replicated clusters of associations between genetic variants and imaging phenotypes; in this study, we found 692, including 12 on the X chromosome. We describe some of the newly found associations, focusing on the X chromosome and autosomal associations involving the new classes of imaging-derived phenotypes. Our novel associations implicate, for example, pathways involved in the rare X-linked STAR (syndactyly, telecanthus and anogenital and renal malformations) syndrome, Alzheimer's disease and mitochondrial disorders.
DOI
10.1038/s41593-021-00826-4
MAIN ANCESTRY
EUR

Suhre K, et al-28240269

Summary statistics
PUBMED_LINK
28240269
TITLE
Connecting genetic risk to disease end points through the human blood plasma proteome.
Main citation
Suhre K, Arnold M, Bhagwat AM, Cotton RJ, ...&, Graumann J. (2017) Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun, 8 () 14357. doi:10.1038/ncomms14357. PMID 28240269
ABSTRACT
Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.
DOI
10.1038/ncomms14357

Suhre K, et al-38412862

Summary statistics
PUBMED_LINK
38412862
DESCRIPTION
rQTLs
TITLE
Genetic associations with ratios between protein levels detect new pQTLs and reveal protein-protein interactions.
Main citation
Suhre K. (2024) Genetic associations with ratios between protein levels detect new pQTLs and reveal protein-protein interactions. Cell Genom, 4 (3) 100506. doi:10.1016/j.xgen.2024.100506. PMID 38412862
ABSTRACT
Protein quantitative trait loci (pQTLs) are an invaluable source of information for drug target development because they provide genetic evidence to support protein function, suggest relationships between cis- and trans-associated proteins, and link proteins to disease endpoints. Using Olink proteomics data for 1,463 proteins measured in over 54,000 samples of the UK Biobank, we identified 4,248 associations with 2,821 ratios between protein levels (rQTLs). rQTLs were 7.6-fold enriched in known protein-protein interactions, suggesting that their ratios reflect biological links between the implicated proteins. Conducting a GWAS on ratios increased the number of discovered genetic signals by 24.7%. The approach can identify novel loci of clinical relevance, support causal gene identification, and reveal complex networks of interacting proteins. Taken together, our study adds significant value to the genetic insights that can be derived from the UKB proteomics data and motivates the wider use of ratios in large-scale GWAS.
DOI
10.1016/j.xgen.2024.100506
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

Sun BB, et al-29875488

Summary statistics
PUBMED_LINK
29875488
TITLE
Genomic atlas of the human plasma proteome.
Main citation
Sun BB, Maranville JC, Peters JE, Stacey D, ...&, Butterworth AS. (2018) Genomic atlas of the human plasma proteome. Nature, 558 (7708) 73-79. doi:10.1038/s41586-018-0175-2. PMID 29875488
ABSTRACT
Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
DOI
10.1038/s41586-018-0175-2

Sun BB, et al-37794186

Summary statistics
PUBMED_LINK
37794186
TITLE
Plasma proteomic associations with genetics and health in the UK Biobank.
Main citation
Sun BB, Chiou J, Traylor M, Benner C, ...&, Whelan CD. (2023) Plasma proteomic associations with genetics and health in the UK Biobank. Nature, 622 (7982) 329-338. doi:10.1038/s41586-023-06592-6. PMID 37794186
ABSTRACT
The Pharma Proteomics Project is a precompetitive biopharmaceutical consortium characterizing the plasma proteomic profiles of 54,219 UK Biobank participants. Here we provide a detailed summary of this initiative, including technical and biological validations, insights into proteomic disease signatures, and prediction modelling for various demographic and health indicators. We present comprehensive protein quantitative trait locus (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 81% are previously undescribed, alongside ancestry-specific pQTL mapping in non-European individuals. The study provides an updated characterization of the genetic architecture of the plasma proteome, contextualized with projected pQTL discovery rates as sample sizes and proteomic assay coverages increase over time. We offer extensive insights into trans pQTLs across multiple biological domains, highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement networks, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug discovery by extending the genetic proxied effects of protein targets, such as PCSK9, on additional endpoints, and disentangle specific genes and proteins perturbed at loci associated with COVID-19 susceptibility. This public-private partnership provides the scientific community with an open-access proteomics resource of considerable breadth and depth to help to elucidate the biological mechanisms underlying proteo-genomic discoveries and accelerate the development of biomarkers, predictive models and therapeutics.
DOI
10.1038/s41586-023-06592-6

Sun BB-36241887

Summary statistics
PUBMED_LINK
36241887
TITLE
Genetic map of regional sulcal morphology in the human brain from UK biobank data.
Main citation
Sun BB, Loomis SJ, Pizzagalli F, Shatokhina N, ...&, Whelan CD. (2022) Genetic map of regional sulcal morphology in the human brain from UK biobank data. Nat Commun, 13 (1) 6071. doi:10.1038/s41467-022-33829-1. PMID 36241887
ABSTRACT
Genetic associations with macroscopic brain structure can provide insights into brain function and disease. However, specific associations with measures of local brain folding are largely under-explored. Here, we conducted large-scale genome- and exome-wide associations of regional cortical sulcal measures derived from magnetic resonance imaging scans of 40,169 individuals in UK Biobank. We discovered 388 regional brain folding associations across 77 genetic loci, with genes in associated loci enriched for expression in the cerebral cortex, neuronal development processes, and differential regulation during early brain development. We integrated brain eQTLs to refine genes for various loci, implicated several genes involved in neurodevelopmental disorders, and highlighted global genetic correlations with neuropsychiatric phenotypes. We provide an interactive 3D visualisation of our summary associations, emphasising added resolution of regional analyses. Our results offer new insights into the genetic architecture of brain folding and provide a resource for future studies of sulcal morphology in health and disease.
DOI
10.1038/s41467-022-33829-1
MAIN ANCESTRY
EUR

Sun W, et al-27532455

Summary statistics
PUBMED_LINK
27532455
TITLE
Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD.
Main citation
Sun W, Kechris K, Jacobson S, Drummond MB, ...&, COPDGene Investigators. (2016) Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD. PLoS Genet, 12 (8) e1006011. doi:10.1371/journal.pgen.1006011. PMID 27532455
ABSTRACT
Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis from two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). PQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Inference of causal relations of pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) were explored using conditional independence tests. We identified 527 highly significant (p < 8 × 10^-10) pQTLs in 38 (43%) of blood proteins tested. Most pQTL SNPs were novel with low overlap to eQTL SNPs. The pQTL SNPs explained >10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10^-392) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explanation of variance (R^2) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema.
In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTL, and the differences observed between pQTLs and eQTLs SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group.
DOI
10.1371/journal.pgen.1006011

Surapaneni A, et al-35870639

Summary statistics
PUBMED_LINK
35870639
TITLE
Identification of 969 protein quantitative trait loci in an African American population with kidney disease attributed to hypertension.
Main citation
Surapaneni A, Schlosser P, Zhou L, Liu C, ...&, Grams ME. (2022) Identification of 969 protein quantitative trait loci in an African American population with kidney disease attributed to hypertension. Kidney Int, 102 (5) 1167-1177. doi:10.1016/j.kint.2022.07.005. PMID 35870639
ABSTRACT
Investigations into the causal underpinnings of disease processes can be aided by the incorporation of genetic information. Genetic studies require populations varied in both ancestry and prevalent disease in order to optimize discovery and ensure generalizability of findings to the global population. Here, we report the genetic determinants of the serum proteome in 466 African Americans with chronic kidney disease attributed to hypertension from the richly phenotyped African American Study of Kidney Disease and Hypertension (AASK) study. Using the largest aptamer-based protein profiling platform to date (6,790 proteins or protein complexes), we identified 969 genetic associations with 900 unique proteins; including 52 novel cis (local) associations and 379 novel trans (distant) associations. The genetic effects of previously published cis-protein quantitative trait loci (pQTLs) were found to be highly reproducible, and we found evidence that our novel genetic signals colocalize with gene expression and disease processes. Many trans- pQTLs were found to reflect associations mediated by the circulating cis protein, and the common trans-pQTLs are enriched for processes involving extracellular vesicles, highlighting a plausible mechanism for distal regulation of the levels of secreted proteins. Thus, our study generates a valuable resource of genetic associations linking variants to protein levels and disease in an understudied patient population to inform future studies of drug targets and physiology.
DOI
10.1016/j.kint.2022.07.005

Taiwan BioBank Pheweb

Summary statistics
PUBMED_LINK
29149267
DESCRIPTION
Taiwan Biobank PheWeb — GWAS summary statistics for Taiwanese participants.
URL
https://taiwanview.twbiobank.org.tw/pheweb.php
TITLE
Taiwan Biobank: making cross-database convergence possible in the Big Data era.
Main citation
Lin JC, Fan CT, Liao CC, Chen YS. (2018) Taiwan Biobank: making cross-database convergence possible in the Big Data era. Gigascience, 7 (1) 1-4. doi:10.1093/gigascience/gix110. PMID 29149267
ABSTRACT
The Taiwan Biobank (TWB) is a biomedical research database of biopsy data from 200 000 participants. Access to this database has been granted to research communities taking part in the development of precision medicines; however, this has raised issues surrounding TWB's access to electronic medical records (EMRs). The Personal Data Protection Act of Taiwan restricts access to EMRs for purposes not covered by patients' original consent. This commentary explores possible legal solutions to help ensure that the access TWB has to EMR abides with legal obligations, and with governance frameworks associated with ethical, legal, and social implications. We suggest utilizing "hash function" algorithms to create nonretrospective, anonymized data for the purpose of cross-transmission and/or linkage with EMR.
DOI
10.1093/gigascience/gix110
RELATED_BIOBANK
Taiwan Biobank
MAIN ANCESTRY
EAS

TenK10k

Summary statistics
DESCRIPTION
Phase 1: matched WGS and scRNA-seq in ~1.9k individuals; common and rare variant sc-eQTLs in 28 immune cell types (SAIGE-QTL).
URL
https://www.medrxiv.org/content/10.1101/2025.03.20.25324352v2
TITLE
Impact of rare and common genetic variation on cell type-specific gene expression in human blood.
Main citation
Cuomo ASE, Spenceley E, Tanudisastro HA, Bowen B, ...&, Powell JE. (2025) Impact of rare and common genetic variation on cell type-specific gene expression in human blood. medRxiv. doi:10.1101/2025.03.20.25324352
ABSTRACT
Understanding the genetic basis of gene expression can shed light on the regulatory mechanisms underlying complex traits and diseases. Single cell-resolved measures of RNA levels and single-cell expression quantitative trait loci (sc-eQTLs) have revealed genetic regulation that drives sub-tissue cell states and types across diverse human tissues. Here, we describe the first phase of TenK10K, the largest-to-date dataset of matched whole-genome sequencing (WGS) and single-cell RNA-sequencing (scRNA-seq). We leverage scRNA-seq data from over 5 million cells across 28 immune cell types, and matched WGS, from 1,925 individuals, which provides power to detect associations between rare and low-frequency genetic variants that have largely been uncharacterised in their impact on cell-specific gene expression. We map the effects of both common and rare variants in a cell type-specific manner using a recently introduced method that increases power by modelling single cells directly rather than relying on aggregated ‘pseudobulk’ counts. We identify putative common regulatory variants for 83% of all 21,404 genes tested and cumulative rare variant signals for 47% of genes. We explore how genetic effects vary across cell type and state spectra, develop a framework to determine the degree to which sc-eQTLs are cell type-specific, and show that about half of the effects are observed only in one or a few cell types. By integrating our results with functional annotations and disease information, we also further characterise the likely molecular modes of action for many disease-variant associations. Finally, we explore the effects that genetic variants have on gene expression across continuous cell states and functions, and effects that vary cell state abundance directly.
DOI
10.1101/2025.03.20.25324352

Thareja G, et al-36168886

Summary statistics
PUBMED_LINK
36168886
TITLE
Differences and commonalities in the genetic architecture of protein quantitative trait loci in European and Arab populations.
Main citation
Thareja G, Belkadi A, Arnold M, Albagha OME, ...&, Suhre K. (2023) Differences and commonalities in the genetic architecture of protein quantitative trait loci in European and Arab populations. Hum Mol Genet, 32 (6) 907-916. doi:10.1093/hmg/ddac243. PMID 36168886
ABSTRACT
Polygenic scores (PGS) can identify individuals at risk of adverse health events and guide genetics-based personalized medicine. However, it is not clear how well PGS translate between different populations, limiting their application to well-studied ethnicities. Proteins are intermediate traits linking genetic predisposition and environmental factors to disease, with numerous blood circulating protein levels representing functional readouts of disease-related processes. We hypothesized that studying the genetic architecture of a comprehensive set of blood-circulating proteins between a European and an Arab population could shed fresh light on the translatability of PGS to understudied populations. We therefore conducted a genome-wide association study with whole-genome sequencing data using 1301 proteins measured on the SOMAscan aptamer-based affinity proteomics platform in 2935 samples of Qatar Biobank and evaluated the replication of protein quantitative traits (pQTLs) from European studies in an Arab population. Then, we investigated the colocalization of shared pQTL signals between the two populations. Finally, we compared the performance of protein PGS derived from a Caucasian population in a European and an Arab cohort. We found that the majority of shared pQTL signals (81.8%) colocalized between both populations. About one-third of the genetic protein heritability was explained by protein PGS derived from a European cohort, with protein PGS performing ~20% better in Europeans when compared to Arabs. Our results are relevant for the translation of PGS to non-Caucasian populations, as well as for future efforts to extend genetic research to understudied populations.
DOI
10.1093/hmg/ddac243

Tohoku Medical Megabank (TMM) Jmorp

Summary statistics
PUBMED_LINK
37930845
DESCRIPTION
Tohoku Medical Megabank / jMorp multi-omics reference and GWAS-related summary data portal.
URL
https://jmorp.megabank.tohoku.ac.jp/202109/gwas/
TITLE
jMorp: Japanese Multi-Omics Reference Panel update report 2023.
Main citation
Tadaka S, Kawashima J, Hishinuma E, Saito S, ...&, Kinoshita K. (2024) jMorp: Japanese Multi-Omics Reference Panel update report 2023. Nucleic Acids Res, 52 (D1) D622-D632. doi:10.1093/nar/gkad978. PMID 37930845
ABSTRACT
Modern medicine is increasingly focused on personalized medicine, and multi-omics data is crucial in understanding biological phenomena and disease mechanisms. Each ethnic group has its unique genetic background with specific genomic variations influencing disease risk and drug response. Therefore, multi-omics data from specific ethnic populations are essential for the effective implementation of personalized medicine. Various prospective cohort studies, such as the UK Biobank, All of Us and Lifelines, have been conducted worldwide. The Tohoku Medical Megabank project was initiated after the Great East Japan Earthquake in 2011. It collects biological specimens and conducts genome and omics analyses to build a basis for personalized medicine. Summary statistical data from these analyses are available in the jMorp web database (https://jmorp.megabank.tohoku.ac.jp), which provides a multidimensional approach to the diversity of the Japanese population. jMorp was launched in 2015 as a public database for plasma metabolome and proteome analyses and has been continuously updated. The current update will significantly expand the scale of the data (metabolome, genome, transcriptome, and metagenome). In addition, the user interface and backend server implementations were rewritten to improve the connectivity between the items stored in jMorp. This paper provides an overview of the new version of the jMorp.
DOI
10.1093/nar/gkad978
RELATED_BIOBANK
Tohoku Medical Megabank
MAIN ANCESTRY
EAS

TPMI PheWeb

Summary statistics
PUBMED_LINK
41092961
DESCRIPTION
Taiwan Precision Medicine Initiative PheWeb — cohort GWAS summary statistics.
URL
https://pheweb.ibms.sinica.edu.tw/
TITLE
The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies.
Main citation
Yang HC, Kwok PY, Li LH, Liu YM, ...&, Wu JY. (2025) The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies. Nature, 648 (8092) 117-127. doi:10.1038/s41586-025-09680-x. PMID 41092961
ABSTRACT
Han Chinese people comprise nearly 20% of the global population but remain under-represented in genetic studies, so there is an urgent need for large-scale cohorts to advance precision medicine. Here we present the Taiwan Precision Medicine Initiative (TPMI), established by Academia Sinica in collaboration with 16 major medical centres around Taiwan, which has recruited 565,390 participants who consent to provide DNA samples for genetic profiling and grant access to their electronic medical records (EMRs) for research. EMR access is both retrospective and prospective, allowing longitudinal studies. Genetic profiling is done with population-optimized arrays of single-nucleotide polymorphisms for people of Han Chinese ancestry, which enable genome-wide association, phenome-wide association and polygenic risk score studies to be performed to evaluate common disease risk and pharmacogenetic response. Participants also agreed to be re-contacted for future research and receive personalized genetic risk profiles with health management recommendations. The TPMI has established the TPMI Data Access Platform, a central database and analysis platform that both safeguards the security of the data and facilitates academic research. As a large cohort of individuals with non-European ancestry that merges genetic profiles with EMR data and enables longitudinal follow-up, TPMI provides a unique resource that could be used to validate genetic risk prediction models, perform clinical trials of risk-based health management and inform health policies. Ultimately, the TPMI cohort will contribute to global genetic research and serve as a model for population-based precision medicine.
DOI
10.1038/s41586-025-09680-x
RELATED_BIOBANK
Taiwan Precision Medicine Initiative
MAIN ANCESTRY
EAS

UKB

Summary statistics
PUBMED_LINK
41639462
URL
https://azphewas.com/
TITLE
Phenome-wide analysis of copy number variants in 470,727 UK Biobank genomes.
Main citation
Zou XZ, Hu F, Lou H, Burren OS, ...&, Carss K. (2026) Phenome-wide analysis of copy number variants in 470,727 UK Biobank genomes. Nature. doi:10.1038/s41586-025-10087-x. PMID 41639462
ABSTRACT
Copy number variants (CNVs) are key drivers of human diversity and disease risk. Here we evaluate the role of CNVs across a broad range of human phenotypes and diseases by analysing CNVs from 470,727 UK Biobank whole-genome sequences and conducting a variant- and gene-level phenome-wide association study (PheWAS) with 2,941 plasma protein abundance measurements, 13,336 binary clinical phenotypes and 1,911 quantitative traits. Proteomic analyses validated functional associations of CNVs with nearby genes (cis-protein quantitative trait loci; cis-pQTLs), with deletions and duplications typically associated with reduced and increased protein levels, respectively, and uncovered previously unknown protein-protein interactions (trans-pQTLs). Our PheWAS recapitulated known associations and uncovered associations in both coding and non-coding regions. Notably, we identified a rare deletion in ZNF451 associated with increased leukocyte telomere length and a non-coding deletion of a SLC2A9 enhancer associated with reduced gout risk. In addition, by combining CNVs with protein-coding single nucleotide variants and indels, we enhanced the power of our study to detect gene-disease associations. Finally, we leveraged this multiomics dataset to identify several pQTLs that constitute candidate biomarkers, including TMPRSS5 for Charcot-Marie-Tooth disease type 1A. This multiancestry whole-genome-sequence CNV PheWAS offers insights into the roles of CNVs in human health outcomes and could serve as a valuable resource for therapeutic development.
DOI
10.1038/s41586-025-10087-x
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

UKB exome

Summary statistics
PUBMED_LINK
34375979
DESCRIPTION
UK Biobank exome sequence-based GWAS summary statistics (gene- and variant-level association resource).
URL
https://azphewas.com/
TITLE
Rare variant contribution to human disease in 281,104 UK Biobank exomes.
Main citation
Wang Q, Dhindsa RS, Carss K, Harper AR, ...&, Petrovski S. (2021) Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature, 597 (7877) 527-532. doi:10.1038/s41586-021-03855-y. PMID 34375979
ABSTRACT
Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits. Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).
DOI
10.1038/s41586-021-03855-y
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

UKB fastgwa (Imputation)

Summary statistics
PUBMED_LINK
31768069
DESCRIPTION
UK Biobank GWAS from fastGWA on imputed genotype data (continuous and binary traits).
URL
https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp/
TITLE
A resource-efficient tool for mixed model association analysis of large-scale data.
Main citation
Jiang L, Zheng Z, Qi T, Kemper KE, ...&, Yang J. (2019) A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet, 51 (12) 1749-1755. doi:10.1038/s41588-019-0530-8. PMID 31768069
ABSTRACT
The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.
DOI
10.1038/s41588-019-0530-8
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

UKB fastgwa (WES)

Summary statistics
PUBMED_LINK
31768069
DESCRIPTION
UK Biobank GWAS from fastGWA on whole-exome sequence data.
URL
https://yanglab.westlake.edu.cn/data/ukb_fastgwa/wes/
TITLE
A resource-efficient tool for mixed model association analysis of large-scale data.
Main citation
Jiang L, Zheng Z, Qi T, Kemper KE, ...&, Yang J. (2019) A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet, 51 (12) 1749-1755. doi:10.1038/s41588-019-0530-8. PMID 31768069
ABSTRACT
The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.
DOI
10.1038/s41588-019-0530-8
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

UKB fastgwa-glmm (Binary)

Summary statistics
PUBMED_LINK
34737426
DESCRIPTION
UK Biobank binary-trait GWAS on imputed genotype data using fastGWA-GLMM (generalized linear mixed model association).
URL
https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp_binary/
TITLE
A generalized linear mixed model association tool for biobank-scale data.
Main citation
Jiang L, Zheng Z, Fang H, Yang J. (2021) A generalized linear mixed model association tool for biobank-scale data. Nat Genet, 53 (11) 1616-1621. doi:10.1038/s41588-021-00954-4. PMID 34737426
ABSTRACT
Compared with linear mixed model-based genome-wide association (GWA) methods, generalized linear mixed model (GLMM)-based methods have better statistical properties when applied to binary traits but are computationally much slower. In the present study, leveraging efficient sparse matrix-based algorithms, we developed a GLMM-based GWA tool, fastGWA-GLMM, that is severalfold to orders of magnitude faster than the state-of-the-art tools when applied to the UK Biobank (UKB) data and scalable to cohorts with millions of individuals. We show by simulation that the fastGWA-GLMM test statistics of both common and rare variants are well calibrated under the null, even for traits with extreme case-control ratios. We applied fastGWA-GLMM to the UKB data of 456,348 individuals, 11,842,647 variants and 2,989 binary traits (full summary statistics available at http://fastgwa.info/ukbimpbin), and identified 259 rare variants associated with 75 traits, demonstrating the use of imputed genotype data in a large cohort to discover rare variants for binary complex traits.
DOI
10.1038/s41588-021-00954-4
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

UKB gene-based (Genebass)

Summary statistics
PUBMED_LINK
36778668
DESCRIPTION
UK Biobank gene-based association results from the Genebass / exome analysis resource.
URL
https://genebass.org/
TITLE
Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes.
Main citation
Karczewski KJ, Solomonson M, Chao KR, Goodrich JK, ...&, Neale BM. (2022) Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genom, 2 (9) 100168. doi:10.1016/j.xgen.2022.100168. PMID 36778668
ABSTRACT
Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.
DOI
10.1016/j.xgen.2022.100168
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

UKB saige

Summary statistics
PUBMED_LINK
30104761
DESCRIPTION
UK Biobank GWAS with SAIGE (mixed-model association for biobank-scale binary and quantitative traits).
URL
https://pheweb.org/UKB-SAIGE/
TITLE
Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies.
Main citation
Zhou W, Nielsen JB, Fritsche LG, Dey R, ...&, Lee S. (2018) Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet, 50 (9) 1335-1341. doi:10.1038/s41588-018-0184-y. PMID 30104761
ABSTRACT
In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for > 1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
DOI
10.1038/s41588-018-0184-y
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

UKB TOPMed

Summary statistics
PUBMED_LINK
33568819
DESCRIPTION
UK Biobank GWAS using TOPMed-imputed genotypes (multi-ancestry imputation panel).
URL
https://pheweb.org/UKB-TOPMed/
TITLE
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.
Main citation
Taliun D, Harris DN, Kessler MD, Carlson J, ...&, Abecasis GR. (2021) Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature, 590 (7845) 290-299. doi:10.1038/s41586-021-03205-y. PMID 33568819
ABSTRACT
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
DOI
10.1038/s41586-021-03205-y
RELATED_BIOBANK
UK Biobank
MAIN ANCESTRY
EUR

Wang C-35606419

Summary statistics
PUBMED_LINK
35606419
DESCRIPTION
Quantitative susceptibility mapping
TITLE
Phenotypic and genetic associations of quantitative magnetic susceptibility in UK Biobank brain imaging.
Main citation
Wang C, Martins-Bach AB, Alfaro-Almagro F, Douaud G, ...&, Miller KL. (2022) Phenotypic and genetic associations of quantitative magnetic susceptibility in UK Biobank brain imaging. Nat Neurosci, 25 (6) 818-831. doi:10.1038/s41593-022-01074-w. PMID 35606419
ABSTRACT
A key aim in epidemiological neuroscience is identification of markers to assess brain health and monitor therapeutic interventions. Quantitative susceptibility mapping (QSM) is an emerging magnetic resonance imaging technique that measures tissue magnetic susceptibility and has been shown to detect pathological changes in tissue iron, myelin and calcification. We present an open resource of QSM-based imaging measures of multiple brain structures in 35,273 individuals from the UK Biobank prospective epidemiological study. We identify statistically significant associations of 251 phenotypes with magnetic susceptibility that include body iron, disease, diet and alcohol consumption. Genome-wide associations relate magnetic susceptibility to 76 replicating clusters of genetic variants with biological functions involving iron, calcium, myelin and extracellular matrix. These patterns of associations include relationships that are unique to QSM, in particular being complementary to T2* signal decay time measures. These new imaging phenotypes are being integrated into the core UK Biobank measures provided to researchers worldwide, creating the potential to discover new, non-invasive markers of brain health.
DOI
10.1038/s41593-022-01074-w
MAIN ANCESTRY
EUR

Wang QS, et al-39317738

Summary statistics
PUBMED_LINK
39317738
TITLE
Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal distinct regulation patterns and disease relevance.
Main citation
Wang QS, Hasegawa T, Namkoong H, Saiki R, ...&, Japan COVID-19 Task Force. (2024) Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal distinct regulation patterns and disease relevance. Nat Genet, 56 (10) 2054-2067. doi:10.1038/s41588-024-01896-3. PMID 39317738
ABSTRACT
Studying the genetic regulation of protein expression (through protein quantitative trait loci (pQTLs)) offers a deeper understanding of regulatory variants uncharacterized by mRNA expression regulation (expression QTLs (eQTLs)) studies. Here we report cis-eQTL and cis-pQTL statistical fine-mapping from 1,405 genotyped samples with blood mRNA and 2,932 plasma samples of protein expression, as part of the Japan COVID-19 Task Force (JCTF). Fine-mapped eQTLs (n = 3,464) were enriched for 932 variants validated with a massively parallel reporter assay. Fine-mapped pQTLs (n = 582) were enriched for missense variations on structured and extracellular domains, although the possibility of epitope-binding artifacts remains. Trans-eQTL and trans-pQTL analysis highlighted associations of class I HLA allele variation with KIR genes. We contrast the multi-tissue origin of plasma protein with blood mRNA, contributing to the limited colocalization level, distinct regulatory mechanisms and trait relevance of eQTLs and pQTLs. We report a negative correlation between ABO mRNA and protein expression because of linkage disequilibrium between distinct nearby eQTLs and pQTLs.
DOI
10.1038/s41588-024-01896-3
MAIN ANCESTRY
EAS

Warrier V-37592024

Summary statistics
PUBMED_LINK
37592024
TITLE
Genetic insights into human cortical organization and development through genome-wide analyses of 2,347 neuroimaging phenotypes.
Main citation
Warrier V, Stauffer EM, Huang QQ, Wigdor EM, ...&, Bethlehem RAI. (2023) Genetic insights into human cortical organization and development through genome-wide analyses of 2,347 neuroimaging phenotypes. Nat Genet, 55 (9) 1483-1493. doi:10.1038/s41588-023-01475-y. PMID 37592024
ABSTRACT
Our understanding of the genetics of the human cerebral cortex is limited both in terms of the diversity and the anatomical granularity of brain structural phenotypes. Here we conducted a genome-wide association meta-analysis of 13 structural and diffusion magnetic resonance imaging-derived cortical phenotypes, measured globally and at 180 bilaterally averaged regions in 36,663 individuals and identified 4,349 experiment-wide significant loci. These phenotypes include cortical thickness, surface area, gray matter volume, measures of folding, neurite density and water diffusion. We identified four genetic latent structures and causal relationships between surface area and some measures of cortical folding. These latent structures partly relate to different underlying gene expression trajectories during development and are enriched for different cell types. We also identified differential enrichment for neurodevelopmental and constrained genes and demonstrate that common genetic variants associated with cortical expansion are associated with cephalic disorders. Finally, we identified complex interphenotype and inter-regional genetic relationships among the 13 phenotypes, reflecting the developmental differences among them. Together, these analyses identify distinct genetic organizational principles of the cortex and their correlates with neurodevelopment.
DOI
10.1038/s41588-023-01475-y
MAIN ANCESTRY
EUR

Westlake gut bacteria GWAS

Summary statistics
DESCRIPTION
Genome-wide association analyses for human gut bacteria in Han Chinese (n = 7,935) from the Westlake Chinese Multi-omics GWAS Catalog. N_MICROBES matches the bacteria phenotypes.tsv table on the portal (2026).
URL
https://omics.lab.westlake.edu.cn/data/bacteria/phenotypes ,https://omics.lab.westlake.edu.cn/collect.html
Main citation
Laboratory of Precision Nutrition and Computational Medicine, Westlake University. Gut bacteria GWAS summary statistics, Han Chinese (n = 7,935). Westlake Chinese Multi-omics GWAS Catalog. https://omics.lab.westlake.edu.cn/data.html (accessed 2026).
MAIN ANCESTRY
EAS
METAGENOME
Gut bacteria

Westlake gut fungi GWAS

Summary statistics
DESCRIPTION
Genome-wide association analyses for human gut fungi (mycobiome) in Han Chinese (n = 7,350) from the Westlake Chinese Multi-omics GWAS Catalog. N_MICROBES matches the fungi phenotypes.tsv table on the portal (2026). Companion publication was listed as unpublished on the catalog data page as of 2026.
URL
https://omics.lab.westlake.edu.cn/data/fungi/phenotypes ,https://omics.lab.westlake.edu.cn/collect.html
Main citation
Laboratory of Precision Nutrition and Computational Medicine, Westlake University. Gut fungi GWAS summary statistics, Han Chinese (n = 7,350). Westlake Chinese Multi-omics GWAS Catalog. https://omics.lab.westlake.edu.cn/data.html (accessed 2026). Associated manuscript cited on the portal as “Genetic architecture of the human gut mycobiome” (unpublished).
MAIN ANCESTRY
EAS
METAGENOME
Gut fungi

Xu F, et al-36797296

Summary statistics
PUBMED_LINK
36797296
URL
https://omics.lab.westlake.edu.cn/data/proteins/phenotypes ,https://omics.lab.westlake.edu.cn/collect.html
TITLE
Genome-wide genotype-serum proteome mapping provides insights into the cross-ancestry differences in cardiometabolic disease susceptibility.
Main citation
Xu F, Yu EY, Cai X, Yue L, ...&, Zheng JS. (2023) Genome-wide genotype-serum proteome mapping provides insights into the cross-ancestry differences in cardiometabolic disease susceptibility. Nat Commun, 14 (1) 896. doi:10.1038/s41467-023-36491-3. PMID 36797296
ABSTRACT
Identification of protein quantitative trait loci (pQTL) helps understand the underlying mechanisms of diseases and discover promising targets for pharmacological intervention. For most important class of drug targets, genetic evidence needs to be generalizable to diverse populations. Given that the majority of the previous studies were conducted in European ancestry populations, little is known about the protein-associated genetic variants in East Asians. Based on data-independent acquisition mass spectrometry technique, we conduct genome-wide association analyses for 304 unique proteins in 2,958 Han Chinese participants. We identify 195 genetic variant-protein associations. Colocalization and Mendelian randomization analyses highlight 60 gene-protein-phenotype associations, 45 of which (75%) have not been prioritized in Europeans previously. Further cross-ancestry analyses uncover key proteins that contributed to the differences in the obesity-induced diabetes and coronary artery disease susceptibility. These findings provide novel druggable proteins as well as a unique resource for the trans-ancestry evaluation of protein-targeted drug discovery.
DOI
10.1038/s41467-023-36491-3

Xu Z-28736311

Summary statistics
PUBMED_LINK
28736311
TITLE
Imaging-wide association study: Integrating imaging endophenotypes in GWAS.
Main citation
Xu Z, Wu C, Pan W, Alzheimer's Disease Neuroimaging Initiative. (2017) Imaging-wide association study: Integrating imaging endophenotypes in GWAS. Neuroimage, 159, 159-169. doi:10.1016/j.neuroimage.2017.07.036. PMID 28736311
ABSTRACT
A new and powerful approach, called imaging-wide association study (IWAS), is proposed to integrate imaging endophenotypes with GWAS to boost statistical power and enhance biological interpretation for GWAS discoveries. IWAS extends the promising transcriptome-wide association study (TWAS) from using gene expression endophenotypes to using imaging and other endophenotypes with a much wider range of possible applications. As illustration, we use gray-matter volumes of several brain regions of interest (ROIs) drawn from the ADNI-1 structural MRI data as imaging endophenotypes, which are then applied to the individual-level GWAS data of ADNI-GO/2 and a large meta-analyzed GWAS summary statistics dataset (based on about 74,000 individuals), uncovering some novel genes significantly associated with Alzheimer's disease (AD). We also compare the performance of IWAS with TWAS, showing much larger numbers of significant AD-associated genes discovered by IWAS, presumably due to the stronger link between brain atrophy and AD than that between gene expression of normal individuals and the risk for AD. The proposed IWAS is general and can be applied to other imaging endophenotypes, and GWAS individual-level or summary association data.
DOI
10.1016/j.neuroimage.2017.07.036

Yang C, et al-34239129

Summary statistics
PUBMED_LINK
34239129
TITLE
Genomic atlas of the proteome from brain, CSF and plasma prioritizes proteins implicated in neurological disorders.
Main citation
Yang C, Farias FHG, Ibanez L, Suhy A, ...&, Cruchaga C. (2021) Genomic atlas of the proteome from brain, CSF and plasma prioritizes proteins implicated in neurological disorders. Nat Neurosci, 24 (9) 1302-1312. doi:10.1038/s41593-021-00886-6. PMID 34239129
ABSTRACT
Understanding the tissue-specific genetic controls of protein levels is essential to uncover mechanisms of post-transcriptional gene regulation. In this study, we generated a genomic atlas of protein levels in three tissues relevant to neurological disorders (brain, cerebrospinal fluid and plasma) by profiling thousands of proteins from participants with and without Alzheimer's disease. We identified 274, 127 and 32 protein quantitative trait loci (pQTLs) for cerebrospinal fluid, plasma and brain, respectively. cis-pQTLs were more likely to be tissue shared, but trans-pQTLs tended to be tissue specific. Between 48.0% and 76.6% of pQTLs did not co-localize with expression, splicing, DNA methylation or histone acetylation QTLs. Using Mendelian randomization, we nominated proteins implicated in neurological diseases, including Alzheimer's disease, Parkinson's disease and stroke. This first multi-tissue study will be instrumental to map signals from genome-wide association studies onto functional genes, to discover pathways and to identify drug targets for neurological diseases.
DOI
10.1038/s41593-021-00886-6

Yang Lab xQTL

Summary statistics
PUBMED_LINK
39623049
DESCRIPTION
Yang lab SMR/xQTL data resource — public GWAS and molecular QTL summary statistics for integrative analysis.
URL
https://yanglab.westlake.edu.cn/software/smr/#DataResource
TITLE
SMR-Portal: an online platform for integrative analysis of GWAS and xQTL data to identify complex trait genes.
Main citation
Guo Y, Xu T, Luo J, Jiang Z, ...&, Yang J. (2025) SMR-Portal: an online platform for integrative analysis of GWAS and xQTL data to identify complex trait genes. Nat Methods, 22 (2) 220-222. doi:10.1038/s41592-024-02561-7. PMID 39623049
DOI
10.1038/s41592-024-02561-7
MAIN ANCESTRY
EUR

Yao C, et al-30111768

Summary statistics
PUBMED_LINK
30111768
TITLE
Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease.
Main citation
Yao C, Chen G, Song C, Keefe J, ...&, Levy D. (2018) Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease. Nat Commun, 9 (1) 3268. doi:10.1038/s41467-018-05512-x. PMID 30111768
ABSTRACT
Identifying genetic variants associated with circulating protein concentrations (protein quantitative trait loci; pQTLs) and integrating them with variants from genome-wide association studies (GWAS) may illuminate the proteome's causal role in disease and bridge a knowledge gap regarding SNP-disease associations. We provide the results of GWAS of 71 high-value cardiovascular disease proteins in 6861 Framingham Heart Study participants and independent external replication. We report the mapping of over 16,000 pQTL variants and their functional relevance. We provide an integrated plasma protein-QTL database. Thirteen proteins harbor pQTL variants that match coronary disease-risk variants from GWAS or test causal for coronary disease by Mendelian randomization. Eight of these proteins predict new-onset cardiovascular disease events in Framingham participants. We demonstrate that identifying pQTLs, integrating them with GWAS results, employing Mendelian randomization, and prospectively testing protein-trait associations holds potential for elucidating causal genes, proteins, and pathways for cardiovascular disease and may identify targets for its prevention and treatment.
DOI
10.1038/s41467-018-05512-x

Yu S

Summary statistics
PREPRINT_DOI
10.1101/2024.01.11.575251
SERVER
biorxiv
Main citation
Yu, S. et al. A novel classification framework for genome-wide association study of whole brain MRI images using deep learning. bioRxiv 2024.01.11.575251 (2024) doi:10.1101/2024.01.11.575251.

Zhong W, et al-32576278

Summary statistics
PUBMED_LINK
32576278
TITLE
Whole-genome sequence association analysis of blood proteins in a longitudinal wellness cohort.
Main citation
Zhong W, Gummesson A, Tebani A, Karlsson MJ, ...&, Uhlén M. (2020) Whole-genome sequence association analysis of blood proteins in a longitudinal wellness cohort. Genome Med, 12 (1) 53. doi:10.1186/s13073-020-00755-0. PMID 32576278
ABSTRACT
BACKGROUND: The human plasma proteome is important for many biological processes and targets for diagnostics and therapy. It is therefore of great interest to understand the interplay of genetic and environmental factors to determine the specific protein levels in individuals and to gain a deeper insight of the importance of genetic architecture related to the individual variability of plasma levels of proteins during adult life. METHODS: We have combined whole-genome sequencing, multiplex plasma protein profiling, and extensive clinical phenotyping in a longitudinal 2-year wellness study of 101 healthy individuals with repeated sampling. Analyses of genetic and non-genetic associations related to the variability of blood levels of proteins in these individuals were performed. RESULTS: The analyses showed that each individual has a unique protein profile, and we report on the intra-individual as well as inter-individual variation for 794 plasma proteins. A genome-wide association study (GWAS) using 7.3 million genetic variants identified by whole-genome sequencing revealed 144 independent variants across 107 proteins that showed strong association (P < 6 × 10⁻¹¹) between genetics and the inter-individual variability on protein levels. Many proteins not reported before were identified (67 out of 107) with individual plasma level affected by genetics. Our longitudinal analysis further demonstrates that these levels are stable during the 2-year study period. The variability of protein profiles as a consequence of environmental factors was also analyzed with focus on the effects of weight loss and infections. CONCLUSIONS: We show that the adult blood levels of many proteins are determined at birth by genetics, which is important for efforts aimed to understand the relationship between plasma proteome profiles and human biology and disease.
DOI
10.1186/s13073-020-00755-0