Summary statistics

https://www.diagram-consortium.org/downloads.html

MAIN ANCESTRY

EUR

DIAGRAM

Summary statistics

PUBMED_LINK

22885922

DESCRIPTION

Type 2 diabetes GWAS meta-analysis summary statistics from the DIAGRAM consortium.

URL

TITLE

Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes.

Main citation

Morris AP, Voight BF, Teslovich TM, Ferreira T, ...&, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium. (2012) Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet, 44 (9) 981-90. doi:10.1038/ng.2383. PMID 22885922

ABSTRACT

To extend understanding of the genetic architecture and molecular basis of type 2 diabetes (T2D), we conducted a meta-analysis of genetic variants on the Metabochip, including 34,840 cases and 114,981 controls, overwhelmingly of European descent. We identified ten previously unreported T2D susceptibility loci, including two showing sex-differentiated association. Genome-wide analyses of these data are consistent with a long tail of additional common variant loci explaining much of the variation in susceptibility to T2D. Exploration of the enlarged set of susceptibility loci implicates several processes, including CREBBP-related transcription, adipocytokine signaling and cell cycle regulation, in diabetes pathogenesis.

DOI

10.1038/ng.2383

MAIN ANCESTRY

EUR

eGTEx

Summary statistics

PUBMED_LINK

36510025

DESCRIPTION

Enhancing GTEx

URL

https://gtexportal.org/home/downloads/egtex/methylation

TITLE

DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits.

Main citation

Oliva M, Demanelis K, Lu Y, Chernoff M, ...&, Pierce BL. (2023) DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat Genet, 55 (1) 112-122. doi:10.1038/s41588-022-01248-z. PMID 36510025

ABSTRACT

Studies of DNA methylation (DNAm) in solid human tissues are relatively scarce; tissue-specific characterization of DNAm is needed to understand its role in gene regulation and its relevance to complex traits. We generated array-based DNAm profiles for 987 human samples from the Genotype-Tissue Expression (GTEx) project, representing 9 tissue types and 424 subjects. We characterized methylome and transcriptome correlations (eQTMs), genetic regulation in cis (mQTLs and eQTLs) across tissues and e/mQTLs links to complex traits. We identified mQTLs for 286,152 CpG sites, many of which (>5%) show tissue specificity, and mQTL colocalizations with 2,254 distinct GWAS hits across 83 traits. For 91% of these loci, a candidate gene link was identified by integration of functional maps, including eQTMs, and/or eQTL colocalization, but only 33% of loci involved an eQTL and mQTL present in the same tissue type. With this DNAm-focused integrative analysis, we contribute to the understanding of molecular regulatory mechanisms in human tissues and their impact on complex traits.

DOI

10.1038/s41588-022-01248-z

Eldjarn GH, et al-37794188

Summary statistics

PUBMED_LINK

37794188

TITLE

Large-scale plasma proteomics comparisons through genetics and disease associations.

Main citation

Eldjarn GH, Ferkingstad E, Lund SH, Helgason H, ...&, Stefansson K. (2023) Large-scale plasma proteomics comparisons through genetics and disease associations. Nature, 622 (7982) 348-358. doi:10.1038/s41586-023-06563-x. PMID 37794188

ABSTRACT

High-throughput proteomics platforms measuring thousands of proteins in plasma combined with genomic and phenotypic information have the power to bridge the gap between the genome and diseases. Here we performed association studies of Olink Explore 3072 data generated by the UK Biobank Pharma Proteomics Project on plasma samples from more than 50,000 UK Biobank participants with phenotypic and genotypic data, stratifying on British or Irish, African and South Asian ancestries. We compared the results with those of a SomaScan v4 study on plasma from 36,000 Icelandic people, for 1,514 of whom Olink data were also available. We found modest correlation between the two platforms. Although cis protein quantitative trait loci were detected for a similar absolute number of assays on the two platforms (2,101 on Olink versus 2,120 on SomaScan), the proportion of assays with such supporting evidence for assay performance was higher on the Olink platform (72% versus 43%). A considerable number of proteins had genomic associations that differed between the platforms. We provide examples where differences between platforms may influence conclusions drawn from the integration of protein levels with the study of diseases. We demonstrate how leveraging the diverse ancestries of participants in the UK Biobank helps to detect novel associations and refine genomic location. Our results show the value of the information provided by the two most commonly used high-throughput proteomics platforms and demonstrate the differences between them that at times provides useful complementarity.

DOI

10.1038/s41586-023-06563-x

RELATED_BIOBANK

https://pheweb.sph.umich.edu/FinMetSeq/

MAIN ANCESTRY

EUR

Elliott LT-30305740

Summary statistics

PUBMED_LINK

30305740

TITLE

Genome-wide association studies of brain imaging phenotypes in UK Biobank.

Main citation

Elliott LT, Sharp K, Alfaro-Almagro F, Shi S, ...&, Smith SM. (2018) Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature, 562 (7726) 210-216. doi:10.1038/s41586-018-0571-7. PMID 30305740

ABSTRACT

The genetic architecture of brain structure and function is largely unknown. To investigate this, we carried out genome-wide association studies of 3,144 functional and structural brain imaging phenotypes from UK Biobank (discovery dataset 8,428 subjects). Here we show that many of these phenotypes are heritable. We identify 148 clusters of associations between single nucleotide polymorphisms and imaging phenotypes that replicate at P < 0.05, when we would expect 21 to replicate by chance. Notable significant, interpretable associations include: iron transport and storage genes, related to magnetic susceptibility of subcortical brain tissue; extracellular matrix and epidermal growth factor genes, associated with white matter micro-structure and lesions; genes that regulate mid-line axon development, associated with organization of the pontine crossing tract; and overall 17 genes involved in development, pathway signalling and plasticity. Our results provide insights into the genetic architecture of the brain that are relevant to neurological and psychiatric disorders, brain development and ageing.

DOI

10.1038/s41586-018-0571-7

MAIN ANCESTRY

EUR

Emilsson V, et al-30072576

Summary statistics

PUBMED_LINK

30072576

TITLE

Co-regulatory networks of human serum proteins link genetics to disease.

Main citation

Emilsson V, Ilkov M, Lamb JR, Finkel N, ...&, Gudnason V. (2018) Co-regulatory networks of human serum proteins link genetics to disease. Science, 361 (6404) 769-773. doi:10.1126/science.aaq1327. PMID 30072576

ABSTRACT

Proteins circulating in the blood are critical for age-related disease processes; however, the serum proteome has remained largely unexplored. To this end, 4137 proteins covering most predicted extracellular proteins were measured in the serum of 5457 Icelanders over 65 years of age. Pairwise correlation between proteins as they varied across individuals revealed 27 different network modules of serum proteins, many of which were associated with cardiovascular and metabolic disease states, as well as overall survival. The protein modules were controlled by cis- and trans-acting genetic variants, which in many cases were also associated with complex disease. This revealed co-regulated groups of circulating proteins that incorporated regulatory control between tissues and demonstrated close relationships to past, current, and future disease states.

DOI

10.1126/science.aaq1327

Enroth S, et al-25147954

Summary statistics

PUBMED_LINK

25147954

TITLE

Strong effects of genetic and lifestyle factors on biomarker variation and use of personalized cutoffs.

Main citation

Enroth S, Johansson A, Enroth SB, Gyllensten U. (2014) Strong effects of genetic and lifestyle factors on biomarker variation and use of personalized cutoffs. Nat Commun, 5, 4684. doi:10.1038/ncomms5684. PMID 25147954

ABSTRACT

Ideal biomarkers used for disease diagnosis should display deviating levels in affected individuals only and be robust to factors unrelated to the disease. Here we show the impact of genetic, clinical and lifestyle factors on circulating levels of 92 protein biomarkers for cancer and inflammation, using a population-based cohort of 1,005 individuals. For 75% of the biomarkers, the levels are significantly heritable and genome-wide association studies identifies 16 novel loci and replicate 2 previously known loci with strong effects on one or several of the biomarkers with P-values down to 4.4 × 10⁻⁵⁸. Integrative analysis attributes as much as 56.3% of the observed variance to non-disease factors. We propose that information on the biomarker-specific profile of major genetic, clinical and lifestyle factors should be used to establish personalized clinical cutoffs, and that this would increase the sensitivity of using biomarkers for prediction of clinical end points.

DOI

10.1038/ncomms5684

eQTLGen Phase I

Summary statistics

PUBMED_LINK

34475573

URL

https://www.eqtlgen.org/

TITLE

Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression.

Main citation

Võsa U, Claringbould A, Westra HJ, Bonder MJ, ...&, Franke L. (2021) Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet, 53 (9) 1300-1310. doi:10.1038/s41588-021-00913-z. PMID 34475573

ABSTRACT

Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.

DOI

10.1038/s41588-021-00913-z

eQTLGen Phase II

Summary statistics

DESCRIPTION

Expanded blood eQTL meta-analysis and genome-wide summary statistics across cohorts; consortium coordination, cookbook, and downloads via the Phase II portal.

URL

https://www.eqtlgen.org/

TITLE

eQTLGen Phase II (blood eQTL consortium resource).

Main citation

eQTLGen Consortium. eQTLGen Phase II (blood eQTL consortium resource).

Ferkingstad E, et al-34857953

Summary statistics

PUBMED_LINK

34857953

TITLE

Large-scale integration of the plasma proteome with genetics and disease.

Main citation

Ferkingstad E, Sulem P, Atlason BA, Sveinbjornsson G, ...&, Stefansson K. (2021) Large-scale integration of the plasma proteome with genetics and disease. Nat Genet, 53 (12) 1712-1721. doi:10.1038/s41588-021-00978-w. PMID 34857953

ABSTRACT

The plasma proteome can help bridge the gap between the genome and diseases. Here we describe genome-wide association studies (GWASs) of plasma protein levels measured with 4,907 aptamers in 35,559 Icelanders. We found 18,084 associations between sequence variants and levels of proteins in plasma (protein quantitative trait loci; pQTL), of which 19% were with rare variants (minor allele frequency (MAF) < 1%). We tested plasma protein levels for association with 373 diseases and other traits and identified 257,490 associations. We integrated pQTL and genetic associations with diseases and other traits and found that 12% of 45,334 lead associations in the GWAS Catalog are with variants in high linkage disequilibrium with pQTL. We identified 938 genes encoding potential drug targets with variants that influence levels of possible biomarkers. Combining proteomics, genomics and transcriptomics, we provide a valuable resource that can be used to improve understanding of disease pathogenesis and to assist with drug discovery and development.

DOI

10.1038/s41588-021-00978-w

FinMetSeq

Summary statistics

DESCRIPTION

Finnish metabolic sequencing cohort GWAS results (FinMetSeq) on the Michigan PheWeb.

URL

MAIN ANCESTRY

EUR

FinnGen Kanta 1st Lab values (October 14 2025)

Summary statistics

DESCRIPTION

FinnGen GWAS of laboratory measurements from Finnish register data (first public release, Oct 2025).

URL

https://labvalues.finngen.fi/

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R10 (December 18 2023)

Summary statistics

PUBMED_LINK

DESCRIPTION

FinnGen data freeze R10 (18 Dec 2023) GWAS summary statistics; flagship FinnGen resource described in Kurki et al., Nature 2023.

URL

https://r10.finngen.fi/

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R10-UKBB meta-analysis

Summary statistics

PUBMED_LINK

DESCRIPTION

Meta-analysis of FinnGen R10 with UK Biobank GWAS summary statistics (FinnGen distribution).

URL

https://public-metaresults-fg-ukbb.finngen.fi

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

UK Biobank, FinnGen

MAIN ANCESTRY

EUR

FinnGen R11 (June 24 2024)

Summary statistics

PUBMED_LINK

DESCRIPTION

FinnGen data freeze R11 (24 Jun 2024) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.

URL

https://r11.finngen.fi/

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R12 (November 4 2024)

Summary statistics

PUBMED_LINK

DESCRIPTION

FinnGen data freeze R12 (4 Nov 2024) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.

URL

https://r12.finngen.fi/

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R12-UKBB meta-analysis

Summary statistics

PUBMED_LINK

DESCRIPTION

Meta-analysis of FinnGen R12 with UK Biobank GWAS summary statistics (FinnGen distribution).

URL

https://metaresults-ukbb.finngen.fi/

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

UK Biobank, FinnGen

MAIN ANCESTRY

EUR

FinnGen R4 (November 30 2020)

Summary statistics

PUBMED_LINK

DESCRIPTION

FinnGen data freeze R4 (30 Nov 2020) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.

URL

https://r4.finngen.fi/about

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R5 (May 11 2021)

Summary statistics

PUBMED_LINK

DESCRIPTION

FinnGen data freeze R5 (11 May 2021) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.

URL

https://r5.finngen.fi/about

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R6 (January 24 2022)

Summary statistics

PUBMED_LINK

DESCRIPTION

FinnGen data freeze R6 (24 Jan 2022) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.

URL

https://r6.finngen.fi/about

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R7 (June 1 2022)

Summary statistics

PUBMED_LINK

https://r7.finngen.fi/about

DESCRIPTION

FinnGen data freeze R7 (1 Jun 2022) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.

URL

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R8 (Dec 1 2022)

Summary statistics

PUBMED_LINK

https://r8.finngen.fi/about

DESCRIPTION

FinnGen data freeze R8 (1 Dec 2022) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.

URL

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

FinnGen R9 (May 11 2023)

Summary statistics

PUBMED_LINK

https://r9.finngen.fi/about

DESCRIPTION

FinnGen data freeze R9 (11 May 2023) GWAS summary statistics; resource overview in Kurki et al., Nature 2023.

URL

TITLE

FinnGen provides genetic insights from a well-phenotyped isolated population.

Main citation

Kurki MI, Karjalainen J, Palta P, Sipilä TP, ...&, Palotie A. (2023) FinnGen provides genetic insights from a well-phenotyped isolated population. Nature, 613 (7944) 508-518. doi:10.1038/s41586-022-05473-8. PMID 36653562

ABSTRACT

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.

DOI

10.1038/s41586-022-05473-8

RELATED_BIOBANK

https://datashare.ed.ac.uk/handle/10283/844

MAIN ANCESTRY

EUR

Folkersen L, et al-28369058

Summary statistics

PUBMED_LINK

28369058

TITLE

Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease.

Main citation

Folkersen L, Fauman E, Sabater-Lleal M, Strawbridge RJ, ...&, Mälarstig A. (2017) Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease. PLoS Genet, 13 (4) e1006706. doi:10.1371/journal.pgen.1006706. PMID 28369058

ABSTRACT

Recent advances in highly multiplexed immunoassays have allowed systematic large-scale measurement of hundreds of plasma proteins in large cohort studies. In combination with genotyping, such studies offer the prospect to 1) identify mechanisms involved with regulation of protein expression in plasma, and 2) determine whether the plasma proteins are likely to be causally implicated in disease. We report here the results of genome-wide association (GWA) studies of 83 proteins considered relevant to cardiovascular disease (CVD), measured in 3,394 individuals with multiple CVD risk factors. We identified 79 genome-wide significant (p<5e-8) association signals, 55 of which replicated at P<0.0007 in separate validation studies (n = 2,639 individuals). Using automated text mining, manual curation, and network-based methods incorporating information on expression quantitative trait loci (eQTL), we propose plausible causal mechanisms for 25 trans-acting loci, including a potential post-translational regulation of stem cell factor by matrix metalloproteinase 9 and receptor-ligand pairs such as RANK-RANK ligand. Using public GWA study data, we further evaluate all 79 loci for their causal effect on coronary artery disease, and highlight several potentially causal associations. Overall, a majority of the plasma proteins studied showed evidence of regulation at the genetic level. Our results enable future studies of the causal architecture of human disease, which in turn should aid discovery of new drug targets.

DOI

10.1371/journal.pgen.1006706

Folkersen L, et al-33067605

Summary statistics

PUBMED_LINK

33067605

TITLE

Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals.

Main citation

Folkersen L, Gustafsson S, Wang Q, Hansen DH, ...&, Mälarstig A. (2020) Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat Metab, 2 (10) 1135-1148. doi:10.1038/s42255-020-00287-2. PMID 33067605

ABSTRACT

Circulating proteins are vital in human health and disease and are frequently used as biomarkers for clinical decision-making or as targets for pharmacological intervention. Here, we map and replicate protein quantitative trait loci (pQTL) for 90 cardiovascular proteins in over 30,000 individuals, resulting in 451 pQTLs for 85 proteins. For each protein, we further perform pathway mapping to obtain trans-pQTL gene and regulatory designations. We substantiate these regulatory findings with orthogonal evidence for trans-pQTLs using mouse knockdown experiments (ABCA1 and TRIB1) and clinical trial results (chemokine receptors CCR2 and CCR5), with consistent regulation. Finally, we evaluate known drug targets, and suggest new target candidates or repositioning opportunities using Mendelian randomization. This identifies 11 proteins with causal evidence of involvement in human disease that have not previously been targeted, including EGF, IL-16, PAPPA, SPON1, F3, ADM, CASP-8, CHI3L1, CXCL16, GDF15 and MMP-12. Taken together, these findings demonstrate the utility of large-scale mapping of the genetics of the proteome and provide a resource for future precision studies of circulating proteins in human health.

DOI

10.1038/s42255-020-00287-2

Fu

Summary statistics

PUBMED_LINK

41386230

TITLE

Single-cell eQTL mapping reveals cell-type-specific genetic regulation in lung cancer.

Main citation

Fu Y, Wang Y, Jin C, Zhang C, ...&, Ma H. (2026) Single-cell eQTL mapping reveals cell-type-specific genetic regulation in lung cancer. Cell Genom, 6 (3) 101100. doi:10.1016/j.xgen.2025.101100. PMID 41386230

ABSTRACT

Genome-wide association studies (GWASs) have identified over 50 lung cancer risk loci; however, the precise cellular context of these genetic mechanisms remains unclear due to limitations in bulk tissue expression quantitative trait locus (eQTL) analyses. Here, we present the largest single-cell eQTL (sc-eQTL) atlas of human lung tissue to date, profiling 222 donors using multiplexed single-cell RNA sequencing (scRNA-seq). We identified 4,341 independent eQTLs across 17 cell types, with over 60% of sc-eQTLs and 51% of eGenes being cell-type specific, and fewer than 52% were detectable in paired bulk datasets. Integration with GWASs for non-small cell lung cancer highlighted epithelial and immune cells as key contributors to genetic susceptibility, identifying 28 candidate genes within known risk loci and 24 in novel regions. Notably, 47% of established non-small cell lung cancer (NSCLC) susceptibility loci exhibited cell-type-specific pleiotropic genetic regulation. This study provides a valuable resource of lung sc-eQTLs and illuminates how genetic variation modulates gene expression in a cell-type-specific fashion, contributing to lung cancer susceptibility.

DOI

10.1016/j.xgen.2025.101100

Fu J-38811844

Summary statistics

PUBMED_LINK

38811844

TITLE

Cross-ancestry genome-wide association studies of brain imaging phenotypes.

Main citation

Fu J, Zhang Q, Wang J, Wang M, ...&, CHIMGEN Consortium. (2024) Cross-ancestry genome-wide association studies of brain imaging phenotypes. Nat Genet, 56 (6) 1110-1120. doi:10.1038/s41588-024-01766-y. PMID 38811844

ABSTRACT

Genome-wide association studies of brain imaging phenotypes are mainly performed in European populations, but other populations are severely under-represented. Here, we conducted Chinese-alone and cross-ancestry genome-wide association studies of 3,414 brain imaging phenotypes in 7,058 Chinese Han and 33,224 white British participants. We identified 38 new associations in Chinese-alone analyses and 486 additional new associations in cross-ancestry meta-analyses at P < 1.46 × 10⁻¹¹ for discovery and P < 0.05 for replication. We pooled significant autosomal associations identified by single- or cross-ancestry analyses into 6,443 independent associations, which showed uneven distribution in the genome and the phenotype subgroups. We further divided them into 44 associations with different effect sizes and 3,557 associations with similar effect sizes between ancestries. Loci of these associations were shared with 15 brain-related non-imaging traits including cognition and neuropsychiatric disorders. Our results provide a valuable catalog of genetic associations for brain imaging phenotypes in more diverse populations.

DOI

10.1038/s41588-024-01766-y

MAIN ANCESTRY

EAS,EUR

Generation Scotland

Summary statistics

DESCRIPTION

Generation Scotland cohort GWAS summary statistics and related downloads.

URL

MAIN ANCESTRY

EUR

GENOA

Summary statistics

PUBMED_LINK

37169753

URL

http://mqtldb.godmc.org.uk/

TITLE

meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans.

Main citation

Shang L, Zhao W, Wang YZ, Li Z, ...&, Zhou X. (2023) meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans. Nat Commun, 14 (1) 2711. doi:10.1038/s41467-023-37961-4. PMID 37169753

ABSTRACT

Identifying genetic variants that are associated with variation in DNA methylation, an analysis commonly referred to as methylation quantitative trait locus (meQTL) mapping, is an important first step towards understanding the genetic architecture underlying epigenetic variation. Most existing meQTL mapping studies have focused on individuals of European ancestry and are underrepresented in other populations, with a particular absence of large studies in populations with African ancestry. We fill this critical knowledge gap by performing a large-scale cis-meQTL mapping study in 961 African Americans from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. We identify a total of 4,565,687 cis-acting meQTLs in 320,965 meCpGs. We find that 45% of meCpGs harbor multiple independent meQTLs, suggesting potential polygenic genetic architecture underlying methylation variation. A large percentage of the cis-meQTLs also colocalize with cis-expression QTLs (eQTLs) in the same population. Importantly, the identified cis-meQTLs explain a substantial proportion (median = 24.6%) of methylation variation. In addition, the cis-meQTL associated CpG sites mediate a substantial proportion (median = 24.9%) of SNP effects underlying gene expression. Overall, our results represent an important step toward revealing the co-regulation of methylation and gene expression, facilitating the functional interpretation of epigenetic and gene regulation underlying common diseases in African Americans.

DOI

10.1038/s41467-023-37961-4

GIANT (Genetic Investigation of ANthropometric Traits)

Summary statistics

PUBMED_LINK

20881960

DESCRIPTION

Anthropometric trait GWAS meta-analysis summary statistics from the GIANT consortium.

URL

https://portals.broadinstitute.org/collaboration/giant/index.php/Main_Page

TITLE

Hundreds of variants clustered in genomic loci and biological pathways affect human height.

Main citation

Lango Allen H, Estrada K, Lettre G, Berndt SI, ...&, Hirschhorn JN. (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature, 467 (7317) 832-8. doi:10.1038/nature09410. PMID 20881960

ABSTRACT

Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

DOI

10.1038/nature09410

MAIN ANCESTRY

Multi-ancestry

Gilly A, et al-37778719

Summary statistics

PUBMED_LINK

37778719

TITLE

Genome-wide meta-analysis of 92 cardiometabolic protein serum levels.

Main citation

Gilly A, Park YC, Tsafantakis E, Karaleftheri M, ...&, Zeggini E. (2023) Genome-wide meta-analysis of 92 cardiometabolic protein serum levels. Mol Metab, 78 101810. doi:10.1016/j.molmet.2023.101810. PMID 37778719

ABSTRACT

OBJECTIVES: Global cardiometabolic disease prevalence has grown rapidly over the years, making it the leading cause of death worldwide. Proteins are crucial components in biological pathways dysregulated in disease states. Identifying genetic components that influence circulating protein levels may lead to the discovery of biomarkers for early stages of disease or offer opportunities as therapeutic targets. METHODS: Here, we carry out a genome-wide association study (GWAS) utilising whole genome sequencing data in 3,005 individuals from the HELIC founder populations cohort, across 92 proteins of cardiometabolic relevance. RESULTS: We report 322 protein quantitative trait loci (pQTL) signals across 92 proteins, of which 76 are located in or near the coding gene (cis-pQTL). We link those association signals with changes in protein expression and cardiometabolic disease risk using colocalisation and Mendelian randomisation (MR) analyses. CONCLUSIONS: The majority of previously unknown signals we describe point to proteins or protein interactions involved in inflammation and immune response, providing genetic evidence for the contributing role of inflammation in cardiometabolic disease processes.

DOI

10.1016/j.molmet.2023.101810

MAIN ANCESTRY

EUR

GLGC (Global Lipids Genetics Consortium)

Summary statistics

PUBMED_LINK

24097068

DESCRIPTION

Blood lipid trait GWAS meta-analysis summary statistics from the GLGC.

URL

http://csg.sph.umich.edu/willer/public/glgc-lipids2021/

TITLE

Discovery and refinement of loci associated with lipid levels.

Main citation

Willer CJ, Schmidt EM, Sengupta S, Peloso GM, ...&, Global Lipids Genetics Consortium. (2013) Discovery and refinement of loci associated with lipid levels. Nat Genet, 45 (11) 1274-1283. doi:10.1038/ng.2797. PMID 24097068

ABSTRACT

Levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides and total cholesterol are heritable, modifiable risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,577 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5 × 10⁻⁸, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index. Our results demonstrate the value of using genetic data from individuals of diverse ancestry and provide insights into the biological mechanisms regulating blood lipids to guide future genetic, biological and therapeutic research.

DOI

10.1038/ng.2797

MAIN ANCESTRY

Multi-ancestry

Global Biobank

Summary statistics

PUBMED_LINK

36777996

DESCRIPTION

Global Biobank Meta-analysis Initiative (GBMI) harmonized GWAS across many biobanks.

URL

http://results.globalbiobankmeta.org/

TITLE

Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease.

Main citation

Zhou W, Kanai M, Wu KH, Rasheed H, ...&, Neale BM. (2022) Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease. Cell Genom, 2 (10) 100192. doi:10.1016/j.xgen.2022.100192. PMID 36777996

ABSTRACT

Biobanks facilitate genome-wide association studies (GWASs), which have mapped genomic loci across a range of human diseases and traits. However, most biobanks are primarily composed of individuals of European ancestry. We introduce the Global Biobank Meta-analysis Initiative (GBMI), a collaborative network of 23 biobanks from 4 continents representing more than 2.2 million consented individuals with genetic data linked to electronic health records. GBMI meta-analyzes summary statistics from GWASs generated using harmonized genotypes and phenotypes from member biobanks for 14 exemplar diseases and endpoints. This strategy validates that GWASs conducted in diverse biobanks can be integrated despite heterogeneity in case definitions, recruitment strategies, and baseline characteristics. This collaborative effort improves GWAS power for diseases, benefits understudied diseases, and improves risk prediction while also enabling the nomination of disease genes and drug candidates by incorporating gene and protein expression data and providing insight into the underlying biology of human diseases and traits.

DOI

10.1016/j.xgen.2022.100192

MAIN ANCESTRY

ALL

GTEx

Summary statistics

PUBMED_LINK

32913098

DESCRIPTION

V11: updates the GTEx V10 data to use the GENCODE 47 annotation. It contains no new samples or donors compared to V10.

URL

https://gtexportal.org/home/

TITLE

The GTEx Consortium atlas of genetic regulatory effects across human tissues.

Main citation

GTEx Consortium. (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369 (6509) 1318-1330. doi:10.1126/science.aaz1776. PMID 32913098

ABSTRACT

The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.

DOI

10.1126/science.aaz1776

GTEx

Summary statistics

PUBMED_LINK

32913098

DESCRIPTION

V8

TITLE

The GTEx Consortium atlas of genetic regulatory effects across human tissues.

Main citation

GTEx Consortium. (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369 (6509) 1318-1330. doi:10.1126/science.aaz1776. PMID 32913098

ABSTRACT

The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.

DOI

10.1126/science.aaz1776

GTEx

Summary statistics

PUBMED_LINK

35922509

DESCRIPTION

V9 long-read RNA-seq data

TITLE

Transcriptome variation in human tissues revealed by long-read sequencing.

Main citation

Glinos DA, Garborcauskas G, Hoffman P, Ehsan N, ...&, Cummings BB. (2022) Transcriptome variation in human tissues revealed by long-read sequencing. Nature, 608 (7922) 353-359. doi:10.1038/s41586-022-05035-y. PMID 35922509

ABSTRACT

Regulation of transcript structure generates transcript diversity and plays an important role in human disease. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.

DOI

10.1038/s41586-022-05035-y

GTEx

Summary statistics

PUBMED_LINK

23715323

DESCRIPTION

Project overview

TITLE

The Genotype-Tissue Expression (GTEx) project.

Main citation

GTEx Consortium. (2013) The Genotype-Tissue Expression (GTEx) project. Nat Genet, 45 (6) 580-5. doi:10.1038/ng.2653. PMID 23715323

ABSTRACT

Genome-wide association studies have identified thousands of loci for common diseases, but, for the majority of these, the mechanisms underlying disease susceptibility remain unknown. Most associated variants are not correlated with protein-coding changes, suggesting that polymorphisms in regulatory regions probably contribute to many disease phenotypes. Here we describe the Genotype-Tissue Expression (GTEx) project, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.

DOI

10.1038/ng.2653

GTEx

Summary statistics

PUBMED_LINK

35549429

DESCRIPTION

V9 snRNA-Seq

TITLE

Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function.

Main citation

Eraslan G, Drokhlyansky E, Anand S, Fiskin E, ...&, Regev A. (2022) Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science, 376 (6594) eabl4290. doi:10.1126/science.abl4290. PMID 35549429

ABSTRACT

Understanding gene function and regulation in homeostasis and disease requires knowledge of the cellular and tissue contexts in which genes are expressed. Here, we applied four single-nucleus RNA sequencing methods to eight diverse, archived, frozen tissue types from 16 donors and 25 samples, generating a cross-tissue atlas of 209,126 nuclei profiles, which we integrated across tissues, donors, and laboratory methods with a conditional variational autoencoder. Using the resulting cross-tissue atlas, we highlight shared and tissue-specific features of tissue-resident cell populations; identify cell types that might contribute to neuromuscular, metabolic, and immune components of monogenic diseases and the biological processes involved in their pathology; and determine cell types and gene modules that might underlie disease mechanisms for complex traits analyzed by genome-wide association studies.

DOI

10.1126/science.abl4290

Gudjonsson A, et al-35078996

Summary statistics

PUBMED_LINK

35078996

TITLE

A genome-wide association study of serum proteins reveals shared loci with common diseases.

Main citation

Gudjonsson A, Gudmundsdottir V, Axelsson GT, Gudmundsson EF, ...&, Gudnason V. (2022) A genome-wide association study of serum proteins reveals shared loci with common diseases. Nat Commun, 13 (1) 480. doi:10.1038/s41467-021-27850-z. PMID 35078996

ABSTRACT

With the growing number of genetic association studies, the genotype-phenotype atlas has become increasingly more complex, yet the functional consequences of most disease associated alleles is not understood. The measurement of protein level variation in solid tissues and biofluids integrated with genetic variants offers a path to deeper functional insights. Here we present a large-scale proteogenomic study in 5,368 individuals, revealing 4,035 independent associations between genetic variants and 2,091 serum proteins, of which 36% are previously unreported. The majority of both cis- and trans-acting genetic signals are unique for a single protein, although our results also highlight numerous highly pleiotropic genetic effects on protein levels and demonstrate that a protein's genetic association profile reflects certain characteristics of the protein, including its location in protein networks, tissue specificity and intolerance to loss of function mutations. Integrating protein measurements with deep phenotyping of the cohort, we observe substantial enrichment of phenotype associations for serum proteins regulated by established GWAS loci, and offer new insights into the interplay between genetics, serum protein levels and complex disease.

DOI

10.1038/s41467-021-27850-z

GWAS catalog

Summary statistics

PUBMED_LINK

36350656

DESCRIPTION

NHGRI–EBI GWAS Catalog: curated SNP–trait associations and a deposition hub for full summary statistics.

URL

https://www.ebi.ac.uk/gwas/

TITLE

The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource.

Main citation

Sollis E, Mosaku A, Abid A, Buniello A, ...&, Harris LW. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res, 51 (D1) D977-D985. doi:10.1093/nar/gkac1010. PMID 36350656

ABSTRACT

The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.

DOI

10.1093/nar/gkac1010

MAIN ANCESTRY

Multi-ancestry

Haas ME-34957434

Summary statistics

PUBMED_LINK

34957434

TITLE

Machine learning enables new insights into genetic contributions to liver fat accumulation.

Main citation

Haas ME, Pirruccello JP, Friedman SN, Wang M, ...&, Khera AV. (2021) Machine learning enables new insights into genetic contributions to liver fat accumulation. Cell Genom, 1 (3). doi:10.1016/j.xgen.2021.100066. PMID 34957434

ABSTRACT

Excess liver fat, called hepatic steatosis, is a leading risk factor for end-stage liver disease and cardiometabolic diseases but often remains undiagnosed in clinical practice because of the need for direct imaging assessments. We developed an abdominal MRI-based machine-learning algorithm to accurately estimate liver fat (correlation coefficients, 0.97-0.99) from a truth dataset of 4,511 middle-aged UK Biobank participants, enabling quantification in 32,192 additional individuals. 17% of participants had predicted liver fat levels indicative of steatosis, and liver fat could not have been reliably estimated based on clinical factors such as BMI. A genome-wide association study of common genetic variants and liver fat replicated three known associations and identified five newly associated variants in or near the MTARC1, ADH1B, TRIB1, GPAM, and MAST3 genes (p < 3 × 10⁻⁸). A polygenic score integrating these eight genetic variants was strongly associated with future risk of chronic liver disease (hazard ratio > 1.32 per SD score, p < 9 × 10⁻¹⁷). Rare inactivating variants in the APOB or MTTP genes were identified in 0.8% of individuals with steatosis and conferred more than 6-fold risk (p < 2 × 10⁻⁵), highlighting a molecular subtype of hepatic steatosis characterized by defective secretion of apolipoprotein B-containing lipoproteins. We demonstrate that our imaging-based machine-learning model accurately estimates liver fat and may be useful in epidemiological and genetic studies of hepatic steatosis.

DOI

10.1016/j.xgen.2021.100066

MAIN ANCESTRY

EUR

Hannon

Summary statistics

PUBMED_LINK

26619357

URL

https://epigenetics.essex.ac.uk/mQTL/

TITLE

Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci.

Main citation

Hannon E, Spiers H, Viana J, Pidsley R, ...&, Mill J. (2016) Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat Neurosci, 19 (1) 48-54. doi:10.1038/nn.4182. PMID 26619357

ABSTRACT

We characterized DNA methylation quantitative trait loci (mQTLs) in a large collection (n = 166) of human fetal brain samples spanning 56-166 d post-conception, identifying >16,000 fetal brain mQTLs. Fetal brain mQTLs were primarily cis-acting, enriched in regulatory chromatin domains and transcription factor binding sites, and showed substantial overlap with genetic variants that were also associated with gene expression in the brain. Using tissue from three distinct regions of the adult brain (prefrontal cortex, striatum and cerebellum), we found that most fetal brain mQTLs were developmentally stable, although a subset was characterized by fetal-specific effects. Fetal brain mQTLs were enriched amongst risk loci identified in a recent large-scale genome-wide association study (GWAS) of schizophrenia, a severe psychiatric disorder with a hypothesized neurodevelopmental component. Finally, we found that mQTLs can be used to refine GWAS loci through the identification of discrete sites of variable fetal brain methylation associated with schizophrenia risk variants.

DOI

10.1038/nn.4182

Hansson O, et al-36504281

Summary statistics

PUBMED_LINK

36504281

TITLE

The genetic regulation of protein expression in cerebrospinal fluid.

Main citation

Hansson O, Kumar A, Janelidze S, Stomrud E, ...&, Mattsson-Carlgren N. (2023) The genetic regulation of protein expression in cerebrospinal fluid. EMBO Mol Med, 15 (1) e16359. doi:10.15252/emmm.202216359. PMID 36504281

ABSTRACT

Studies of the genetic regulation of cerebrospinal fluid (CSF) proteins may reveal pathways for treatment of neurological diseases. 398 proteins in CSF were measured in 1,591 participants from the BioFINDER study. Protein quantitative trait loci (pQTL) were identified as associations between genetic variants and proteins, with 176 pQTLs for 145 CSF proteins (P < 1.25 × 10⁻¹⁰, 117 cis-pQTLs and 59 trans-pQTLs). Ventricular volume (measured with brain magnetic resonance imaging) was a confounder for several pQTLs. pQTLs for CSF and plasma proteins were overall correlated, but CSF-specific pQTLs were also observed. Mendelian randomization analyses suggested causal roles for several proteins, for example, ApoE, CD33, and GRN in Alzheimer's disease, MMP-10 in preclinical Alzheimer's disease, SIGLEC9 in amyotrophic lateral sclerosis, and CD38, GPNMB, and ADAM15 in Parkinson's disease. CSF levels of GRN, MMP-10, and GPNMB were altered in Alzheimer's disease, preclinical Alzheimer's disease, and Parkinson's disease, respectively. These findings point to pathways to be explored for novel therapies. The novel finding that ventricular volume confounded pQTLs has implications for design of future studies of the genetic regulation of the CSF proteome.

DOI

10.15252/emmm.202216359

Hatton

Summary statistics

PUBMED_LINK

38548728

DESCRIPTION

cis DNAm QTLs in three European (n = 3701) and two East Asian (n = 2099) cohorts

URL

https://yanglab.westlake.edu.cn/software/smr/#mQTLsummarydata

TITLE

Genetic control of DNA methylation is largely shared across European and East Asian populations.

Main citation

Hatton AA, Cheng FF, Lin T, Shen RJ, ...&, McRae AF. (2024) Genetic control of DNA methylation is largely shared across European and East Asian populations. Nat Commun, 15 (1) 2713. doi:10.1038/s41467-024-47005-0. PMID 38548728

ABSTRACT

DNA methylation is an ideal trait to study the extent of the shared genetic control across ancestries, effectively providing hundreds of thousands of model molecular traits with large QTL effect sizes. We investigate cis DNAm QTLs in three European (n = 3701) and two East Asian (n = 2099) cohorts to quantify the similarities and differences in the genetic architecture across populations. We observe 80,394 associated mQTLs (62.2% of DNAm probes with significant mQTL) to be significant in both ancestries, while 28,925 mQTLs (22.4%) are identified in only a single ancestry. mQTL effect sizes are highly conserved across populations, with differences in mQTL discovery likely due to differences in allele frequency of associated variants and differing linkage disequilibrium between causal variants and assayed SNPs. This study highlights the overall similarity of genetic control across ancestries and the value of ancestral diversity in increasing the power to detect associations and enhancing fine mapping resolution.

DOI

10.1038/s41467-024-47005-0

Hillary RF, et al-31320639

Summary statistics

PUBMED_LINK

31320639

TITLE

Genome and epigenome wide studies of neurological protein biomarkers in the Lothian Birth Cohort 1936.

Main citation

Hillary RF, McCartney DL, Harris SE, Stevenson AJ, ...&, Marioni RE. (2019) Genome and epigenome wide studies of neurological protein biomarkers in the Lothian Birth Cohort 1936. Nat Commun, 10 (1) 3160. doi:10.1038/s41467-019-11177-x. PMID 31320639

ABSTRACT

Although plasma proteins may serve as markers of neurological disease risk, the molecular mechanisms responsible for inter-individual variation in plasma protein levels are poorly understood. Therefore, we conduct genome- and epigenome-wide association studies on the levels of 92 neurological proteins to identify genetic and epigenetic loci associated with their plasma concentrations (n = 750 healthy older adults). We identify 41 independent genome-wide significant (P < 5.4 × 10⁻¹⁰) loci for 33 proteins and 26 epigenome-wide significant (P < 3.9 × 10⁻¹⁰) sites associated with the levels of 9 proteins. Using this information, we identify biological pathways in which putative neurological biomarkers are implicated (neurological, immunological and extracellular matrix metabolic pathways). We also observe causal relationships (by Mendelian randomisation analysis) between changes in gene expression (DRAXIN, MDGA1 and KYNU), or DNA methylation profiles (MATN3, MDGA1 and NEP), and altered plasma protein levels. Together, this may help inform causal relationships between biomarkers and neurological diseases.

DOI

10.1038/s41467-019-11177-x

Hillary RF, et al-32641083

Summary statistics

PUBMED_LINK

32641083

TITLE

Multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults.

Main citation

Hillary RF, Trejo-Banos D, Kousathanas A, McCartney DL, ...&, Marioni RE. (2020) Multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults. Genome Med, 12 (1) 60. doi:10.1186/s13073-020-00754-1. PMID 32641083

ABSTRACT

BACKGROUND: The molecular factors which control circulating levels of inflammatory proteins are not well understood. Furthermore, association studies between molecular probes and human traits are often performed by linear model-based methods which may fail to account for complex structure and interrelationships within molecular datasets. METHODS: In this study, we perform genome- and epigenome-wide association studies (GWAS/EWAS) on the levels of 70 plasma-derived inflammatory protein biomarkers in healthy older adults (Lothian Birth Cohort 1936; n = 876; Olink® inflammation panel). We employ a Bayesian framework (BayesR+) which can account for issues pertaining to data structure and unknown confounding variables (with sensitivity analyses using ordinary least squares- (OLS) and mixed model-based approaches). RESULTS: We identified 13 SNPs associated with 13 proteins (n = 1 SNP each) concordant across OLS and Bayesian methods. We identified 3 CpG sites spread across 3 proteins (n = 1 CpG each) that were concordant across OLS, mixed-model and Bayesian analyses. Tagged genetic variants accounted for up to 45% of variance in protein levels (for MCP2, 36% of variance alone attributable to 1 polymorphism). Methylation data accounted for up to 46% of variation in protein levels (for CXCL10). Up to 66% of variation in protein levels (for VEGFA) was explained using genetic and epigenetic data combined. We demonstrated putative causal relationships between CD6 and IL18R1 with inflammatory bowel disease and between IL12B and Crohn's disease. CONCLUSIONS: Our data may aid understanding of the molecular regulation of the circulating inflammatory proteome as well as causal relationships between inflammatory mediators and disease.

DOI

10.1186/s13073-020-00754-1

Huang YJ-38762475

Summary statistics

PUBMED_LINK

38762475

DESCRIPTION

ABD, carotid artery ultrasonography (CAU), BMD, ECG, and thyroid ultrasonography (TU): 28 ABD features, 29 CAU features, 85 BMD features, and 10 ECG features

TITLE

AI-enhanced integration of genetic and medical imaging data for risk assessment of Type 2 diabetes.

Main citation

Huang YJ, Chen CH, Yang HC. (2024) AI-enhanced integration of genetic and medical imaging data for risk assessment of Type 2 diabetes. Nat Commun, 15 (1) 4230. doi:10.1038/s41467-024-48618-1. PMID 38762475

ABSTRACT

Type 2 diabetes (T2D) presents a formidable global health challenge, highlighted by its escalating prevalence, underscoring the critical need for precision health strategies and early detection initiatives. Leveraging artificial intelligence, particularly eXtreme Gradient Boosting (XGBoost), we devise robust risk assessment models for T2D. Drawing upon comprehensive genetic and medical imaging datasets from 68,911 individuals in the Taiwan Biobank, our models integrate Polygenic Risk Scores (PRS), Multi-image Risk Scores (MRS), and demographic variables, such as age, sex, and T2D family history. Here, we show that our model achieves an Area Under the Receiver Operating Curve (AUC) of 0.94, effectively identifying high-risk T2D subgroups. A streamlined model featuring eight key variables also maintains a high AUC of 0.939. This high accuracy for T2D risk assessment promises to catalyze early detection and preventive strategies. Moreover, we introduce an accessible online risk assessment tool for T2D, facilitating broader applicability and dissemination of our findings.

DOI

10.1038/s41467-024-48618-1

MAIN ANCESTRY

EAS

Ikram M.-28627999

Summary statistics

PUBMED_LINK

28627999

TITLE

Heritability and genome-wide associations studies of cerebral blood flow in the general population.

Main citation

Ikram MA, Zonneveld HI, Roshchupkin G, Smith AV, ...&, Adams HH. (2018) Heritability and genome-wide associations studies of cerebral blood flow in the general population. J Cereb Blood Flow Metab, 38 (9) 1598-1608. doi:10.1177/0271678X17715861. PMID 28627999

ABSTRACT

Cerebral blood flow is an important process for brain functioning and its dysregulation is implicated in multiple neurological disorders. While environmental risk factors have been identified, it remains unclear to what extent the flow is regulated by genetics. Here we performed heritability and genome-wide association analyses of cerebral blood flow in a population-based cohort study. We included 4472 persons free of cortical infarcts who underwent genotyping and phase-contrast magnetic resonance flow imaging (mean age 64.8 ± 10.8 years). The flow rate, cross-sectional area of the vessel, and flow velocity through the vessel were measured in the basilar artery and bilateral carotids. We found that the flow rate of the basilar artery is most heritable (h² (SE) = 24.1 (9.8), p-value = 0.0056), and this increased over age. The association studies revealed two significant loci for the right carotid artery area (rs12546630, p-value = 2.0 × 10⁻⁸) and velocity (rs2971609, p-value = 1.4 × 10⁻⁸), with the latter showing a concordant effect in an independent sample (N = 1350, p-value = 0.057, meta-analyzed p-value = 2.5 × 10⁻⁹). These loci were also associated with other cerebral blood flow parameters below genome-wide significance, and rs2971609 lies in a known migraine locus. These findings establish that cerebral blood flow is under genetic control with potential relevance for neurological diseases.

DOI

10.1177/0271678X17715861

MAIN ANCESTRY

EUR

Ishigaki

Summary statistics

PUBMED_LINK

28553958

TITLE

Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis.

Main citation

Ishigaki K, Kochi Y, Suzuki A, Tsuchida Y, ...&, Yamamoto K. (2017) Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nat Genet, 49 (7) 1120-1125. doi:10.1038/ng.3885. PMID 28553958

ABSTRACT

Recent evidence suggests that a substantial portion of complex disease risk alleles modify gene expression in a cell-specific manner. To identify candidate causal genes and biological pathways of immune-related complex diseases, we conducted expression quantitative trait loci (eQTL) analysis on five subsets of immune cells (CD4+ T cells, CD8+ T cells, B cells, natural killer (NK) cells and monocytes) and unfractionated peripheral blood from 105 healthy Japanese volunteers. We developed a three-step analytical pipeline comprising (i) prediction of individual gene expression using our eQTL database and public epigenomic data, (ii) gene-level association analysis and (iii) prediction of cell-specific pathway activity by integrating the direction of eQTL effects. By applying this pipeline to rheumatoid arthritis data sets, we identified candidate causal genes and a cytokine pathway (upregulation of tumor necrosis factor (TNF) in CD4+ T cells). Our approach is an efficient way to characterize the polygenic contributions and potential biological mechanisms of complex diseases.

DOI

10.1038/ng.3885

Japan Omics Browser

Summary statistics

PUBMED_LINK

40335902

DESCRIPTION

Japan Omics Browser (JOB) for browsing omics and GWAS-style association results in Japanese cohorts.

URL

https://japan-omics.jp/

TITLE

JOB: Japan Omics Browser provides integrative visualization of multi-omics data.

Main citation

Takahashi Y, Wang QS, Hasegawa T, Namkoong H, ...&, Japan COVID-19 Task Force. (2025) JOB: Japan Omics Browser provides integrative visualization of multi-omics data. BMC Genomics, 26 (1) 451. doi:10.1186/s12864-025-11639-1. PMID 40335902

ABSTRACT

We present the Japan Omics Browser (JOB), which enables integrative analysis of human omics at different layers. JOB offers visualization of per-variant regulatory effects in the human blood at mRNA and protein level distinctively, quantified from statistical fine-mapping of mRNA-expression quantitative loci (eQTL) and protein QTLs (pQTLs) in 1,405 Japanese, together with fine-mapping results of 94 complex traits in UK Biobank. In addition, JOB shows per-tissue regulatory effect prediction score (EMS), trained via multi-task learning. Furthermore, validation scores from Massively Parallel Reporter Assay (MPRA) in two cell types are available for over 10,000 variants. JOB is publicly available at https://japan-omics.jp/.

DOI

10.1186/s12864-025-11639-1

RELATED_BIOBANK

The Japan COVID-19 Task Force study

MAIN ANCESTRY

EAS

JCTF

Summary statistics

PUBMED_LINK

39317738

DESCRIPTION

Japan COVID-19 Task Force

TITLE

Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal distinct regulation patterns and disease relevance.

Main citation

Wang QS, Hasegawa T, Namkoong H, Saiki R, ...&, Japan COVID-19 Task Force. (2024) Statistically and functionally fine-mapped blood eQTLs and pQTLs from 1,405 humans reveal distinct regulation patterns and disease relevance. Nat Genet, 56 (10) 2054-2067. doi:10.1038/s41588-024-01896-3. PMID 39317738

ABSTRACT

Studying the genetic regulation of protein expression (through protein quantitative trait loci (pQTLs)) offers a deeper understanding of regulatory variants uncharacterized by mRNA expression regulation (expression QTLs (eQTLs)) studies. Here we report cis-eQTL and cis-pQTL statistical fine-mapping from 1,405 genotyped samples with blood mRNA and 2,932 plasma samples of protein expression, as part of the Japan COVID-19 Task Force (JCTF). Fine-mapped eQTLs (n = 3,464) were enriched for 932 variants validated with a massively parallel reporter assay. Fine-mapped pQTLs (n = 582) were enriched for missense variations on structured and extracellular domains, although the possibility of epitope-binding artifacts remains. Trans-eQTL and trans-pQTL analysis highlighted associations of class I HLA allele variation with KIR genes. We contrast the multi-tissue origin of plasma protein with blood mRNA, contributing to the limited colocalization level, distinct regulatory mechanisms and trait relevance of eQTLs and pQTLs. We report a negative correlation between ABO mRNA and protein expression because of linkage disequilibrium between distinct nearby eQTLs and pQTLs.

DOI

10.1038/s41588-024-01896-3

Johansson Å, et al-23487758

Summary statistics

PUBMED_LINK

23487758

TITLE

Identification of genetic variants influencing the human plasma proteome.

Main citation

Johansson Å, Enroth S, Palmblad M, Deelder AM, ...&, Gyllensten U. (2013) Identification of genetic variants influencing the human plasma proteome. Proc Natl Acad Sci U S A, 110 (12) 4673-8. doi:10.1073/pnas.1217238110. PMID 23487758

ABSTRACT

Genetic variants influencing the transcriptome have been extensively studied. However, the impact of the genetic factors on the human proteome is largely unexplored, mainly due to lack of suitable high-throughput methods. Here we present unique and comprehensive identification of genetic variants affecting the human plasma protein profile by combining high-throughput and high-resolution mass spectrometry (MS) with genome-wide SNP data. We identified and quantified the abundance of 1,056 tryptic-digested peptides, representing 163 proteins in the plasma of 1,060 individuals from two population-based cohorts. The abundance level of almost one-fifth (19%) of the peptides was found to be heritable, with heritability ranging from 0.08 to 0.43. The levels of 60 peptides from 25 proteins, 15% of the proteins studied, were influenced by cis-acting SNPs. We identified and replicated individual cis-acting SNPs (combined P value ranging from 3.1 × 10⁻⁵² to 2.9 × 10⁻¹²) influencing 11 peptides from 5 individual proteins. These SNPs represent both regulatory SNPs and nonsynonymous changes defining well-studied disease alleles such as the ε4 allele of apolipoprotein E (APOE), which has been shown to increase risk of Alzheimer's disease. Our results show that high-throughput mass spectrometry represents a promising method for large-scale characterization of the human proteome, allowing for both quantification and sequencing of individual proteins. Abundance and peptide composition of a protein plays an important role in the etiology, diagnosis, and treatment of a number of diseases. A better understanding of the genetic impact on the plasma proteome is therefore important for evaluating potential biomarkers and therapeutic agents for common diseases.

DOI

10.1073/pnas.1217238110

Karkar S-33664500

Summary statistics

PUBMED_LINK

33664500

TITLE

Genome-wide haplotype association study in imaging genetics using whole-brain sulcal openings of 16,304 UK Biobank subjects.

Main citation

Karkar S, Dandine-Roulland C, Mangin JF, Le Guen Y, ...&, Frouin V. (2021) Genome-wide haplotype association study in imaging genetics using whole-brain sulcal openings of 16,304 UK Biobank subjects. Eur J Hum Genet, 29 (9) 1424-1437. doi:10.1038/s41431-021-00827-8. PMID 33664500

ABSTRACT

Neuroimaging-genetics cohorts gather two types of data: brain imaging and genetic data. They allow the discovery of associations between genetic variants and brain imaging features. They are invaluable resources to study the influence of genetics and environment in the brain features variance observed in normal and pathological populations. This study presents a genome-wide haplotype analysis for 123 brain sulcus opening value (a measure of sulcal width) across the whole brain that include 16,304 subjects from UK Biobank. Using genetic maps, we defined 119,548 blocks of low recombination rate distributed along the 22 autosomal chromosomes and analyzed 1,051,316 haplotypes. To test associations between haplotypes and complex traits, we designed three statistical approaches. Two of them use a model that includes all the haplotypes for a single block, while the last approach considers each haplotype independently. All the statistics produced were assessed as rigorously as possible. Thanks to the rich imaging dataset at hand, we used resampling techniques to assess False Positive Rate for each statistical approach in a genome-wide and brain-wide context. The results on real data show that genome-wide haplotype analyses are more sensitive than single-SNP approach and account for local complex Linkage Disequilibrium (LD) structure, which makes genome-wide haplotype analysis an interesting and statistically sound alternative to the single-SNP counterpart.

Show full abstractShow less

DOI

10.1038/s41431-021-00827-8

MAIN ANCESTRY

EUR

Katz DH, et al-34814699

Summary statistics

PUBMED_LINK

34814699

TITLE

Whole Genome Sequence Analysis of the Plasma Proteome in Black Adults Provides Novel Insights Into Cardiovascular Disease.

Main citation

Katz DH, Tahir UA, Bick AG, Pampana A, ...&, and Blood Institute TOPMed (Trans-Omics for Precision Medicine) Consortium†. (2022) Whole Genome Sequence Analysis of the Plasma Proteome in Black Adults Provides Novel Insights Into Cardiovascular Disease. Circulation, 145 (5) 357-370. doi:10.1161/CIRCULATIONAHA.121.055117. PMID 34814699

ABSTRACT

BACKGROUND: Plasma proteins are critical mediators of cardiovascular processes and are the targets of many drugs. Previous efforts to characterize the genetic architecture of the plasma proteome have been limited by a focus on individuals of European descent and leveraged genotyping arrays and imputation. Here we describe whole genome sequence analysis of the plasma proteome in individuals with greater African ancestry, increasing our power to identify novel genetic determinants. METHODS: Proteomic profiling of 1301 proteins was performed in 1852 Black adults from the Jackson Heart Study using aptamer-based proteomics (SomaScan). Whole genome sequencing association analysis was ascertained for all variants with minor allele count ≥5. Results were validated using an alternative, antibody-based, proteomic platform (Olink) as well as replicated in the Multi-Ethnic Study of Atherosclerosis and the HERITAGE Family Study (Health, Risk Factors, Exercise Training and Genetics). RESULTS: We identify 569 genetic associations between 479 proteins and 438 unique genetic regions at a Bonferroni-adjusted significance level of 3.8×10⁻¹¹. These associations include 114 novel locus-protein relationships and an additional 217 novel sentinel variant-protein relationships. Novel cardiovascular findings include new protein associations at the APOE gene locus including ZAP70 (sentinel single nucleotide polymorphism [SNP] rs7412-T, β=0.61±0.05, P=3.27×10⁻³⁰) and MMP-3 (β=-0.60±0.05, P=1.67×10⁻³²), as well as a completely novel pleiotropic locus at the HPX gene, associated with 9 proteins. Further, the associations suggest new mechanisms of genetically mediated cardiovascular disease linked to African ancestry; we identify a novel association between variants linked to APOL1-associated chronic kidney and heart disease and the protein CKAP2 (rs73885319-G, β=0.34±0.04, P=1.34×10⁻¹⁷) as well as an association between ATTR amyloidosis and RBP4 levels in community-dwelling individuals without heart failure. CONCLUSIONS: Taken together, these results provide evidence for the functional importance of variants in non-European populations, and suggest new biological mechanisms for ancestry-specific determinants of lipids, coagulation, and myocardial function.

DOI

10.1161/CIRCULATIONAHA.121.055117

Katz DH, et al-35984888

Summary statistics

PUBMED_LINK

35984888

TITLE

Proteomic profiling platforms head to head: Leveraging genetics and clinical traits to compare aptamer- and antibody-based methods.

Main citation

Katz DH, Robbins JM, Deng S, Tahir UA, ...&, Gerszten RE. (2022) Proteomic profiling platforms head to head: Leveraging genetics and clinical traits to compare aptamer- and antibody-based methods. Sci Adv, 8 (33) eabm5164. doi:10.1126/sciadv.abm5164. PMID 35984888

ABSTRACT

High-throughput proteomic profiling using antibody or aptamer-based affinity reagents is used increasingly in human studies. However, direct analyses to address the relative strengths and weaknesses of these platforms are lacking. We assessed findings from the SomaScan1.3K (N = 1301 reagents), the SomaScan5K platform (N = 4979 reagents), and the Olink Explore (N = 1472 reagents) profiling techniques in 568 adults from the Jackson Heart Study and 219 participants in the HERITAGE Family Study across four performance domains: precision, accuracy, analytic breadth, and phenotypic associations leveraging detailed clinical phenotyping and genetic data. Across these studies, we show evidence supporting more reliable protein target specificity and a higher number of phenotypic associations for the Olink platform, while the Soma platforms benefit from greater measurement precision and analytic breadth across the proteome.

DOI

10.1126/sciadv.abm5164

Kauwe JS, et al-25340798

Summary statistics

PUBMED_LINK

25340798

TITLE

Genome-wide association study of CSF levels of 59 Alzheimer's disease candidate proteins: significant associations with proteins involved in amyloid processing and inflammation.

Main citation

Kauwe JS, Bailey MH, Ridge PG, Perry R, ...&, Goate AM. (2014) Genome-wide association study of CSF levels of 59 Alzheimer's disease candidate proteins: significant associations with proteins involved in amyloid processing and inflammation. PLoS Genet, 10 (10) e1004758. doi:10.1371/journal.pgen.1004758. PMID 25340798

ABSTRACT

Cerebrospinal fluid (CSF) 42 amino acid species of amyloid beta (Aβ42) and tau levels are strongly correlated with the presence of Alzheimer's disease (AD) neuropathology including amyloid plaques and neurodegeneration and have been successfully used as endophenotypes for genetic studies of AD. Additional CSF analytes may also serve as useful endophenotypes that capture other aspects of AD pathophysiology. Here we have conducted a genome-wide association study of CSF levels of 59 AD-related analytes. All analytes were measured using the Rules Based Medicine Human DiscoveryMAP Panel, which includes analytes relevant to several disease-related processes. Data from two independently collected and measured datasets, the Knight Alzheimer's Disease Research Center (ADRC) and Alzheimer's Disease Neuroimaging Initiative (ADNI), were analyzed separately, and combined results were obtained using meta-analysis. We identified genetic associations with CSF levels of 5 proteins (Angiotensin-converting enzyme (ACE), Chemokine (C-C motif) ligand 2 (CCL2), Chemokine (C-C motif) ligand 4 (CCL4), Interleukin 6 receptor (IL6R) and Matrix metalloproteinase-3 (MMP3)) with study-wide significant p-values (p < 1.46×10^-10) and significant, consistent evidence for association in both the Knight ADRC and the ADNI samples. These proteins are involved in amyloid processing and pro-inflammatory signaling. SNPs associated with ACE, IL6R and MMP3 protein levels are located within the coding regions of the corresponding structural gene. The SNPs associated with CSF levels of CCL4 and CCL2 are located in known chemokine binding proteins. The genetic associations reported here are novel and suggest mechanisms for genetic control of CSF and plasma levels of these disease-related proteins. Significant SNPs in ACE and MMP3 also showed association with AD risk. Our findings suggest that these proteins/pathways may be valuable therapeutic targets for AD. Robust associations in cognitively normal individuals suggest that these SNPs also influence regulation of these proteins more generally and may therefore be relevant to other diseases.

DOI

10.1371/journal.pgen.1004758

Khurshid S-36944631

Summary statistics

PUBMED_LINK

36944631

TITLE

Clinical and genetic associations of deep learning-derived cardiac magnetic resonance-based left ventricular mass.

Main citation

Khurshid S, Lazarte J, Pirruccello JP, Weng LC, ...&, Lubitz SA. (2023) Clinical and genetic associations of deep learning-derived cardiac magnetic resonance-based left ventricular mass. Nat Commun, 14 (1) 1558. doi:10.1038/s41467-023-37173-w. PMID 36944631

ABSTRACT

Left ventricular mass is a risk marker for cardiovascular events, and may indicate an underlying cardiomyopathy. Cardiac magnetic resonance is the gold-standard for left ventricular mass estimation, but is challenging to obtain at scale. Here, we use deep learning to enable genome-wide association study of cardiac magnetic resonance-derived left ventricular mass indexed to body surface area within 43,230 UK Biobank participants. We identify 12 genome-wide associations (1 known at TTN and 11 novel for left ventricular mass), implicating genes previously associated with cardiac contractility and cardiomyopathy. Cardiac magnetic resonance-derived indexed left ventricular mass is associated with incident dilated and hypertrophic cardiomyopathies, and implantable cardioverter-defibrillator implant. An indexed left ventricular mass polygenic risk score ≥90th percentile is also associated with incident implantable cardioverter-defibrillator implant in separate UK Biobank (hazard ratio 1.22, 95% CI 1.05-1.44) and Mass General Brigham (hazard ratio 1.75, 95% CI 1.12-2.74) samples. Here, we perform a genome-wide association study of cardiac magnetic resonance-derived indexed left ventricular mass to identify 11 novel variants and demonstrate that cardiac magnetic resonance-derived and genetically predicted indexed left ventricular mass are associated with incident cardiomyopathy.

DOI

10.1038/s41467-023-37173-w

MAIN ANCESTRY

EUR

Kim S, et al-23894628

Summary statistics

PUBMED_LINK

23894628

TITLE

Influence of genetic variation on plasma protein levels in older adults using a multi-analyte panel.

Main citation

Kim S, Swaminathan S, Inlow M, Risacher SL, ...&, Alzheimer’s Disease Neuroimaging Initiative (ADNI). (2013) Influence of genetic variation on plasma protein levels in older adults using a multi-analyte panel. PLoS One, 8 (7) e70269. doi:10.1371/journal.pone.0070269. PMID 23894628

ABSTRACT

Proteins, widely studied as potential biomarkers, play important roles in numerous physiological functions and diseases. Genetic variation may modulate corresponding protein levels and point to the role of these variants in disease pathophysiology. Effects of individual single nucleotide polymorphisms (SNPs) within a gene were analyzed for corresponding plasma protein levels using genome-wide association study (GWAS) genotype data and proteomic panel data with 132 quality-controlled analytes from 521 Caucasian participants in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Linear regression analysis detected 112 significant (Bonferroni threshold p = 2.44×10^-5) associations between 27 analytes and 112 SNPs. 107 out of these 112 associations were tested in the Indiana Memory and Aging Study (IMAS) cohort for replication and 50 associations were replicated at uncorrected p < 0.05 in the same direction of effect as those in the ADNI. We identified multiple novel associations including the association of rs7517126 with plasma complement factor H-related protein 1 (CFHR1) level at p < 1.46×10^-60, accounting for 40 percent of total variation of the protein level. We serendipitously found the association of rs6677604 with the same protein at p < 9.29×10^-112. Although these two SNPs were not in the strong linkage disequilibrium, 61 percent of total variation of CFHR1 was accounted for by rs6677604 without additional variation by rs7517126 when both SNPs were tested together. 78 other SNP-protein associations in the ADNI sample exceeded genome-wide significance (5×10^-8). Our results confirmed previously identified gene-protein associations for interleukin-6 receptor, chemokine CC-4, angiotensin-converting enzyme, and angiotensinogen, although the direction of effect was reversed in some cases. This study is among the first analyses of gene-protein product relationships integrating multiplex-panel proteomics and targeted genes extracted from a GWAS array. With intensive searches taking place for proteomic biomarkers for many diseases, the role of genetic variation takes on new importance and should be considered in interpretation of proteomic results.

DOI

10.1371/journal.pone.0070269

Kirchler M-35640976 (transferGWAS)

Summary statistics

PUBMED_LINK

35640976

DESCRIPTION

transferGWAS is a method for performing genome-wide association studies on whole images.

URL

https://github.com/mkirchler/transferGWAS/

TITLE

transferGWAS: GWAS of images using deep transfer learning.

Main citation

Kirchler M, Konigorski S, Norden M, Meltendorf C, ...&, Lippert C. (2022) transferGWAS: GWAS of images using deep transfer learning. Bioinformatics, 38 (14) 3621-3628. doi:10.1093/bioinformatics/btac369. PMID 35640976

ABSTRACT

MOTIVATION: Medical images can provide rich information about diseases and their biology. However, investigating their association with genetic variation requires non-standard methods. We propose transferGWAS, a novel approach to perform genome-wide association studies directly on full medical images. First, we learn semantically meaningful representations of the images based on a transfer learning task, during which a deep neural network is trained on independent but similar data. Then, we perform genetic association tests with these representations. RESULTS: We validate the type I error rates and power of transferGWAS in simulation studies of synthetic images. Then we apply transferGWAS in a genome-wide association study of retinal fundus images from the UK Biobank. This first-of-a-kind GWAS of full imaging data yielded 60 genomic regions associated with retinal fundus images, of which 7 are novel candidate loci for eye-related traits and diseases. AVAILABILITY AND IMPLEMENTATION: Our method is implemented in Python and available at https://github.com/mkirchler/transferGWAS/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

DOI

10.1093/bioinformatics/btac369

KoGES PheWeb

Summary statistics

DESCRIPTION

PheWeb instance for KoGES (Korean Genome and Epidemiology Study) GWAS summary statistics.

URL

https://koges.leelabsg.org/

MAIN ANCESTRY

EAS

Koprulu M, et al-36823471

Summary statistics

PUBMED_LINK

36823471

TITLE

Proteogenomic links to human metabolic diseases.

Main citation

Koprulu M, Carrasco-Zanini J, Wheeler E, Lockhart S, ...&, Langenberg C. (2023) Proteogenomic links to human metabolic diseases. Nat Metab, 5 (3) 516-528. doi:10.1038/s42255-023-00753-7. PMID 36823471

ABSTRACT

Studying the plasma proteome as the intermediate layer between the genome and the phenome has the potential to identify new disease processes. Here, we conducted a cis-focused proteogenomic analysis of 2,923 plasma proteins measured in 1,180 individuals using antibody-based assays. We (1) identify 256 unreported protein quantitative trait loci (pQTL); (2) demonstrate shared genetic regulation of 224 cis-pQTLs with 575 specific health outcomes, revealing examples for notable metabolic diseases (such as gastrin-releasing peptide as a potential therapeutic target for type 2 diabetes); (3) improve causal gene assignment at 40% (n = 192) of overlapping risk loci; and (4) observe convergence of phenotypic consequences of cis-pQTLs and rare loss-of-function gene burden for 12 proteins, such as TIMD4 for lipoprotein metabolism. Our findings demonstrate the value of integrating complementary proteomic technologies with genomics even at moderate scale to identify new mediators of metabolic diseases with the potential for therapeutic interventions.

DOI

10.1038/s42255-023-00753-7

KoreanChip

Summary statistics

PUBMED_LINK

30718733

DESCRIPTION

GWAS summary statistics based on the Korea Biobank Array (KoreanChip / KoGES).

URL

https://www.koreanchip.org/downloads

TITLE

The Korea Biobank Array: Design and Identification of Coding Variants Associated with Blood Biochemical Traits.

Main citation

Moon S, Kim YJ, Han S, Hwang MY, ...&, Kim BJ. (2019) The Korea Biobank Array: Design and Identification of Coding Variants Associated with Blood Biochemical Traits. Sci Rep, 9 (1) 1382. doi:10.1038/s41598-018-37832-9. PMID 30718733

ABSTRACT

We introduce the design and implementation of a new array, the Korea Biobank Array (referred to as KoreanChip), optimized for the Korean population and demonstrate findings from GWAS of blood biochemical traits. KoreanChip comprised >833,000 markers including >247,000 rare-frequency or functional variants estimated from >2,500 sequencing data in Koreans. Of the 833 K markers, 208 K functional markers were directly genotyped. Particularly, >89 K markers were presented in East Asians. KoreanChip achieved higher imputation performance owing to the excellent genomic coverage of 95.38% for common and 73.65% for low-frequency variants. From GWAS (Genome-wide association study) using 6,949 individuals, 28 associations were successfully recapitulated. Moreover, 9 missense variants were newly identified, of which we identified new associations between a common population-specific missense variant, rs671 (p.Glu457Lys) of ALDH2, and two traits including aspartate aminotransferase (P = 5.20 × 10^-13) and alanine aminotransferase (P = 4.98 × 10^-8). Furthermore, two novel missense variants of GPT with rare frequency in East Asians but extreme rarity in other populations were associated with alanine aminotransferase (rs200088103; p.Arg133Trp, P = 2.02 × 10^-9 and rs748547625; p.Arg143Cys, P = 1.41 × 10^-6). These variants were successfully replicated in 6,000 individuals (P = 5.30 × 10^-8 and P = 1.24 × 10^-6). GWAS results suggest the promising utility of KoreanChip with a substantial number of damaging variants to identify new population-specific disease-associated rare/functional variants.

DOI

10.1038/s41598-018-37832-9

MAIN ANCESTRY

EAS

Krishna C, et al-39085222

Summary statistics

PUBMED_LINK

39085222

TITLE

The influence of HLA genetic variation on plasma protein expression.

Main citation

Krishna C, Chiou J, Sakaue S, Kang JB, ...&, Hu X. (2024) The influence of HLA genetic variation on plasma protein expression. Nat Commun, 15 (1) 6469. doi:10.1038/s41467-024-50583-8. PMID 39085222

ABSTRACT

Genetic variation in the human leukocyte antigen (HLA) loci is associated with risk of immune-mediated diseases, but the molecular effects of HLA polymorphism are unclear. Here we examined the effects of HLA genetic variation on the expression of 2940 plasma proteins across 45,330 Europeans in the UK Biobank, with replication analyses across multiple ancestry groups. We detected 504 proteins affected by HLA variants (HLA-pQTL), including widespread trans effects by autoimmune disease risk alleles. More than 80% of the HLA-pQTL fine-mapped to amino acid positions in the peptide binding groove. HLA-I and II affected proteins expressed in similar cell types but in different pathways of both adaptive and innate immunity. Finally, we investigated potential HLA-pQTL effects on disease by integrating HLA-pQTL with fine-mapped HLA-disease signals in the UK Biobank. Our data reveal the diverse effects of HLA genetic variation and aid the interpretation of associations between HLA alleles and immune-mediated diseases.

DOI

10.1038/s41467-024-50583-8

RELATED_BIOBANK

https://db.cngb.org/MANE.PheWeb/

MAIN ANCESTRY

EUR

Littlejohns TJ-32457287

Summary statistics

PUBMED_LINK

32457287

DESCRIPTION

UK Biobank imaging enhancement: brain, cardiac and abdominal magnetic resonance imaging, dual-energy X-ray absorptiometry and carotid ultrasound.

TITLE

The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions.

Main citation

Littlejohns TJ, Holliday J, Gibson LM, Garratt S, ...&, Allen NE. (2020) The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nat Commun, 11 (1) 2624. doi:10.1038/s41467-020-15948-9. PMID 32457287

ABSTRACT

UK Biobank is a population-based cohort of half a million participants aged 40-69 years recruited between 2006 and 2010. In 2014, UK Biobank started the world's largest multi-modal imaging study, with the aim of re-inviting 100,000 participants to undergo brain, cardiac and abdominal magnetic resonance imaging, dual-energy X-ray absorptiometry and carotid ultrasound. The combination of large-scale multi-modal imaging with extensive phenotypic and genetic data offers an unprecedented resource for scientists to conduct health-related research. This article provides an in-depth overview of the imaging enhancement, including the data collected, how it is managed and processed, and future directions.

DOI

10.1038/s41467-020-15948-9

MAIN ANCESTRY

EUR

Liu F-23028347

Summary statistics

PUBMED_LINK

23028347

TITLE

A genome-wide association study identifies five loci influencing facial morphology in Europeans.

Main citation

Liu F, van der Lijn F, Schurmann C, Zhu G, ...&, Kayser M. (2012) A genome-wide association study identifies five loci influencing facial morphology in Europeans. PLoS Genet, 8 (9) e1002932. doi:10.1371/journal.pgen.1002932. PMID 23028347

ABSTRACT

Inter-individual variation in facial shape is one of the most noticeable phenotypes in humans, and it is clearly under genetic regulation; however, almost nothing is known about the genetic basis of normal human facial morphology. We therefore conducted a genome-wide association study for facial shape phenotypes in multiple discovery and replication cohorts, considering almost ten thousand individuals of European descent from several countries. Phenotyping of facial shape features was based on landmark data obtained from three-dimensional head magnetic resonance images (MRIs) and two-dimensional portrait images. We identified five independent genetic loci associated with different facial phenotypes, suggesting the involvement of five candidate genes--PRDM16, PAX3, TP63, C5orf50, and COL17A1--in the determination of the human face. Three of them have been implicated previously in vertebrate craniofacial development and disease, and the remaining two genes potentially represent novel players in the molecular networks governing facial development. Our finding at PAX3 influencing the position of the nasion replicates a recent GWAS of facial features. In addition to the reported GWA findings, we established links between common DNA variants previously associated with NSCL/P at 2p21, 8q24, 13q31, and 17q22 and normal facial-shape variations based on a candidate gene approach. Overall our study implies that DNA variants in genes essential for craniofacial development contribute with relatively small effect size to the spectrum of normal variation in human facial morphology. This observation has important consequences for future studies aiming to identify more genes involved in the human facial morphology, as well as for potential applications of DNA prediction of facial shape such as in future forensic applications.

DOI

10.1371/journal.pgen.1002932

MAIN ANCESTRY

EUR

Liu M-38038215

Summary statistics

PUBMED_LINK

38038215

TITLE

Chromosome 10q24.32 Variants Associate With Brain Arterial Diameters in Diverse Populations: A Genome-Wide Association Study.

Main citation

Liu M, Khasiyev F, Sariya S, Spagnolo-Allende A, ...&, Gutierrez J. (2023) Chromosome 10q24.32 Variants Associate With Brain Arterial Diameters in Diverse Populations: A Genome-Wide Association Study. J Am Heart Assoc, 12 (23) e030935. doi:10.1161/JAHA.123.030935. PMID 38038215

ABSTRACT

BACKGROUND: Brain arterial diameters (BADs) are novel imaging biomarkers of cerebrovascular disease, cognitive decline, and dementia. Traditional vascular risk factors have been associated with BADs, but whether there may be genetic determinants of BADs is unknown. METHODS AND RESULTS: The authors studied 4150 participants from 6 geographically diverse population-based cohorts (40% European, 14% African, 22% Hispanic, 24% Asian ancestries). Brain arterial diameters for 13 segments were measured and averaged to obtain a global measure of BADs as well as the posterior and anterior circulations. A genome-wide association study revealed 14 variants at one locus associated with global BAD at genome-wide significance (P < 5×10^-8) (top single-nucleotide polymorphism, rs7921574; β=0.06 [P=1.54×10^-8]). This locus mapped to an intron of CNNM2. A trans-ancestry genome-wide association study meta-analysis identified 2 more loci at NT5C2 (rs10748839; P=2.54×10^-8) and AS3MT (rs10786721; P=4.97×10^-8), associated with global BAD. In addition, 2 single-nucleotide polymorphisms colocalized with expression of CNNM2 (rs7897654; β=0.12 [P=6.17×10^-7]) and AL356608.1 (rs10786719; β=-0.17 [P=6.60×10^-6]) in brain tissue. For the posterior BAD, 2 variants at one locus mapped to an intron of TCF25 were identified (top single-nucleotide polymorphism, rs35994878; β=0.11 [P=2.94×10^-8]). For the anterior BAD, one locus at ADAP1 was identified in trans-ancestry genome-wide association analysis (rs34217249; P=3.11×10^-8). CONCLUSIONS: The current study reveals 3 novel risk loci (CNNM2, NT5C2, and AS3MT) associated with BADs. These findings may help elucidate the mechanism by which BADs may influence cerebrovascular health.

DOI

10.1161/JAHA.123.030935

MAIN ANCESTRY

Cross-ancestry

Liu Y-34128465

Summary statistics

PUBMED_LINK

34128465

TITLE

Genetic architecture of 11 organ traits derived from abdominal MRI using deep learning.

Main citation

Liu Y, Basty N, Whitcher B, Bell JD, ...&, Cule M. (2021) Genetic architecture of 11 organ traits derived from abdominal MRI using deep learning. eLife, 10. doi:10.7554/eLife.65554. PMID 34128465

ABSTRACT

Cardiometabolic diseases are an increasing global health burden. While socioeconomic, environmental, behavioural, and genetic risk factors have been identified, a better understanding of the underlying mechanisms is required to develop more effective interventions. Magnetic resonance imaging (MRI) has been used to assess organ health, but biobank-scale studies are still in their infancy. Using over 38,000 abdominal MRI scans in the UK Biobank, we used deep learning to quantify volume, fat, and iron in seven organs and tissues, and demonstrate that imaging-derived phenotypes reflect health status. We show that these traits have a substantial heritable component (8-44%) and identify 93 independent genome-wide significant associations, including four associations with liver traits that have not previously been reported. Our work demonstrates the tractability of deep learning to systematically quantify health parameters from high-throughput MRI across a range of organs and tissues, and use the largest-ever study of its kind to generate new insights into the genetic architecture of these traits.

DOI

10.7554/eLife.65554

MAIN ANCESTRY

EUR

Macdonald-Dunlop

Summary statistics

PREPRINT_DOI

2021.08.03.21261494

SERVER

medrxiv

Main citation

Macdonald-Dunlop, E. et al. Mapping genetic determinants of 184 circulating proteins in 26,494 individuals to connect proteins and diseases. medRxiv (2021) doi:10.1101/2021.08.03.21261494.

MAIN ANCESTRY

EUR

MANE PheWeb

Summary statistics

PUBMED_LINK

39389017

DESCRIPTION

MANE PheWeb — Chinese maternal cohort GWAS summary statistics browser.

URL

https://db.cngb.org/MANE.PheWeb/

TITLE

Genetic analyses of 104 phenotypes in 20,900 Chinese pregnant women reveal pregnancy-specific discoveries.

Main citation

Xiao H, Li L, Yang M, Zhang X, ...&, Jin X. (2024) Genetic analyses of 104 phenotypes in 20,900 Chinese pregnant women reveal pregnancy-specific discoveries. Cell Genom, 4 (10) 100633. doi:10.1016/j.xgen.2024.100633. PMID 39389017

ABSTRACT

Monitoring biochemical phenotypes during pregnancy is vital for maternal and fetal health, allowing early detection and management of pregnancy-related conditions to ensure safety for both. Here, we conducted a genetic analysis of 104 pregnancy phenotypes in 20,900 Chinese women. The genome-wide association study (GWAS) identified a total of 410 trait-locus associations, with 71.71% reported previously. Among the 116 novel hits for 45 phenotypes, 83 were successfully replicated. Among them, 31 were defined as potentially pregnancy-specific associations, including creatine and HELLPAR and neutrophils and ESR1, with subsequent analysis revealing enrichments in estrogen-related pathways and female reproductive tissues. The partitioning heritability underscored the significant roles of fetal blood, embryoid bodies, and female reproductive organs in pregnancy hematology and birth outcomes. Pathway analysis confirmed the intricate interplay of hormone and immune regulation, metabolism, and cell cycle during pregnancy. This study contributes to the understanding of genetic influences on pregnancy phenotypes and their implications for maternal health.

DOI

10.1016/j.xgen.2024.100633

MAIN ANCESTRY

EAS

Megastroke

Summary statistics

PUBMED_LINK

29531354

DESCRIPTION

MEGASTROKE multi-ancestry stroke GWAS meta-analysis summary statistics and portal.

URL

https://www.megastroke.org/index.html

TITLE

Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes.

Main citation

Malik R, Chauhan G, Traylor M, Sargurupremraj M, ...&, Dichgans M. (2018) Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat Genet, 50 (4) 524-537. doi:10.1038/s41588-018-0058-3. PMID 29531354

ABSTRACT

Stroke has multiple etiologies, but the underlying genes and pathways are largely unknown. We conducted a multiancestry genome-wide-association meta-analysis in 521,612 individuals (67,162 cases and 454,450 controls) and discovered 22 new stroke risk loci, bringing the total to 32. We further found shared genetic variation with related vascular traits, including blood pressure, cardiac traits, and venous thromboembolism, at individual loci (n = 18), and using genetic risk scores and linkage-disequilibrium-score regression. Several loci exhibited distinct association and pleiotropy patterns for etiological stroke subtypes. Eleven new susceptibility loci indicate mechanisms not previously implicated in stroke pathophysiology, with prioritization of risk variants and genes accomplished through bioinformatics analyses using extensive functional datasets. Stroke risk loci were significantly enriched in drug targets for antithrombotic therapy.

DOI

10.1038/s41588-018-0058-3

MAIN ANCESTRY

Multi-ancestry

Melzer D, et al-18464913

Summary statistics

PUBMED_LINK

18464913

TITLE

A genome-wide association study identifies protein quantitative trait loci (pQTLs).

Main citation

Melzer D, Perry JR, Hernandez D, Corsi AM, ...&, Ferrucci L. (2008) A genome-wide association study identifies protein quantitative trait loci (pQTLs). PLoS Genet, 4 (5) e1000072. doi:10.1371/journal.pgen.1000072. PMID 18464913

ABSTRACT

There is considerable evidence that human genetic variation influences gene expression. Genome-wide studies have revealed that mRNA levels are associated with genetic variation in or close to the gene coding for those mRNA transcripts (cis effects), and elsewhere in the genome (trans effects). The role of genetic variation in determining protein levels has not been systematically assessed. Using a genome-wide association approach we show that common genetic variation influences levels of clinically relevant proteins in human serum and plasma. We evaluated the role of 496,032 polymorphisms on levels of 42 proteins measured in 1200 fasting individuals from the population based InCHIANTI study. Proteins included insulin, several interleukins, adipokines, chemokines, and liver function markers that are implicated in many common diseases including metabolic, inflammatory, and infectious conditions. We identified eight cis effects, including variants in or near the IL6R (p = 1.8×10^-57), CCL4L1 (p = 3.9×10^-21), IL18 (p = 6.8×10^-13), LPA (p = 4.4×10^-10), GGT1 (p = 1.5×10^-7), SHBG (p = 3.1×10^-7), CRP (p = 6.4×10^-6) and IL1RN (p = 7.3×10^-6) genes, all associated with their respective protein products with effect sizes ranging from 0.19 to 0.69 standard deviations per allele. Mechanisms implicated include altered rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different sized proteins (LPA), variation in gene copy number (CCL4L1) and altered transcription (GGT1). We identified one novel trans effect that was an association between ABO blood group and tumour necrosis factor alpha (TNF-alpha) levels (p = 6.8×10^-40), but this finding was not present when TNF-alpha was measured using a different assay, or in a second study, suggesting an assay-specific association. Our results show that protein levels share some of the features of the genetics of gene expression. These include the presence of strong genetic effects in cis locations. The identification of protein quantitative trait loci (pQTLs) may be a powerful complementary method of improving our understanding of disease pathways.

DOI

10.1371/journal.pgen.1000072

MGI 1

Summary statistics

DESCRIPTION

Michigan Genomics Initiative PheWeb freeze 1 — GWAS summary statistics.

URL

https://pheweb.org/MGI-freeze1/

MAIN ANCESTRY

EUR

MGI 2

Summary statistics

DESCRIPTION

Michigan Genomics Initiative PheWeb freeze 2 — GWAS summary statistics.

URL

https://pheweb.org/MGI-freeze2/

MAIN ANCESTRY

EUR

MGI BioVU

Summary statistics

DESCRIPTION

Michigan Genomics Initiative PheWeb BioVU freeze: GWAS summary statistics.

URL

https://pheweb.org/MGI-BioVU/

MAIN ANCESTRY

EUR

Min

Summary statistics

PUBMED_LINK

34493871

DESCRIPTION

Cis and trans meta-analysis results from genome-wide scans of 420,509 DNA methylation sites

URL

http://mqtldb.godmc.org.uk/

TITLE

Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation.

Main citation

Min JL, Hemani G, Hannon E, Dekkers KF, ...&, Relton CL. (2021) Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet, 53 (9) 1311-1321. doi:10.1038/s41588-021-00923-x. PMID 34493871

ABSTRACT

Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15-17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype-phenotype map than previously anticipated.

DOI

10.1038/s41588-021-00923-x

MVP-FinnGen-UKBB meta-analysis

Summary statistics

PUBMED_LINK

39974076

DESCRIPTION

Cross-biobank GWAS meta-analysis across MVP, FinnGen, and UK Biobank (phenome-wide association resource).

URL

https://mvp-ukbb.finngen.fi/

TITLE

Prevalence and disease risks for male and female sex chromosome trisomies: a registry-based phenome-wide association study in 1.5 million participants of MVP, FinnGen, and UK Biobank.

Main citation

Davis SM, Liu A, Teerlink CC, Lapato DM, ...&, Hauger RL. (2025) Prevalence and disease risks for male and female sex chromosome trisomies: a registry-based phenome-wide association study in 1.5 million participants of MVP, FinnGen, and UK Biobank. medRxiv. doi:10.1101/2025.01.31.25321488. PMID 39974076

ABSTRACT

Sex chromosome trisomies (SCT) are the most common whole chromosome aneuploidy in humans. Yet, our understanding of the prevalence and associated health outcomes is largely driven by observational studies of clinically diagnosed cases, resulting in a disproportionate focus on 47,XXY and associated hypogonadism. We analyzed microarray intensity data of sex chromosomes for 1.5 million individuals enrolled in three large cohorts (Million Veteran Program, FinnGen, and UK Biobank) to identify individuals with 47,XXY, 47,XYY, and 47,XXX. We examined disease conditions associated with SCTs by performing phenome-wide association studies (PheWAS) using electronic health records (EHR) data for each cohort, followed by meta-analysis across cohorts. Association results are presented for each SCT and also stratified by presence or absence of a documented clinical diagnosis for 47,XXY. We identified 2,769 individuals with SCT (47,XXY: 1,319; 47,XYY: 1,108; 47,XXX: 342), most of whom had no documented clinical diagnosis (47,XXY: 73.8%; 47,XYY: 98.6%; 47,XXX: 93.6%). The identified phenotypic associations with SCT spanned all PheWAS disease categories except neoplasms. Many associations are shared among three SCT subtypes, particularly for vascular diseases (e.g., chronic venous insufficiency (OR [95% CI] for 47,XXY 4.7 [3.9-5.8]; 47,XYY 5.6 [4.5-7.0]; 47,XXX 4.6 [2.7-7.6]), venous thromboembolism (47,XXY 4.6 [3.7-5.6]; 47,XYY 4.1 [3.3-5.0]; 47,XXX 8.1 [4.2-15.4]), and glaucoma (47,XXY 2.5 [2.1-2.9]; 47,XYY 2.4 [2.0-2.8]; 47,XXX 2.3 [1.4-3.5])). A third sex chromosome confers an increased risk for systemic comorbidities, even if the SCT is not documented. SCT phenotypes largely overlap, suggesting one or more X/Y homolog genes may underlie pathophysiology and comorbidities across SCTs.

DOI

10.1101/2025.01.31.25321488

MAIN ANCESTRY

EUR

NBDC (hum0197)

Summary statistics

DESCRIPTION

NBDC Human Database entry hum0197: metadata and access route for Japanese GWAS summary statistics.

URL

https://humandbs.dbcls.jp/en/hum0197

MAIN ANCESTRY

EAS

Ning C-38036550

Summary statistics

PUBMED_LINK

38036550

TITLE

Genome-wide association analysis of left ventricular imaging-derived phenotypes identifies 72 risk loci and yields genetic insights into hypertrophic cardiomyopathy.

Main citation

Ning C, Fan L, Jin M, Wang W, ...&, Miao X. (2023) Genome-wide association analysis of left ventricular imaging-derived phenotypes identifies 72 risk loci and yields genetic insights into hypertrophic cardiomyopathy. Nat Commun, 14 (1) 7900. doi:10.1038/s41467-023-43771-5. PMID 38036550

ABSTRACT

Left ventricular regional wall thickness (LVRWT) is an independent predictor of morbidity and mortality in cardiovascular diseases (CVDs). To identify specific genetic influences on individual LVRWT, we established a novel deep learning algorithm to calculate 12 LVRWTs accurately in 42,194 individuals from the UK Biobank with cardiac magnetic resonance (CMR) imaging. Genome-wide association studies of CMR-derived 12 LVRWTs identified 72 significant genetic loci associated with at least one LVRWT phenotype (P < 5 × 10⁻⁸), which were revealed to actively participate in heart development and contraction pathways. Significant causal relationships were observed between the LVRWT traits and hypertrophic cardiomyopathy (HCM) using genetic correlation and Mendelian randomization analyses (P < 0.01). The polygenic risk score of inferoseptal LVRWT at end systole exhibited a notable association with incident HCM, facilitating the identification of high-risk individuals. The findings yield insights into the genetic determinants of LVRWT phenotypes and shed light on the biological basis for HCM etiology.

DOI

10.1038/s41467-023-43771-5

MAIN ANCESTRY

EUR

NSPT

Summary statistics

PUBMED_LINK

38641644

DESCRIPTION

Methylation quantitative trait loci (mQTLs) mapped in whole blood of 3,523 Han Chinese individuals from the National Survey of Physical Traits (NSPT) cohort.

URL

https://www.biosino.org/sinomqtl/

TITLE

Analysis of blood methylation quantitative trait loci in East Asians reveals ancestry-specific impacts on complex traits.

Main citation

Peng Q, Liu X, Li W, Jing H, ...&, Wang S. (2024) Analysis of blood methylation quantitative trait loci in East Asians reveals ancestry-specific impacts on complex traits. Nat Genet, 56 (5) 846-860. doi:10.1038/s41588-023-01494-9. PMID 38641644

ABSTRACT

Methylation quantitative trait loci (mQTLs) are essential for understanding the role of DNA methylation changes in genetic predisposition, yet they have not been fully characterized in East Asians (EAs). Here we identified mQTLs in whole blood from 3,523 Chinese individuals and replicated them in additional 1,858 Chinese individuals from two cohorts. Over 9% of mQTLs displayed specificity to EAs, facilitating the fine-mapping of EA-specific genetic associations, as shown for variants associated with height. Trans-mQTL hotspots revealed biological pathways contributing to EA-specific genetic associations, including an ERG-mediated 233 trans-mCpG network, implicated in hematopoietic cell differentiation, which likely reflects binding efficiency modulation of the ERG protein complex. More than 90% of mQTLs were shared between different blood cell lineages, with a smaller fraction of lineage-specific mQTLs displaying preferential hypomethylation in the respective lineages. Our study provides new insights into the mQTL landscape across genetic ancestries and their downstream effects on cellular processes and diseases/traits.

DOI

10.1038/s41588-023-01494-9

OmicsPred portal

Summary statistics

PUBMED_LINK

36991119

URL

https://www.omicspred.org/

TITLE

An atlas of genetic scores to predict multi-omic traits.

Main citation

Xu Y, Ritchie SC, Liang Y, Timmers PRHJ, ...&, Inouye M. (2023) An atlas of genetic scores to predict multi-omic traits. Nature, 616 (7955) 123-131. doi:10.1038/s41586-023-05844-9. PMID 36991119

ABSTRACT

The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. But multi-omic traits can be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omics. Here we examine a large cohort (the INTERVAL study; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK-STAT signalling and coronary atherosclerosis. Finally, we develop a portal (https://www.omicspred.org/) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.

DOI

10.1038/s41586-023-05844-9

OneK1k

Summary statistics

PUBMED_LINK

35389779

URL

https://onek1k.org/

TITLE

Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease.

Main citation

Yazar S, Alquicira-Hernandez J, Wing K, Senabouth A, ...&, Powell JE. (2022) Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science, 376 (6589) eabf3041. doi:10.1126/science.abf3041. PMID 35389779

ABSTRACT

The human immune system displays substantial variation between individuals, leading to differences in susceptibility to autoimmune disease. We present single-cell RNA sequencing (scRNA-seq) data from 1,267,758 peripheral blood mononuclear cells from 982 healthy human subjects. For 14 cell types, we identified 26,597 independent cis-expression quantitative trait loci (eQTLs) and 990 trans-eQTLs, with most showing cell type-specific effects on gene expression. We subsequently show how eQTLs have dynamic allelic effects in B cells that are transitioning from naïve to memory states and demonstrate how commonly segregating alleles lead to interindividual variation in immune function. Finally, using a Mendelian randomization approach, we identify the causal route by which 305 risk loci contribute to autoimmune disease at the cellular level. This work brings together genetic epidemiology with scRNA-seq to uncover drivers of interindividual variation in the immune system.

DOI

10.1126/science.abf3041

OpenGWAS

Summary statistics

DESCRIPTION

MRC IEU OpenGWAS database: harmonized GWAS summary statistics and an API for Mendelian randomization (MR) and related analyses.

URL

https://gwas.mrcieu.ac.uk/

PREPRINT_DOI

10.1101/2020.08.10.244293

SERVER

biorxiv

Main citation

Elsworth B, Lyon M, Alexander T, Liu Y, Matthews P, Hallett J, ...&, Hemani G. (2020) The MRC IEU OpenGWAS data infrastructure. bioRxiv. doi:10.1101/2020.08.10.244293

MAIN ANCESTRY

Multi-ancestry

Ota

Summary statistics

PUBMED_LINK

33930287

TITLE

Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases.

Main citation

Ota M, Nagafuchi Y, Hatano H, Ishigaki K, ...&, Fujio K. (2021) Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases. Cell, 184 (11) 3006-3021.e17. doi:10.1016/j.cell.2021.03.056. PMID 33930287

ABSTRACT

Genetic studies have revealed many variant loci that are associated with immune-mediated diseases. To elucidate the disease pathogenesis, it is essential to understand the function of these variants, especially under disease-associated conditions. Here, we performed a large-scale immune cell gene-expression analysis, together with whole-genome sequence analysis. Our dataset consists of 28 distinct immune cell subsets from 337 patients diagnosed with 10 categories of immune-mediated diseases and 79 healthy volunteers. Our dataset captured distinctive gene-expression profiles across immune cell types and diseases. Expression quantitative trait loci (eQTL) analysis revealed dynamic variations of eQTL effects in the context of immunological conditions, as well as cell types. These cell-type-specific and context-dependent eQTLs showed significant enrichment in immune disease-associated genetic variants, and they implicated the disease-relevant cell types, genes, and environment. This atlas deepens our understanding of the immunogenetic functions of disease-associated variants under in vivo disease conditions.

DOI

10.1016/j.cell.2021.03.056

Pan-UKB

Summary statistics

PUBMED_LINK

40968291

DESCRIPTION

Pan-UK Biobank: multi-ancestry GWAS in UK Biobank across thousands of phenotypes.

URL

https://pan.ukbb.broadinstitute.org/

TITLE

Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects.

Main citation

Karczewski KJ, Gupta R, Kanai M, Lu W, ...&, Martin AR. (2025) Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects. Nat Genet, 57 (10) 2408-2417. doi:10.1038/s41588-025-02335-7. PMID 40968291

ABSTRACT

Large biobanks, such as the UK Biobank (UKB), enable massive phenome by genome-wide association studies that elucidate genetic etiology of complex traits. However, people from diverse genetic ancestry groups are often excluded from association analyses due to concerns about population structure introducing false positive associations. Here we generate mixed model associations and meta-analyses across genetic ancestry groups, inclusive of a larger fraction of the UK Biobank than previous efforts, to produce freely available summary statistics for 7,266 traits. We build a quality control and analysis framework informed by genetic architecture. Overall, we identify 14,676 significant loci (P < 5 × 10⁻⁸) in the meta-analysis that were not found in the EUR genetic ancestry group alone, including new associations, for example between CAMK2D and triglycerides. We also highlight associations from ancestry-enriched variation, including a known pleiotropic missense variant in G6PD associated with several biomarker traits. We release these results publicly alongside frequently asked questions that describe caveats for interpretation of results, enhancing available resources for interpretation of risk variants across diverse populations.

DOI

10.1038/s41588-025-02335-7

RELATED_BIOBANK

https://www.med.unc.edu/pgc/download-results/

MAIN ANCESTRY

EUR

Parisinos C-32247823

Summary statistics

PUBMED_LINK

32247823

TITLE

Genome-wide and Mendelian randomisation studies of liver MRI yield insights into the pathogenesis of steatohepatitis.

Main citation

Parisinos CA, Wilman HR, Thomas EL, Kelly M, ...&, Yaghootkar H. (2020) Genome-wide and Mendelian randomisation studies of liver MRI yield insights into the pathogenesis of steatohepatitis. J Hepatol, 73 (2) 241-251. doi:10.1016/j.jhep.2020.03.032. PMID 32247823

ABSTRACT

BACKGROUND & AIMS: MRI-based corrected T1 (cT1) is a non-invasive method to grade the severity of steatohepatitis and liver fibrosis. We aimed to identify genetic variants influencing liver cT1 and use genetics to understand mechanisms underlying liver fibroinflammatory disease and its link with other metabolic traits and diseases. METHODS: First, we performed a genome-wide association study (GWAS) in 14,440 Europeans, with liver cT1 measures, from the UK Biobank. Second, we explored the effects of the cT1 variants on liver blood tests, and a range of metabolic traits and diseases. Third, we used Mendelian randomisation to test the causal effects of 24 predominantly metabolic traits on liver cT1 measures. RESULTS: We identified 6 independent genetic variants associated with liver cT1 that reached the GWAS significance threshold (p < 5 × 10⁻⁸). Four of the variants (rs759359281 in SLC30A10, rs13107325 in SLC39A8, rs58542926 in TM6SF2, rs738409 in PNPLA3) were also associated with elevated aminotransferases and had variable effects on liver fat and other metabolic traits. Insulin resistance, type 2 diabetes, non-alcoholic fatty liver and body mass index were causally associated with elevated cT1, whilst favourable adiposity (instrumented by variants associated with higher adiposity but lower risk of cardiometabolic disease and lower liver fat) was found to be protective. CONCLUSION: The association between 2 metal ion transporters and cT1 indicates an important new mechanism in steatohepatitis. Future studies are needed to determine whether interventions targeting the identified transporters might prevent liver disease in at-risk individuals. LAY SUMMARY: We estimated levels of liver inflammation and scarring based on magnetic resonance imaging of 14,440 UK Biobank participants. We performed a genetic study and identified variations in 6 genes associated with levels of liver inflammation and scarring. Participants with variations in 4 of these genes also had higher levels of markers of liver cell injury in blood samples, further validating their role in liver health. Two identified genes are involved in the transport of metal ions in our body. Further investigation of these variations may lead to better detection, assessment, and/or treatment of liver inflammation and scarring.

DOI

10.1016/j.jhep.2020.03.032

MAIN ANCESTRY

EUR

Persyn E-32358547

Summary statistics

PUBMED_LINK

32358547

TITLE

Genome-wide association study of MRI markers of cerebral small vessel disease in 42,310 participants.

Main citation

Persyn E, Hanscombe KB, Howson JMM, Lewis CM, ...&, Markus HS. (2020) Genome-wide association study of MRI markers of cerebral small vessel disease in 42,310 participants. Nat Commun, 11 (1) 2175. doi:10.1038/s41467-020-15932-3. PMID 32358547

ABSTRACT

Cerebral small vessel disease is a major cause of stroke and dementia, but its genetic basis is incompletely understood. We perform a genetic study of three MRI markers of the disease in UK Biobank imaging data and other sources: white matter hyperintensities (N = 42,310), fractional anisotropy (N = 17,663) and mean diffusivity (N = 17,467). Our aim is to better understand the disease pathophysiology. Across the three traits, we identify 31 loci, of which 21 were previously unreported. We perform a transcriptome-wide association study to identify associations with gene expression in relevant tissues, identifying 66 associated genes across the three traits. This genetic study provides insights into the understanding of the biological mechanisms underlying small vessel disease.

DOI

10.1038/s41467-020-15932-3

MAIN ANCESTRY

EUR

PGC (Psychiatric Genomics Consortium)

Summary statistics

PUBMED_LINK

25056061

DESCRIPTION

Psychiatric Genomics Consortium meta-analysis summary statistics for psychiatric disorders.

URL

TITLE

Biological insights from 108 schizophrenia-associated genetic loci.

Main citation

Schizophrenia Working Group of the Psychiatric Genomics Consortium. (2014) Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511 (7510) 421-7. doi:10.1038/nature13595. PMID 25056061

ABSTRACT

Schizophrenia is a highly heritable disorder. Genetic risk is conferred by a large number of alleles, including common alleles of small effect that might be detected by genome-wide association studies. Here we report a multi-stage schizophrenia genome-wide association study of up to 36,989 cases and 113,075 controls. We identify 128 independent associations spanning 108 conservatively defined loci that meet genome-wide significance, 83 of which have not been previously reported. Associations were enriched among genes expressed in brain, providing biological plausibility for the findings. Many findings have the potential to provide entirely new insights into aetiology, but associations at DRD2 and several genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses. Independent of genes expressed in brain, associations were enriched among genes expressed in tissues that have important roles in immunity, providing support for the speculated link between the immune system and schizophrenia.

DOI

10.1038/nature13595

MAIN ANCESTRY

Multi-ancestry

pGWAS server

Summary statistics

PUBMED_LINK

28240269

DESCRIPTION

In our study, we performed a genome-wide association study with protein levels (pGWAS). Using a highly multiplexed, aptamer-based, affinity proteomics platform (SOMAscan™), we quantified levels of 1,124 proteins in blood plasma samples from 1,000 German individuals (KORA cohort) and 338 Arab or Asian individuals (QMDiab cohort). We identified 539 independent, genome-wide significant SNP-to-protein associations, which can be investigated using this webserver.

URL

https://metabolomics.helmholtz-muenchen.de/pgwas/

TITLE

Connecting genetic risk to disease end points through the human blood plasma proteome.

Main citation

Suhre K, Arnold M, Bhagwat AM, Cotton RJ, ...&, Graumann J. (2017) Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun, 8, 14357. doi:10.1038/ncomms14357. PMID 28240269

ABSTRACT

Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.

DOI

10.1038/ncomms14357

Pietzner M, et al-34648354

Summary statistics

PUBMED_LINK

34648354

TITLE

Mapping the proteo-genomic convergence of human diseases.

Main citation

Pietzner M, Wheeler E, Carrasco-Zanini J, Cortes A, ...&, Langenberg C. (2021) Mapping the proteo-genomic convergence of human diseases. Science, 374 (6569) eabj1541. doi:10.1126/science.abj1541. PMID 34648354

ABSTRACT

Characterization of the genetic regulation of proteins is essential for understanding disease etiology and developing therapies. We identified 10,674 genetic associations for 3892 plasma proteins to create a cis-anchored gene-protein-disease map of 1859 connections that highlights strong cross-disease biological convergence. This proteo-genomic map provides a framework to connect etiologically related diseases, to provide biological context for new or emerging disorders, and to integrate different biological domains to establish mechanisms for known gene-disease links. Our results identify proteo-genomic connections within and between diseases and establish the value of cis-protein variants for annotation of likely causal disease genes at loci identified in genome-wide association studies, thereby addressing a major barrier to experimental validation and clinical translation of genetic discoveries.

DOI

10.1126/science.abj1541

Pirruccello JP-32382064

Summary statistics

PUBMED_LINK

32382064

TITLE

Analysis of cardiac magnetic resonance imaging in 36,000 individuals yields genetic insights into dilated cardiomyopathy.

Main citation

Pirruccello JP, Bick A, Wang M, Chaffin M, ...&, Aragam KG. (2020) Analysis of cardiac magnetic resonance imaging in 36,000 individuals yields genetic insights into dilated cardiomyopathy. Nat Commun, 11 (1) 2254. doi:10.1038/s41467-020-15823-7. PMID 32382064

ABSTRACT

Dilated cardiomyopathy (DCM) is an important cause of heart failure and the leading indication for heart transplantation. Many rare genetic variants have been associated with DCM, but common variant studies of the disease have yielded few associated loci. As structural changes in the heart are a defining feature of DCM, we report a genome-wide association study of cardiac magnetic resonance imaging (MRI)-derived left ventricular measurements in 36,041 UK Biobank participants, with replication in 2184 participants from the Multi-Ethnic Study of Atherosclerosis. We identify 45 previously unreported loci associated with cardiac structure and function, many near well-established genes for Mendelian cardiomyopathies. A polygenic score of MRI-derived left ventricular end systolic volume strongly associates with incident DCM in the general population. Even among carriers of TTN truncating mutations, this polygenic score influences the size and function of the human heart. These results further implicate common genetic polymorphisms in the pathogenesis of DCM.

DOI

10.1038/s41467-020-15823-7

MAIN ANCESTRY

EUR

PLATLAS

Summary statistics

PUBMED_LINK

40313291

FULL NAME

PLeiotropic ATLAS

DESCRIPTION

PLATLAS: pleiotropy atlas with GWAS summary statistics across >1000 phenotypes (multi-biobank).

URL

https://platlas.cels.anl.gov/

TITLE

Genome-Wide Assessment of Pleiotropy Across >1000 Traits from Global Biobanks.

Main citation

Levin MG, Koyama S, Woerner J, Zhang DY, ...&, Natarajan P. (2025) Genome-Wide Assessment of Pleiotropy Across >1000 Traits from Global Biobanks. medRxiv. doi:10.1101/2025.04.18.25326074. PMID 40313291

ABSTRACT

Large-scale genetic association studies have identified thousands of trait-associated risk loci, establishing the polygenic basis for common complex traits and diseases. Although prior studies suggest that many trait-associated loci are pleiotropic, the extent to which this pleiotropy reflects shared causal variants or confounding by linkage disequilibrium remains poorly characterized. To define a set of candidate loci with potentially pleiotropic associations, we performed genome-wide association study (GWAS) meta-analyses of up to 1,167 clinically relevant traits and diseases across 1,789,365 diverse individuals genetically similar to Admixed American (AMR, N_max = 60,756), African (AFR, N_max = 128,361), East Asian (EAS, N_max = 307,465), European (EUR, N_max = 1,283,907), and South Asian (SAS, N_max = 8,876) reference populations from the VA Million Veteran Program (MVP), UK Biobank (UKB), FinnGen, Biobank Japan (BBJ), Tohoku Medical Megabank (ToMMo), and Korean Genome and Epidemiology Study (KoGES). We identified 27,193 genome-wide significant locus-trait pairs (1 Mb region with P_GWAMA < 5 × 10⁻⁸) in within-population analysis and 29,139 in multi-population analysis (P_MR-MEGA < 5 × 10⁻⁸). Among these, 11.5% (n = 3,149) of locus-trait pairs in population-wise and 6.4% (n = 1,875) in multi-population analyses did not reach genome-wide significance in previously published GWAS. In aggregate, the genome-wide significant loci fell within 2,624 non-overlapping autosomal genomic windows on average ~600 kb in size. Each locus contained genome-wide significant signals for a median of 6 traits (IQR 2 to 18), including 2,110 (80%) pleiotropic loci associated with >1 trait. Multi-trait colocalization identified 1,902 (72%) loci with high-confidence (posterior probability > 0.9) evidence of a shared causal variant across two or more traits. Variants in pleiotropic loci were significantly enriched for a broad spectrum of functional annotations compared to non-pleiotropic counterparts. Polygenic scores (PGS) developed from these data generally improved prediction compared to existing PGS, and were broadly associated with both primary and pleiotropic phenotypes. These results provide a contemporary map of genetic pleiotropy across the spectrum of human traits/diseases and diverse genetic backgrounds.

DOI

10.1101/2025.04.18.25326074

MAIN ANCESTRY

ALL

Png G, et al-34857772

Summary statistics

PUBMED_LINK

34857772

TITLE

Mapping the serum proteome to neurological diseases using whole genome sequencing.

Main citation

Png G, Barysenka A, Repetto L, Navarro P, ...&, Zeggini E. (2021) Mapping the serum proteome to neurological diseases using whole genome sequencing. Nat Commun, 12 (1) 7042. doi:10.1038/s41467-021-27387-1. PMID 34857772

ABSTRACT

Despite the increasing global burden of neurological disorders, there is a lack of effective diagnostic and therapeutic biomarkers. Proteins are often dysregulated in disease and have a strong genetic component. Here, we carry out a protein quantitative trait locus analysis of 184 neurologically-relevant proteins, using whole genome sequencing data from two isolated population-based cohorts (N = 2893). In doing so, we elucidate the genetic landscape of the circulating proteome and its connection to neurological disorders. We detect 214 independently-associated variants for 107 proteins, the majority of which (76%) are cis-acting, including 114 variants that have not been previously identified. Using two-sample Mendelian randomisation, we identify causal associations between serum CD33 and Alzheimer's disease, GPNMB and Parkinson's disease, and MSR1 and schizophrenia, describing their clinical potential and highlighting drug repurposing opportunities.

DOI

10.1038/s41467-021-27387-1

Proteome PheWAS browser

Summary statistics

PUBMED_LINK

32895551

TITLE

Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases.

Main citation

Zheng J, Haberland V, Baird D, Walker V, ...&, Gaunt TR. (2020) Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nat Genet, 52 (10) 1122-1131. doi:10.1038/s41588-020-0682-6. PMID 32895551

ABSTRACT

The human proteome is a major source of therapeutic targets. Recent genetic association analyses of the plasma proteome enable systematic evaluation of the causal consequences of variation in plasma protein levels. Here we estimated the effects of 1,002 proteins on 225 phenotypes using two-sample Mendelian randomization (MR) and colocalization. Of 413 associations supported by evidence from MR, 130 (31.5%) were not supported by results of colocalization analyses, suggesting that genetic confounding due to linkage disequilibrium is widespread in naïve phenome-wide association studies of proteins. Combining MR and colocalization evidence in cis-only analyses, we identified 111 putatively causal effects between 65 proteins and 52 disease-related phenotypes (https://www.epigraphdb.org/pqtl/). Evaluation of data from historic drug development programs showed that target-indication pairs with MR and colocalization support were more likely to be approved, evidencing the value of this approach in identifying and prioritizing potential therapeutic targets.

DOI

10.1038/s41588-020-0682-6

PsychENCODE

Summary statistics

PUBMED_LINK

26605881

DESCRIPTION

Established in 2015 by the National Institute of Mental Health, the PsychENCODE Consortium brings together multidisciplinary teams to study the molecular basis of neuropsychiatric diseases. Genetic influences on brain function are remarkably complex, characterized by a highly polygenic risk architecture and often located in the non-coding regions of the genome. PsychENCODE members generate large-scale gene expression and regulatory data from human postmortem brain tissues in major psychiatric disorders across multiple developmental stages. The goal is to map and functionally validate disease‐associated genetic variants, regulatory elements, genes and cell types. Phase II of the project focused on single-cell and spatial data, culminating in a collection of 14 papers published on May 24, 2024 (9 in Science, 3 in Science Advances, 1 in Scientific Reports, and 1 in Molecular Psychiatry). Phase I of the project was published in 2018 in a collection of 11 papers in Science, Science Translational Medicine, and Science Advances.

URL

TITLE

The PsychENCODE project.

Main citation

PsychENCODE Consortium, Akbarian S, Liu C, Knowles JA, ...&, Sestan N. (2015) The PsychENCODE project. Nat Neurosci, 18 (12) 1707-12. doi:10.1038/nn.4156. PMID 26605881

ABSTRACT

Recent research on disparate psychiatric disorders has implicated rare variants in genes involved in global gene regulation and chromatin modification, as well as many common variants located primarily in regulatory regions of the genome. Understanding precisely how these variants contribute to disease will require a deeper appreciation for the mechanisms of gene regulation in the developing and adult human brain. The PsychENCODE project aims to produce a public resource of multidimensional genomic data using tissue- and cell type–specific samples from approximately 1,000 phenotypically well-characterized, high-quality healthy and disease-affected human post-mortem brains, as well as functionally characterize disease-associated regulatory elements and variants in model systems. We are beginning with a focus on autism spectrum disorder, bipolar disorder and schizophrenia, and expect that this knowledge will apply to a wide variety of psychiatric disorders. This paper outlines the motivation and design of PsychENCODE.

DOI

10.1038/nn.4156

PsychENCODE Phase I

Summary statistics

PUBMED_LINK

30545857

DESCRIPTION

Phase I of the project was published on Dec 14, 2018 in a collection of 11 papers in Science, Science Translational Medicine, and Science Advances.

URL

TITLE

Comprehensive functional genomic resource and integrative model for the human brain.

Main citation

Wang D, Liu S, Warrell J, Won H, ...&, Gerstein MB. (2018) Comprehensive functional genomic resource and integrative model for the human brain. Science, 362 (6420). doi:10.1126/science.aat8464. PMID 30545857

ABSTRACT

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.

DOI

10.1126/science.aat8464

PsychENCODE Phase II

Summary statistics

PUBMED_LINK

38781368

DESCRIPTION

A large-scale, cross-population resource of gene, isoform, and splicing regulation in the developing human brain

URL

TITLE

Cross-ancestry atlas of gene, isoform, and splicing regulation in the developing human brain.

Main citation

Wen C, Margolis M, Dai R, Zhang P, ...&, PsychENCODE Consortium. (2024) Cross-ancestry atlas of gene, isoform, and splicing regulation in the developing human brain. Science, 384 (6698) eadh0829. doi:10.1126/science.adh0829. PMID 38781368

ABSTRACT

Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.

DOI

10.1126/science.adh0829

PsychENCODE Phase II

Summary statistics

PUBMED_LINK

38781369

DESCRIPTION

Phase II of the project focused on single-cell and spatial data, culminating in a collection of 14 papers published on May 24, 2024 (9 in Science, 3 in Science Advances, 1 in Scientific Reports, and 1 in Molecular Psychiatry).

URL

https://yanglab.westlake.edu.cn/data/brainmeta/cis_sqtl/

TITLE

Single-cell genomics and regulatory networks for 388 human brains.

Main citation

Emani PS, Liu JJ, Clarke D, Jensen M, ...&, PsychENCODE Consortium. (2024) Single-cell genomics and regulatory networks for 388 human brains. Science, 384 (6698) eadi5199. doi:10.1126/science.adi5199. PMID 38781369

ABSTRACT

Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.

DOI

10.1126/science.adi5199

Qi

Summary statistics

PUBMED_LINK

35982161

DESCRIPTION

THISTLE; 2,865 brain cortex samples from 2,443 unrelated individuals of European ancestry with genome-wide SNP data

URL

TITLE

Genetic control of RNA splicing and its distinct role in complex trait variation.

Main citation

Qi T, Wu Y, Fang H, Zhang F, ...&, Yang J. (2022) Genetic control of RNA splicing and its distinct role in complex trait variation. Nat Genet, 54 (9) 1355-1363. doi:10.1038/s41588-022-01154-4. PMID 35982161

ABSTRACT

Most genetic variants identified from genome-wide association studies (GWAS) in humans are noncoding, indicating their role in gene regulation. Previous studies have shown considerable links of GWAS signals to expression quantitative trait loci (eQTLs) but the links to other genetic regulatory mechanisms, such as splicing QTLs (sQTLs), are underexplored. Here, we introduce an sQTL mapping method, testing for heterogeneity between isoform-eQTL effects (THISTLE), with improved power over competing methods. Applying THISTLE together with a complementary sQTL mapping strategy to brain transcriptomic (n = 2,865) and genotype data, we identified 12,794 genes with cis-sQTLs at P < 5 × 10⁻⁸, approximately 61% of which were distinct from eQTLs. Integrating the sQTL data into GWAS for 12 brain-related complex traits (including diseases), we identified 244 genes associated with the traits through cis-sQTLs, approximately 61% of which could not be discovered using the corresponding eQTL data. Our study demonstrates the distinct role of most sQTLs in the genetic regulation of transcription and complex trait variation.

DOI

10.1038/s41588-022-01154-4

Rakowski A

Summary statistics

PREPRINT_DOI

10.1101/2024.06.11.24308721

SERVER

medrxiv

Main citation

Rakowski, A., Monti, R. & Lippert, C. TransferGWAS of T1-weighted brain MRI data from the UK Biobank. medRxiv 2024.06.11.24308721 (2024) doi:10.1101/2024.06.11.24308721.

MAIN ANCESTRY

EUR

Review-Suhre K, et al-32860016

Summary statistics

PUBMED_LINK

32860016

DESCRIPTION

A Table of all published GWAS with proteomics

URL

http://www.metabolomix.com/a-table-of-all-published-gwas-with-proteomics/

TITLE

Genetics meets proteomics: perspectives for large population-based studies.

Main citation

Suhre K, McCarthy MI, Schwenk JM. (2021) Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet, 22 (1) 19-37. doi:10.1038/s41576-020-0268-2. PMID 32860016

ABSTRACT

Proteomic analysis of cells, tissues and body fluids has generated valuable insights into the complex processes influencing human biology. Proteins represent intermediate phenotypes for disease and provide insight into how genetic and non-genetic risk factors are mechanistically linked to clinical outcomes. Associations between protein levels and DNA sequence variants that colocalize with risk alleles for common diseases can expose disease-associated pathways, revealing novel drug targets and translational biomarkers. However, genome-wide, population-scale analyses of proteomic data are only now emerging. Here, we review current findings from studies of the plasma proteome and discuss their potential for advancing biomedical translation through the interpretation of genome-wide association analyses. We highlight the challenges faced by currently available technologies and provide perspectives relevant to their future application in large-scale biobank studies.

DOI

10.1038/s41576-020-0268-2

Ruffieux H, et al-32492067

Summary statistics

PUBMED_LINK

32492067

TITLE

A fully joint Bayesian quantitative trait locus mapping of human protein abundance in plasma.

Main citation

Ruffieux H, Carayol J, Popescu R, Harper ME, ...&, Valsesia A. (2020) A fully joint Bayesian quantitative trait locus mapping of human protein abundance in plasma. PLoS Comput Biol, 16 (6) e1007882. doi:10.1371/journal.pcbi.1007882. PMID 32492067

ABSTRACT

Molecular quantitative trait locus (QTL) analyses are increasingly popular to explore the genetic architecture of complex traits, but existing studies do not leverage shared regulatory patterns and suffer from a large multiplicity burden, which hampers the detection of weak signals such as trans associations. Here, we present a fully multivariate proteomic QTL (pQTL) analysis performed with our recently proposed Bayesian method LOCUS on data from two clinical cohorts, with plasma protein levels quantified by mass-spectrometry and aptamer-based assays. Our two-stage study identifies 136 pQTL associations in the first cohort, of which >80% replicate in the second independent cohort and have significant enrichment with functional genomic elements and disease risk loci. Moreover, 78% of the pQTLs whose protein abundance was quantified by both proteomic techniques are confirmed across assays. Our thorough comparisons with standard univariate QTL mapping on (1) these data and (2) synthetic data emulating the real data show how LOCUS borrows strength across correlated protein levels and markers on a genome-wide scale to effectively increase statistical power. Notably, 15% of the pQTLs uncovered by LOCUS would be missed by the univariate approach, including several trans and pleiotropic hits with successful independent validation. Finally, the analysis of extensive clinical data from the two cohorts indicates that the genetically-driven proteins identified by LOCUS are enriched in associations with low-grade inflammation, insulin resistance and dyslipidemia and might therefore act as endophenotypes for metabolic diseases. While considerations on the clinical role of the pQTLs are beyond the scope of our work, these findings generate useful hypotheses to be explored in future research; all results are accessible online from our searchable database. 
Thanks to its efficient variational Bayes implementation, LOCUS can analyze jointly thousands of traits and millions of markers. Its applicability goes beyond pQTL studies, opening new perspectives for large-scale genome-wide association and QTL analyses. Diet, Obesity and Genes (DiOGenes) trial registration number: NCT00390637.

DOI

10.1371/journal.pcbi.1007882

SABR

Summary statistics

PUBMED_LINK

40500424

DESCRIPTION

South African Blood Regulatory (SABR) resource: a map of blood regulatory variation in South Africans.

URL

https://zenodo.org/records/15334125

TITLE

A map of blood regulatory variation in South Africans enables GWAS interpretation.

Main citation

Castel SE, Tluway FD, Emde AK, Smyth N, ...&, Ramsay M. (2025) A map of blood regulatory variation in South Africans enables GWAS interpretation. Nat Genet, 57 (7) 1628-1637. doi:10.1038/s41588-025-02223-0. PMID 40500424

ABSTRACT

Functional genomics resources are critical for interpreting human genetic studies, but currently they are predominantly from European-ancestry individuals. Here we present the South African Blood Regulatory (SABR) resource, a map of blood regulatory variation that includes three South Eastern Bantu-speaking groups. Using paired whole-genome and blood transcriptome data from over 600 individuals, we map the genetic architecture of 40 blood cell traits derived from deconvolution analysis, as well as expression, splice and cell-type interaction quantitative trait loci. We comprehensively compare SABR to the Genotype Tissue Expression Project and characterize thousands of regulatory variants only observed in African-ancestry individuals. Finally, we demonstrate the increased utility of SABR for interpreting African-ancestry association studies by identifying putatively causal genes and molecular mechanisms through colocalization analysis of blood-relevant traits from the Pan-UK Biobank. Importantly, we make full SABR summary statistics publicly available to support the African genomics community.

DOI

10.1038/s41588-025-02223-0

Said

Summary statistics

PREPRINT_DOI

10.1101/2023.11.13.23298365

SERVER

medrxiv

Main citation

Said, S. et al. Ancestry diversity in the genetic determinants of the human plasma proteome and associated new drug targets. medRxiv (2023) doi:10.1101/2023.11.13.23298365.

RELATED_BIOBANK

China Kadoorie Biobank

MAIN ANCESTRY

EAS

Sasayama D, et al-28031287

Summary statistics

PUBMED_LINK

28031287

TITLE

Genome-wide quantitative trait loci mapping of the human cerebrospinal fluid proteome.

Main citation

Sasayama D, Hattori K, Ogawa S, Yokota Y, ...&, Kunugi H. (2017) Genome-wide quantitative trait loci mapping of the human cerebrospinal fluid proteome. Hum Mol Genet, 26 (1) 44-51. doi:10.1093/hmg/ddw366. PMID 28031287

ABSTRACT

Cerebrospinal fluid (CSF) is virtually the only one accessible source of proteins derived from the central nervous system (CNS) of living humans and possibly reflects the pathophysiology of a variety of neuropsychiatric diseases. However, little is known regarding the genetic basis of variation in protein levels of human CSF. We examined CSF levels of 1,126 proteins in 133 subjects and performed a genome-wide association analysis of 514,227 single nucleotide polymorphisms (SNPs) to detect protein quantitative trait loci (pQTLs). To be conservative, Spearman's correlation was used to identify an association between genotypes of SNPs and protein levels. A total of 421 cis and 25 trans SNP-protein pairs were significantly correlated at a false discovery rate (FDR) of less than 0.01 (nominal P < 7.66 × 10⁻⁹). Cis-only analysis revealed additional 580 SNP-protein pairs with FDR < 0.01 (nominal P < 2.13 × 10⁻⁵). pQTL SNPs were more likely, compared to non-pQTL SNPs, to be a disease/trait-associated variants identified by previous genome-wide association studies. The present findings suggest that genetic variations play an important role in the regulation of protein expression in the CNS. The obtained database may serve as a valuable resource to understand the genetic bases for CNS protein expression pattern in humans.

DOI

10.1093/hmg/ddw366

sc-eQTLGen

Summary statistics

PUBMED_LINK

32149610

URL

https://www.eqtlgen.org/sc/

TITLE

The single-cell eQTLGen consortium.

Main citation

van der Wijst M, de Vries DH, Groot HE, Trynka G, ...&, Franke L. (2020) The single-cell eQTLGen consortium. Elife, 9. doi:10.7554/eLife.52155. PMID 32149610

ABSTRACT

In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes, many of which are obscured when using bulk methods. Rapid increase in throughput and reduction in cost per cell now allow this technology to be applied to large-scale population genetics studies. To fully leverage these emerging data resources, we have founded the single-cell eQTLGen consortium (sc-eQTLGen), aimed at pinpointing the cellular contexts in which disease-causing genetic variants affect gene expression. Here, we outline the goals, approach and potential utility of the sc-eQTLGen consortium. We also provide a set of study design considerations for future single-cell eQTL studies.

DOI

10.7554/eLife.52155

SCALLOP

Summary statistics

DESCRIPTION

The SCALLOP consortium (Systematic and Combined AnaLysis of Olink Proteins) is a collaborative framework for discovery and follow-up of genetic associations with proteins on the Olink Proteomics platform. To date, 35 PIs from 28 research institutions have joined the effort, which now comprises summary level data for more than 70,000 patients and controls from 45 cohort studies. SCALLOP welcomes new members.

URL

http://www.scallop-consortium.com/

RELATED_BIOBANK

https://open.win.ox.ac.uk/ukbiobank/big40/pheweb33k/

MAIN ANCESTRY

EUR

Shah M-37604819

Summary statistics

PUBMED_LINK

37604819

TITLE

Environmental and genetic predictors of human cardiovascular ageing.

Main citation

Shah M, de A Inácio MH, Lu C, Schiratti PR, ...&, O'Regan DP. (2023) Environmental and genetic predictors of human cardiovascular ageing. Nat Commun, 14 (1) 4941. doi:10.1038/s41467-023-40566-6. PMID 37604819

ABSTRACT

Cardiovascular ageing is a process that begins early in life and leads to a progressive change in structure and decline in function due to accumulated damage across diverse cell types, tissues and organs contributing to multi-morbidity. Damaging biophysical, metabolic and immunological factors exceed endogenous repair mechanisms resulting in a pro-fibrotic state, cellular senescence and end-organ damage, however the genetic architecture of cardiovascular ageing is not known. Here we use machine learning approaches to quantify cardiovascular age from image-derived traits of vascular function, cardiac motion and myocardial fibrosis, as well as conduction traits from electrocardiograms, in 39,559 participants of UK Biobank. Cardiovascular ageing is found to be significantly associated with common or rare variants in genes regulating sarcomere homeostasis, myocardial immunomodulation, and tissue responses to biophysical stress. Ageing is accelerated by cardiometabolic risk factors and we also identify prescribed medications that are potential modifiers of ageing. Through large-scale modelling of ageing across multiple traits our results reveal insights into the mechanisms driving premature cardiovascular ageing and reveal potential molecular targets to attenuate age-related processes.

DOI

10.1038/s41467-023-40566-6

MAIN ANCESTRY

EUR

Smith SM-33875891

Summary statistics

PUBMED_LINK

33875891

DESCRIPTION

Oxford Brain Imaging Genetics (BIG40)

URL

TITLE

An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank.

Main citation

Smith SM, Douaud G, Chen W, Hanayik T, ...&, Elliott LT. (2021) An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nat Neurosci, 24 (5) 737-745. doi:10.1038/s41593-021-00826-4. PMID 33875891

ABSTRACT

UK Biobank is a major prospective epidemiological study, including multimodal brain imaging, genetics and ongoing health outcomes. Previously, we published genome-wide associations of 3,144 brain imaging-derived phenotypes, with a discovery sample of 8,428 individuals. Here we present a new open resource of genome-wide association study summary statistics, using the 2020 data release, almost tripling the discovery sample size. We now include the X chromosome and new classes of imaging-derived phenotypes (subcortical volumes and tissue contrast). Previously, we found 148 replicated clusters of associations between genetic variants and imaging phenotypes; in this study, we found 692, including 12 on the X chromosome. We describe some of the newly found associations, focusing on the X chromosome and autosomal associations involving the new classes of imaging-derived phenotypes. Our novel associations implicate, for example, pathways involved in the rare X-linked STAR (syndactyly, telecanthus and anogenital and renal malformations) syndrome, Alzheimer's disease and mitochondrial disorders.

DOI

10.1038/s41593-021-00826-4

MAIN ANCESTRY

EUR

Suhre K, et al-28240269

Summary statistics

PUBMED_LINK

28240269

TITLE

Connecting genetic risk to disease end points through the human blood plasma proteome.

Main citation

Suhre K, Arnold M, Bhagwat AM, Cotton RJ, ...&, Graumann J. (2017) Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun, 8, 14357. doi:10.1038/ncomms14357. PMID 28240269

ABSTRACT

Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.

DOI

10.1038/ncomms14357

Suhre K, et al-38412862

Summary statistics

PUBMED_LINK

38412862

DESCRIPTION

rQTLs: genetic associations with ratios between protein levels.

TITLE

Genetic associations with ratios between protein levels detect new pQTLs and reveal protein-protein interactions.

Main citation

Suhre K. (2024) Genetic associations with ratios between protein levels detect new pQTLs and reveal protein-protein interactions. Cell Genom, 4 (3) 100506. doi:10.1016/j.xgen.2024.100506. PMID 38412862

ABSTRACT

Protein quantitative trait loci (pQTLs) are an invaluable source of information for drug target development because they provide genetic evidence to support protein function, suggest relationships between cis- and trans-associated proteins, and link proteins to disease endpoints. Using Olink proteomics data for 1,463 proteins measured in over 54,000 samples of the UK Biobank, we identified 4,248 associations with 2,821 ratios between protein levels (rQTLs). rQTLs were 7.6-fold enriched in known protein-protein interactions, suggesting that their ratios reflect biological links between the implicated proteins. Conducting a GWAS on ratios increased the number of discovered genetic signals by 24.7%. The approach can identify novel loci of clinical relevance, support causal gene identification, and reveal complex networks of interacting proteins. Taken together, our study adds significant value to the genetic insights that can be derived from the UKB proteomics data and motivates the wider use of ratios in large-scale GWAS.

DOI

10.1016/j.xgen.2024.100506

RELATED_BIOBANK

https://taiwanview.twbiobank.org.tw/pheweb.php

MAIN ANCESTRY

EUR

Sun BB, et al-29875488

Summary statistics

PUBMED_LINK

29875488

TITLE

Genomic atlas of the human plasma proteome.

Main citation

Sun BB, Maranville JC, Peters JE, Stacey D, ...&, Butterworth AS. (2018) Genomic atlas of the human plasma proteome. Nature, 558 (7708) 73-79. doi:10.1038/s41586-018-0175-2. PMID 29875488

ABSTRACT

Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.

DOI

10.1038/s41586-018-0175-2

Sun BB, et al-37794186

Summary statistics

PUBMED_LINK

37794186

TITLE

Plasma proteomic associations with genetics and health in the UK Biobank.

Main citation

Sun BB, Chiou J, Traylor M, Benner C, ...&, Whelan CD. (2023) Plasma proteomic associations with genetics and health in the UK Biobank. Nature, 622 (7982) 329-338. doi:10.1038/s41586-023-06592-6. PMID 37794186

ABSTRACT

The Pharma Proteomics Project is a precompetitive biopharmaceutical consortium characterizing the plasma proteomic profiles of 54,219 UK Biobank participants. Here we provide a detailed summary of this initiative, including technical and biological validations, insights into proteomic disease signatures, and prediction modelling for various demographic and health indicators. We present comprehensive protein quantitative trait locus (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 81% are previously undescribed, alongside ancestry-specific pQTL mapping in non-European individuals. The study provides an updated characterization of the genetic architecture of the plasma proteome, contextualized with projected pQTL discovery rates as sample sizes and proteomic assay coverages increase over time. We offer extensive insights into trans pQTLs across multiple biological domains, highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement networks, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug discovery by extending the genetic proxied effects of protein targets, such as PCSK9, on additional endpoints, and disentangle specific genes and proteins perturbed at loci associated with COVID-19 susceptibility. This public-private partnership provides the scientific community with an open-access proteomics resource of considerable breadth and depth to help to elucidate the biological mechanisms underlying proteo-genomic discoveries and accelerate the development of biomarkers, predictive models and therapeutics.

DOI

10.1038/s41586-023-06592-6

Sun BB-36241887

Summary statistics

PUBMED_LINK

36241887

TITLE

Genetic map of regional sulcal morphology in the human brain from UK biobank data.

Main citation

Sun BB, Loomis SJ, Pizzagalli F, Shatokhina N, ...&, Whelan CD. (2022) Genetic map of regional sulcal morphology in the human brain from UK biobank data. Nat Commun, 13 (1) 6071. doi:10.1038/s41467-022-33829-1. PMID 36241887

ABSTRACT

Genetic associations with macroscopic brain structure can provide insights into brain function and disease. However, specific associations with measures of local brain folding are largely under-explored. Here, we conducted large-scale genome- and exome-wide associations of regional cortical sulcal measures derived from magnetic resonance imaging scans of 40,169 individuals in UK Biobank. We discovered 388 regional brain folding associations across 77 genetic loci, with genes in associated loci enriched for expression in the cerebral cortex, neuronal development processes, and differential regulation during early brain development. We integrated brain eQTLs to refine genes for various loci, implicated several genes involved in neurodevelopmental disorders, and highlighted global genetic correlations with neuropsychiatric phenotypes. We provide an interactive 3D visualisation of our summary associations, emphasising added resolution of regional analyses. Our results offer new insights into the genetic architecture of brain folding and provide a resource for future studies of sulcal morphology in health and disease.

DOI

10.1038/s41467-022-33829-1

MAIN ANCESTRY

EUR

Sun W, et al-27532455

Summary statistics

PUBMED_LINK

27532455

TITLE

Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD.

Main citation

Sun W, Kechris K, Jacobson S, Drummond MB, ...&, COPDGene Investigators. (2016) Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD. PLoS Genet, 12 (8) e1006011. doi:10.1371/journal.pgen.1006011. PMID 27532455

ABSTRACT

Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis from two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). PQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Inference of causal relations of pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) were explored using conditional independence tests. We identified 527 highly significant (p < 8 × 10⁻¹⁰) pQTLs in 38 (43%) of blood proteins tested. Most pQTL SNPs were novel with low overlap to eQTL SNPs. The pQTL SNPs explained >10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10⁻³⁹²) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explanation of variance (R²) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema.
In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTL, and the differences observed between pQTLs and eQTLs SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group.

DOI

10.1371/journal.pgen.1006011

Surapaneni A, et al-35870639

Summary statistics

PUBMED_LINK

35870639

TITLE

Identification of 969 protein quantitative trait loci in an African American population with kidney disease attributed to hypertension.

Main citation

Surapaneni A, Schlosser P, Zhou L, Liu C, ...&, Grams ME. (2022) Identification of 969 protein quantitative trait loci in an African American population with kidney disease attributed to hypertension. Kidney Int, 102 (5) 1167-1177. doi:10.1016/j.kint.2022.07.005. PMID 35870639

ABSTRACT

Investigations into the causal underpinnings of disease processes can be aided by the incorporation of genetic information. Genetic studies require populations varied in both ancestry and prevalent disease in order to optimize discovery and ensure generalizability of findings to the global population. Here, we report the genetic determinants of the serum proteome in 466 African Americans with chronic kidney disease attributed to hypertension from the richly phenotyped African American Study of Kidney Disease and Hypertension (AASK) study. Using the largest aptamer-based protein profiling platform to date (6,790 proteins or protein complexes), we identified 969 genetic associations with 900 unique proteins; including 52 novel cis (local) associations and 379 novel trans (distant) associations. The genetic effects of previously published cis-protein quantitative trait loci (pQTLs) were found to be highly reproducible, and we found evidence that our novel genetic signals colocalize with gene expression and disease processes. Many trans- pQTLs were found to reflect associations mediated by the circulating cis protein, and the common trans-pQTLs are enriched for processes involving extracellular vesicles, highlighting a plausible mechanism for distal regulation of the levels of secreted proteins. Thus, our study generates a valuable resource of genetic associations linking variants to protein levels and disease in an understudied patient population to inform future studies of drug targets and physiology.

DOI

10.1016/j.kint.2022.07.005

Taiwan BioBank Pheweb

Summary statistics

PUBMED_LINK

29149267

DESCRIPTION

Taiwan Biobank PheWeb — GWAS summary statistics for Taiwanese participants.

URL

TITLE

Taiwan Biobank: making cross-database convergence possible in the Big Data era.

Main citation

Lin JC, Fan CT, Liao CC, Chen YS. (2018) Taiwan Biobank: making cross-database convergence possible in the Big Data era. Gigascience, 7 (1) 1-4. doi:10.1093/gigascience/gix110. PMID 29149267

ABSTRACT

The Taiwan Biobank (TWB) is a biomedical research database of biopsy data from 200 000 participants. Access to this database has been granted to research communities taking part in the development of precision medicines; however, this has raised issues surrounding TWB's access to electronic medical records (EMRs). The Personal Data Protection Act of Taiwan restricts access to EMRs for purposes not covered by patients' original consent. This commentary explores possible legal solutions to help ensure that the access TWB has to EMR abides with legal obligations, and with governance frameworks associated with ethical, legal, and social implications. We suggest utilizing "hash function" algorithms to create nonretrospective, anonymized data for the purpose of cross-transmission and/or linkage with EMR.

DOI

10.1093/gigascience/gix110

RELATED_BIOBANK

Taiwan Biobank

MAIN ANCESTRY

EAS

TenK10k

Summary statistics

DESCRIPTION

Phase 1: matched WGS and scRNA-seq in ~1.9k individuals; common and rare variant sc-eQTLs in 28 immune cell types (SAIGE-QTL).

URL

https://www.medrxiv.org/content/10.1101/2025.03.20.25324352v2

TITLE

Impact of rare and common genetic variation on cell type-specific gene expression in human blood.

Main citation

Cuomo ASE, Spenceley E, Tanudisastro HA, Bowen B, ...&, Powell JE. (2025) Impact of rare and common genetic variation on cell type-specific gene expression in human blood. medRxiv. doi:10.1101/2025.03.20.25324352

ABSTRACT

Understanding the genetic basis of gene expression can shed light on the regulatory mechanisms underlying complex traits and diseases. Single cell-resolved measures of RNA levels and single-cell expression quantitative trait loci (sc-eQTLs) have revealed genetic regulation that drives sub-tissue cell states and types across diverse human tissues. Here, we describe the first phase of TenK10K, the largest-to-date dataset of matched whole-genome sequencing (WGS) and single-cell RNA-sequencing (scRNA-seq). We leverage scRNA-seq data from over 5 million cells across 28 immune cell types, and matched WGS, from 1,925 individuals, which provides power to detect associations between rare and low-frequency genetic variants that have largely been uncharacterised in their impact on cell-specific gene expression. We map the effects of both common and rare variants in a cell type-specific manner using a recently introduced method that increases power by modelling single cells directly rather than relying on aggregated ‘pseudobulk’ counts. We identify putative common regulatory variants for 83% of all 21,404 genes tested and cumulative rare variant signals for 47% of genes. We explore how genetic effects vary across cell type and state spectra, develop a framework to determine the degree to which sc-eQTLs are cell type-specific, and show that about half of the effects are observed only in one or a few cell types. By integrating our results with functional annotations and disease information, we also further characterise the likely molecular modes of action for many disease-variant associations. Finally, we explore the effects that genetic variants have on gene expression across continuous cell states and functions, and effects that vary cell state abundance directly.

DOI

10.1101/2025.03.20.25324352

Thareja G, et al-36168886

Summary statistics

PUBMED_LINK

36168886

TITLE

Differences and commonalities in the genetic architecture of protein quantitative trait loci in European and Arab populations.

Main citation

Thareja G, Belkadi A, Arnold M, Albagha OME, ...&, Suhre K. (2023) Differences and commonalities in the genetic architecture of protein quantitative trait loci in European and Arab populations. Hum Mol Genet, 32 (6) 907-916. doi:10.1093/hmg/ddac243. PMID 36168886

ABSTRACT

Polygenic scores (PGS) can identify individuals at risk of adverse health events and guide genetics-based personalized medicine. However, it is not clear how well PGS translate between different populations, limiting their application to well-studied ethnicities. Proteins are intermediate traits linking genetic predisposition and environmental factors to disease, with numerous blood circulating protein levels representing functional readouts of disease-related processes. We hypothesized that studying the genetic architecture of a comprehensive set of blood-circulating proteins between a European and an Arab population could shed fresh light on the translatability of PGS to understudied populations. We therefore conducted a genome-wide association study with whole-genome sequencing data using 1301 proteins measured on the SOMAscan aptamer-based affinity proteomics platform in 2935 samples of Qatar Biobank and evaluated the replication of protein quantitative traits (pQTLs) from European studies in an Arab population. Then, we investigated the colocalization of shared pQTL signals between the two populations. Finally, we compared the performance of protein PGS derived from a Caucasian population in a European and an Arab cohort. We found that the majority of shared pQTL signals (81.8%) colocalized between both populations. About one-third of the genetic protein heritability was explained by protein PGS derived from a European cohort, with protein PGS performing ~20% better in Europeans when compared to Arabs. Our results are relevant for the translation of PGS to non-Caucasian populations, as well as for future efforts to extend genetic research to understudied populations.

DOI

10.1093/hmg/ddac243

Tohoku Medical Megabank (TMM) Jmorp

Summary statistics

PUBMED_LINK

37930845

DESCRIPTION

Tohoku Medical Megabank / jMorp multi-omics reference and GWAS-related summary data portal.

URL

https://jmorp.megabank.tohoku.ac.jp/202109/gwas/

TITLE

jMorp: Japanese Multi-Omics Reference Panel update report 2023.

Main citation

Tadaka S, Kawashima J, Hishinuma E, Saito S, ...&, Kinoshita K. (2024) jMorp: Japanese Multi-Omics Reference Panel update report 2023. Nucleic Acids Res, 52 (D1) D622-D632. doi:10.1093/nar/gkad978. PMID 37930845

ABSTRACT

Modern medicine is increasingly focused on personalized medicine, and multi-omics data is crucial in understanding biological phenomena and disease mechanisms. Each ethnic group has its unique genetic background with specific genomic variations influencing disease risk and drug response. Therefore, multi-omics data from specific ethnic populations are essential for the effective implementation of personalized medicine. Various prospective cohort studies, such as the UK Biobank, All of Us and Lifelines, have been conducted worldwide. The Tohoku Medical Megabank project was initiated after the Great East Japan Earthquake in 2011. It collects biological specimens and conducts genome and omics analyses to build a basis for personalized medicine. Summary statistical data from these analyses are available in the jMorp web database (https://jmorp.megabank.tohoku.ac.jp), which provides a multidimensional approach to the diversity of the Japanese population. jMorp was launched in 2015 as a public database for plasma metabolome and proteome analyses and has been continuously updated. The current update will significantly expand the scale of the data (metabolome, genome, transcriptome, and metagenome). In addition, the user interface and backend server implementations were rewritten to improve the connectivity between the items stored in jMorp. This paper provides an overview of the new version of the jMorp.

DOI

10.1093/nar/gkad978

RELATED_BIOBANK

Tohoku Medical Megabank

MAIN ANCESTRY

EAS

TPMI PheWeb

Summary statistics

PUBMED_LINK

41092961

DESCRIPTION

Taiwan Precision Medicine Initiative PheWeb — cohort GWAS summary statistics.

URL

https://pheweb.ibms.sinica.edu.tw/

TITLE

The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies.

Main citation

Yang HC, Kwok PY, Li LH, Liu YM, ...&, Wu JY. (2025) The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies. Nature, 648 (8092) 117-127. doi:10.1038/s41586-025-09680-x. PMID 41092961

ABSTRACT

Han Chinese people comprise nearly 20% of the global population but remain under-represented in genetic studies, so there is an urgent need for large-scale cohorts to advance precision medicine. Here we present the Taiwan Precision Medicine Initiative (TPMI), established by Academia Sinica in collaboration with 16 major medical centres around Taiwan, which has recruited 565,390 participants who consent to provide DNA samples for genetic profiling and grant access to their electronic medical records (EMRs) for research. EMR access is both retrospective and prospective, allowing longitudinal studies. Genetic profiling is done with population-optimized arrays of single-nucleotide polymorphisms for people of Han Chinese ancestry, which enable genome-wide association, phenome-wide association and polygenic risk score studies to be performed to evaluate common disease risk and pharmacogenetic response. Participants also agreed to be re-contacted for future research and receive personalized genetic risk profiles with health management recommendations. The TPMI has established the TPMI Data Access Platform, a central database and analysis platform that both safeguards the security of the data and facilitates academic research. As a large cohort of individuals with non-European ancestry that merges genetic profiles with EMR data and enables longitudinal follow-up, TPMI provides a unique resource that could be used to validate genetic risk prediction models, perform clinical trials of risk-based health management and inform health policies. Ultimately, the TPMI cohort will contribute to global genetic research and serve as a model for population-based precision medicine.

DOI

10.1038/s41586-025-09680-x

RELATED_BIOBANK

Taiwan Precision Medicine Initiative

MAIN ANCESTRY

EAS

UKB

Summary statistics

PUBMED_LINK

41639462

URL

https://azphewas.com/

TITLE

Phenome-wide analysis of copy number variants in 470,727 UK Biobank genomes.

Main citation

Zou XZ, Hu F, Lou H, Burren OS, ...&, Carss K. (2026) Phenome-wide analysis of copy number variants in 470,727 UK Biobank genomes. Nature. doi:10.1038/s41586-025-10087-x. PMID 41639462

ABSTRACT

Copy number variants (CNVs) are key drivers of human diversity and disease risk. Here we evaluate the role of CNVs across a broad range of human phenotypes and diseases by analysing CNVs from 470,727 UK Biobank whole-genome sequences and conducting a variant- and gene-level phenome-wide association study (PheWAS) with 2,941 plasma protein abundance measurements, 13,336 binary clinical phenotypes and 1,911 quantitative traits. Proteomic analyses validated functional associations of CNVs with nearby genes (cis-protein quantitative trait loci; cis-pQTLs), with deletions and duplications typically associated with reduced and increased protein levels, respectively, and uncovered previously unknown protein-protein interactions (trans-pQTLs). Our PheWAS recapitulated known associations and uncovered associations in both coding and non-coding regions. Notably, we identified a rare deletion in ZNF451 associated with increased leukocyte telomere length and a non-coding deletion of a SLC2A9 enhancer associated with reduced gout risk. In addition, by combining CNVs with protein-coding single nucleotide variants and indels, we enhanced the power of our study to detect gene-disease associations. Finally, we leveraged this multiomics dataset to identify several pQTLs that constitute candidate biomarkers, including TMPRSS5 for Charcot-Marie-Tooth disease type 1A. This multiancestry whole-genome-sequence CNV PheWAS offers insights into the roles of CNVs in human health outcomes and could serve as a valuable resource for therapeutic development.

DOI

10.1038/s41586-025-10087-x

RELATED_BIOBANK

MAIN ANCESTRY

EUR

UKB exome

Summary statistics

PUBMED_LINK

34375979

DESCRIPTION

UK Biobank exome sequence-based GWAS summary statistics (gene- and variant-level association resource).

URL

https://azphewas.com/

TITLE

Rare variant contribution to human disease in 281,104 UK Biobank exomes.

Main citation

Wang Q, Dhindsa RS, Carss K, Harper AR, ...&, Petrovski S. (2021) Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature, 597 (7877) 527-532. doi:10.1038/s41586-021-03855-y. PMID 34375979

ABSTRACT

Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits. Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).

DOI

10.1038/s41586-021-03855-y

RELATED_BIOBANK

MAIN ANCESTRY

EUR

UKB fastgwa (Imputation)

Summary statistics

PUBMED_LINK

31768069

DESCRIPTION

UK Biobank GWAS from fastGWA on imputed genotype data (continuous and binary traits).

URL

https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp/

TITLE

A resource-efficient tool for mixed model association analysis of large-scale data.

Main citation

Jiang L, Zheng Z, Qi T, Kemper KE, ...&, Yang J. (2019) A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet, 51 (12) 1749-1755. doi:10.1038/s41588-019-0530-8. PMID 31768069

ABSTRACT

The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.

DOI

10.1038/s41588-019-0530-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

UKB fastgwa (WES)

Summary statistics

PUBMED_LINK

31768069

DESCRIPTION

UK Biobank GWAS from fastGWA on whole-exome sequence data.

URL

https://yanglab.westlake.edu.cn/data/ukb_fastgwa/wes/

TITLE

A resource-efficient tool for mixed model association analysis of large-scale data.

Main citation

Jiang L, Zheng Z, Qi T, Kemper KE, ...&, Yang J. (2019) A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet, 51 (12) 1749-1755. doi:10.1038/s41588-019-0530-8. PMID 31768069

ABSTRACT

The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.

DOI

10.1038/s41588-019-0530-8

RELATED_BIOBANK

MAIN ANCESTRY

EUR

UKB fastgwa-glmm (Binary)

Summary statistics

PUBMED_LINK

34737426

DESCRIPTION

UK Biobank binary-trait GWAS from SAIGE-style GLMM analysis (fastGWA-glmm pipeline).

URL

https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp_binary/

TITLE

A generalized linear mixed model association tool for biobank-scale data.

Main citation

Jiang L, Zheng Z, Fang H, Yang J. (2021) A generalized linear mixed model association tool for biobank-scale data. Nat Genet, 53 (11) 1616-1621. doi:10.1038/s41588-021-00954-4. PMID 34737426

ABSTRACT

Compared with linear mixed model-based genome-wide association (GWA) methods, generalized linear mixed model (GLMM)-based methods have better statistical properties when applied to binary traits but are computationally much slower. In the present study, leveraging efficient sparse matrix-based algorithms, we developed a GLMM-based GWA tool, fastGWA-GLMM, that is severalfold to orders of magnitude faster than the state-of-the-art tools when applied to the UK Biobank (UKB) data and scalable to cohorts with millions of individuals. We show by simulation that the fastGWA-GLMM test statistics of both common and rare variants are well calibrated under the null, even for traits with extreme case-control ratios. We applied fastGWA-GLMM to the UKB data of 456,348 individuals, 11,842,647 variants and 2,989 binary traits (full summary statistics available at http://fastgwa.info/ukbimpbin), and identified 259 rare variants associated with 75 traits, demonstrating the use of imputed genotype data in a large cohort to discover rare variants for binary complex traits.

DOI

10.1038/s41588-021-00954-4

RELATED_BIOBANK

MAIN ANCESTRY

EUR

UKB gene-based (Genebass)

Summary statistics

PUBMED_LINK

36778668

DESCRIPTION

UK Biobank gene-based association results from the Genebass / exome analysis resource.

URL

https://genebass.org/

TITLE

Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes.

Main citation

Karczewski KJ, Solomonson M, Chao KR, Goodrich JK, ...&, Neale BM. (2022) Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genom, 2 (9) 100168. doi:10.1016/j.xgen.2022.100168. PMID 36778668

ABSTRACT

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.

DOI

10.1016/j.xgen.2022.100168

RELATED_BIOBANK

MAIN ANCESTRY

EUR

UKB Neale

Summary statistics

DESCRIPTION

Neale lab UK Biobank GWAS summary statistics (round-2 style phenome-wide results via PheWeb).

URL

https://pheweb.org/UKB-Neale/

RELATED_BIOBANK

MAIN ANCESTRY

EUR

UKB saige

Summary statistics

PUBMED_LINK

30104761

DESCRIPTION

UK Biobank GWAS with SAIGE (mixed-model association for biobank-scale binary and quantitative traits).

URL

https://pheweb.org/UKB-SAIGE/

TITLE

Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies.

Main citation

Zhou W, Nielsen JB, Fritsche LG, Dey R, ...&, Lee S. (2018) Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet, 50 (9) 1335-1341. doi:10.1038/s41588-018-0184-y. PMID 30104761

ABSTRACT

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for > 1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.

DOI

10.1038/s41588-018-0184-y

RELATED_BIOBANK

MAIN ANCESTRY

EUR

UKB TOPMed

Summary statistics

PUBMED_LINK

33568819

DESCRIPTION

UK Biobank GWAS using TOPMed-imputed genotypes (multi-ancestry imputation panel).

URL

https://pheweb.org/UKB-TOPMed/

TITLE

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Main citation

Taliun D, Harris DN, Kessler MD, Carlson J, ...&, Abecasis GR. (2021) Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature, 590 (7845) 290-299. doi:10.1038/s41586-021-03205-y. PMID 33568819

ABSTRACT

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

DOI

10.1038/s41586-021-03205-y

RELATED_BIOBANK