Tools Tissue and single cell
Curated listings of tissue and single-cell tools under the GWAS Tools tab.
Summary Table
| NAME | Main citation | YEAR |
|---|---|---|
| CoCoNet | Shang L et al., PLoS Genet, 2020 | 2020 |
| EPIC | Wang R et al., PLoS Genet, 2022 | 2022 |
| LDSC-SEG | Finucane HK et al., Nat Genet, 2018 | 2018 |
| MAGMA | de Leeuw CA et al., PLoS Comput Biol, 2015 | 2015 |
| RolyPoly | Calderon D et al., Am J Hum Genet, 2017 | 2017 |
| SCARlink | Mitra S et al., Nat Genet, 2024 | 2024 |
| SCAVENGE | Yu F et al., Nat Biotechnol, 2022 | 2022 |
| SCENT | Sakaue S et al., Nat Genet, 2024 | 2024 |
| TCSC | Amariuta T et al., Nat Genet, 2023 | 2023 |
| cellAdmix | Mitchel J et al., Nat Genet, 2026 | 2026 |
| gsMap | Song L et al., Nature, 2025 | 2025 |
| pgBoost | Dorans, E. R., Jagadeesh, K., Dey, K., & Price, A. L. (2024). Linking regulatory variants to target genes by… | NA |
| sc-linker | Jagadeesh KA et al., Nat Genet, 2022 | 2022 |
| scDRS | Zhang MJ et al., Nat Genet, 2022 | 2022 |
| scGWAS | Jia P et al., Genome Biol, 2022 | 2022 |
| seismic | Lai Q et al., Nat Commun, 2025 | 2025 |
CoCoNet
PUBMED_LINK
DESCRIPTION
CoCoNet is a composite likelihood-based covariance regression network model for identifying trait-relevant tissues or cell types.
URL
KEYWORDS
composite likelihood-based inference algorithm
TITLE
Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies.
Main citation
Shang L, Smith JA, Zhou X. (2020) Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies. PLoS Genet, 16 (4) e1008734. doi:10.1371/journal.pgen.1008734. PMID 32310941
ABSTRACT
Genome-wide association studies (GWASs) have identified many SNPs associated with various common diseases. Understanding the biological functions of these identified SNP associations requires identifying disease/trait relevant tissues or cell types. Here, we develop a network method, CoCoNet, to facilitate the identification of trait-relevant tissues or cell types. Different from existing approaches, CoCoNet incorporates tissue-specific gene co-expression networks constructed from either bulk or single cell RNA sequencing (RNAseq) studies with GWAS data for trait-tissue inference. In particular, CoCoNet relies on a covariance regression network model to express gene-level effect measurements for the given GWAS trait as a function of the tissue-specific co-expression adjacency matrix. With a composite likelihood-based inference algorithm, CoCoNet is scalable to tens of thousands of genes. We validate the performance of CoCoNet through extensive simulations. We apply CoCoNet for an in-depth analysis of four neurological disorders and four autoimmune diseases, where we integrate the corresponding GWASs with bulk RNAseq data from 38 tissues and single cell RNAseq data from 10 cell types. In the real data applications, we show how CoCoNet can help identify specific glial cell types relevant for neurological disorders and identify disease-targeted colon tissues as relevant for autoimmune diseases.
DOI
10.1371/journal.pgen.1008734
EPIC
PUBMED_LINK
FULL NAME
cEll tyPe enrIChment
DESCRIPTION
Inferring relevant tissues and cell types for complex traits in genome-wide association studies
URL
KEYWORDS
GWAS, scRNA-seq
TITLE
EPIC: Inferring relevant cell types for complex traits by integrating genome-wide association studies and single-cell RNA sequencing.
Main citation
Wang R, Lin DY, Jiang Y. (2022) EPIC: Inferring relevant cell types for complex traits by integrating genome-wide association studies and single-cell RNA sequencing. PLoS Genet, 18 (6) e1010251. doi:10.1371/journal.pgen.1010251. PMID 35709291
ABSTRACT
More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific gene expression measurements from single-cell RNA sequencing (scRNA-seq). We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We apply our framework to multiple scRNA-seq datasets from different platforms and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and scRNA-seq datasets and further validated using PubMed search and existing bulk case-control testing results.
DOI
10.1371/journal.pgen.1010251
LDSC-SEG
PUBMED_LINK
FULL NAME
LD score regression applied to specifically expressed genes
URL
KEYWORDS
LDSC, tissue, cell type
TITLE
Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types.
Main citation
Finucane HK, Reshef YA, Anttila V, Slowikowski K, ...&, Price AL. (2018) Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat Genet, 50 (4) 621-629. doi:10.1038/s41588-018-0081-4. PMID 29632380
ABSTRACT
We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.
DOI
10.1038/s41588-018-0081-4
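The gene-selection step described in the abstract (keeping the genes with the highest specific expression in a tissue) can be sketched roughly as follows. This is a minimal illustration only: the function name, the toy scores, and the 10% cutoff are assumptions of this sketch, and the later steps (building an annotation from regions around the selected genes and testing it with stratified LD score regression) are not shown.

```python
def top_specific_genes(specificity, top_percent=10):
    """Return the genes with the highest specificity score for one tissue.

    `specificity` maps gene name -> score (higher = more specific).
    `top_percent` is an illustrative cutoff, not a value taken from this page.
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Rank genes from most to least specific.
    ranked = sorted(specificity, key=specificity.get, reverse=True)
    # Integer arithmetic avoids floating-point surprises; keep at least one gene.
    n_keep = max(1, len(ranked) * top_percent // 100)
    return ranked[:n_keep]


# Toy specificity scores for a single tissue (hypothetical values).
scores = {f"GENE{i}": float(i) for i in range(1, 21)}
selected = top_specific_genes(scores, top_percent=10)
print(selected)
```

With 20 toy genes and a 10% cutoff, two genes are retained; in practice the score would come from comparing expression in the focal tissue against the other tissues.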
MAGMA
PUBMED_LINK
FULL NAME
Multi-marker Analysis of GenoMic Annotation
URL
TITLE
MAGMA: generalized gene-set analysis of GWAS data.
Main citation
de Leeuw CA, Mooij JM, Heskes T, Posthuma D. (2015) MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol, 11 (4) e1004219. doi:10.1371/journal.pcbi.1004219. PMID 25885710
ABSTRACT
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
DOI
10.1371/journal.pcbi.1004219
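The regression structure of the gene-set analysis mentioned in the abstract can be illustrated with a deliberately simplified sketch: gene-level p-values are mapped to Z-scores and regressed on a 0/1 set-membership indicator. The probit transform, the toy p-values, and both function names are assumptions of this sketch; it ignores gene-gene correlation and any covariates, so it is not a substitute for the tool itself.

```python
from statistics import NormalDist, mean


def gene_z_scores(p_values):
    """Map gene-level p-values (strictly between 0 and 1) to Z-scores."""
    nd = NormalDist()
    return {gene: nd.inv_cdf(1.0 - p) for gene, p in p_values.items()}


def gene_set_slope(z, gene_set):
    """OLS slope of Z on a 0/1 membership indicator.

    With a single binary predictor this equals the difference between the
    mean Z-score inside the set and the mean Z-score outside it.
    """
    xs = [1.0 if gene in gene_set else 0.0 for gene in z]
    ys = list(z.values())
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical gene p-values and a hypothetical gene set.
p_values = {"A": 0.001, "B": 0.01, "C": 0.5, "D": 0.8, "E": 0.3}
z = gene_z_scores(p_values)
slope = gene_set_slope(z, gene_set={"A", "B"})
print(round(slope, 3))
```

A positive slope indicates that genes in the set tend to have stronger association than genes outside it, which is the question a competitive gene-set test asks.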
RolyPoly
PUBMED_LINK
DESCRIPTION
RolyPoly is a regression-based polygenic model that can prioritize trait-relevant cell types and genes from GWAS summary statistics and gene expression data.
URL
TITLE
Inferring Relevant Cell Types for Complex Traits by Using Single-Cell Gene Expression.
Main citation
Calderon D, Bhaskar A, Knowles DA, Golan D, ...&, Pritchard JK. (2017) Inferring Relevant Cell Types for Complex Traits by Using Single-Cell Gene Expression. Am J Hum Genet, 101 (5) 686-699. doi:10.1016/j.ajhg.2017.09.009. PMID 29106824
ABSTRACT
Previous studies have prioritized trait-relevant cell types by looking for an enrichment of genome-wide association study (GWAS) signal within functional regions. However, these studies are limited in cell resolution by the lack of functional annotations from difficult-to-characterize or rare cell populations. Measurement of single-cell gene expression has become a popular method for characterizing novel cell types, and yet limited work has linked single-cell RNA sequencing (RNA-seq) to phenotypes of interest. To address this deficiency, we present RolyPoly, a regression-based polygenic model that can prioritize trait-relevant cell types and genes from GWAS summary statistics and gene expression data. RolyPoly is designed to use expression data from either bulk tissue or single-cell RNA-seq. In this study, we demonstrated RolyPoly's accuracy through simulation and validated previously known tissue-trait associations. We discovered a significant association between microglia and late-onset Alzheimer disease and an association between schizophrenia and oligodendrocytes and replicating fetal cortical cells. Additionally, RolyPoly computes a trait-relevance score for each gene to reflect the importance of expression specific to a cell type. We found that differentially expressed genes in the prefrontal cortex of individuals with Alzheimer disease were significantly enriched with genes ranked highly by RolyPoly gene scores. Overall, our method represents a powerful framework for understanding the effect of common variants on cell types contributing to complex traits.
DOI
10.1016/j.ajhg.2017.09.009
SCARlink
PUBMED_LINK
FULL NAME
single-cell ATAC + RNA linking
DESCRIPTION
Single-cell ATAC+RNA linking (SCARlink) uses multiomic single-cell ATAC and RNA to predict gene expression from chromatin accessibility and predict regulatory regions.
URL
KEYWORDS
Poisson regression, scATAC, scRNA, tile-level accessibility
TITLE
Single-cell multi-ome regression models identify functional and disease-associated enhancers and enable chromatin potential analysis.
Main citation
Mitra S, Malik R, Wong W, Rahman A, ...&, Leslie CS. (2024) Single-cell multi-ome regression models identify functional and disease-associated enhancers and enable chromatin potential analysis. Nat Genet, 56 (4) 627-636. doi:10.1038/s41588-024-01689-8. PMID 38514783
ABSTRACT
We present a gene-level regulatory model, single-cell ATAC + RNA linking (SCARlink), which predicts single-cell gene expression and links enhancers to target genes using multi-ome (scRNA-seq and scATAC-seq co-assay) sequencing data. The approach uses regularized Poisson regression on tile-level accessibility data to jointly model all regulatory effects at a gene locus, avoiding the limitations of pairwise gene-peak correlations and dependence on peak calling. SCARlink outperformed existing gene scoring methods for imputing gene expression from chromatin accessibility across high-coverage multi-ome datasets while giving comparable to improved performance on low-coverage datasets. Shapley value analysis on trained models identified cell-type-specific gene enhancers that are validated by promoter capture Hi-C and are 11× to 15× and 5× to 12× enriched in fine-mapped eQTLs and fine-mapped genome-wide association study (GWAS) variants, respectively. We further show that SCARlink-predicted and observed gene expression vectors provide a robust way to compute a chromatin potential vector field to enable developmental trajectory analysis.
DOI
10.1038/s41588-024-01689-8
ARROW_SUMMARY
scRNA-seq + scATAC-seq → Tile-level chromatin accessibility modeling → Regularized Poisson regression (SCARlink) → Predict gene expression & link enhancers to genes → Identify functional and disease-associated enhancers
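The core modeling idea above, regularized Poisson regression of a gene's expression counts on tile-level accessibility, can be sketched in plain Python. This is an illustrative toy, not the SCARlink implementation: the L2 penalty, the gradient-descent fitting, the hyperparameters, and the six-cell dataset are all assumptions of this sketch.

```python
import math


def fit_poisson_ridge(X, y, l2=0.1, lr=0.02, n_iter=5000):
    """Fit log E[y] = b0 + X.w by gradient descent.

    Minimises the mean Poisson negative log-likelihood plus (l2 / 2) * ||w||^2.
    X is a list of per-cell tile accessibility vectors, y the expression counts.
    """
    n, p = len(X), len(X[0])
    w, b0 = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            mu = math.exp(b0 + sum(wj * xj for wj, xj in zip(w, xi)))
            resid = mu - yi  # derivative of the Poisson NLL w.r.t. the linear predictor
            grad_b += resid
            for j in range(p):
                grad_w[j] += resid * xi[j]
        b0 -= lr * grad_b / n
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
    return b0, w


# Toy data: tile 0 tracks expression, tile 1 does not (hypothetical values).
X = [[0, 1], [1, 0], [2, 1], [3, 0], [1, 1], [2, 0]]
y = [1, 2, 5, 11, 2, 5]
b0, w = fit_poisson_ridge(X, y)
print([round(v, 2) for v in w])
```

Fitting all tiles at a locus jointly, rather than one gene-peak pair at a time, is what lets the per-tile weights be read as candidate regulatory contributions.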
SCAVENGE
PUBMED_LINK
FULL NAME
Single Cell Analysis of Variant Enrichment through Network propagation of GEnomic data
URL
KEYWORDS
GWAS, scATAC, network propagation
TITLE
Variant to function mapping at single-cell resolution through network propagation.
Main citation
Yu F, Cato LD, Weng C, Liggett LA, ...&, Sankaran VG. (2022) Variant to function mapping at single-cell resolution through network propagation. Nat Biotechnol, 40 (11) 1644-1653. doi:10.1038/s41587-022-01341-y. PMID 35668323
ABSTRACT
Genome-wide association studies in combination with single-cell genomic atlases can provide insights into the mechanisms of disease-causal genetic variation. However, identification of disease-relevant or trait-relevant cell types, states and trajectories is often hampered by sparsity and noise, particularly in the analysis of single-cell epigenomic data. To overcome these challenges, we present SCAVENGE, a computational algorithm that uses network propagation to map causal variants to their relevant cellular context at single-cell resolution. We demonstrate how SCAVENGE can help identify key biological mechanisms underlying human genetic variation, applying the method to blood traits at distinct stages of human hematopoiesis, to monocyte subsets that increase the risk for severe Coronavirus Disease 2019 (COVID-19) and to intermediate lymphocyte developmental states that predispose to acute leukemia. Our approach not only provides a framework for enabling variant-to-function insights at single-cell resolution but also suggests a more general strategy for maximizing the inferences that can be made using single-cell genomic data.
DOI
10.1038/s41587-022-01341-y
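Network propagation of the kind the abstract refers to is often implemented as a random walk with restart over a cell-cell graph. The sketch below shows that generic technique on a six-cell toy graph; the restart probability, the graph, the seed cell, and the function name are assumptions of this sketch and are not taken from the SCAVENGE code.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed-cell scores over a graph until they stabilise.

    adj   : symmetric 0/1 adjacency matrix (list of lists)
    seeds : set of node indices that start with all the probability mass
    """
    n = len(adj)
    # Column-normalise so each column is a distribution over a node's neighbours.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Two triangles of cells (0-1-2 and 3-4-5) joined by a single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

scores = random_walk_with_restart(adj, seeds={0})
print([round(s, 3) for s in scores])
```

Cells in the same neighbourhood as the seed end up with higher scores than cells in the other cluster, which is how propagation smooths sparse, noisy per-cell signals.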
SCENT
PUBMED_LINK
FULL NAME
single-cell enhancer target gene mapping
DESCRIPTION
SCENT uses single-cell multimodal data (e.g., 10X Multiome RNA/ATAC) and links ATAC-seq peaks (putative enhancers) to their target genes by modeling association between chromatin accessibility and gene expression across individual single cells.
URL
KEYWORDS
Poisson regression, scATAC-seq, scRNA-seq
TITLE
Tissue-specific enhancer-gene maps from multimodal single-cell data identify causal disease alleles.
Main citation
Sakaue S, Weinand K, Isaac S, Dey KK, ...&, Raychaudhuri S. (2024) Tissue-specific enhancer-gene maps from multimodal single-cell data identify causal disease alleles. Nat Genet, 56 (4) 615-626. doi:10.1038/s41588-024-01682-1. PMID 38594305
ABSTRACT
Translating genome-wide association study (GWAS) loci into causal variants and genes requires accurate cell-type-specific enhancer-gene maps from disease-relevant tissues. Building enhancer-gene maps is essential but challenging with current experimental methods in primary human tissues. Here we developed a nonparametric statistical method, SCENT (single-cell enhancer target gene mapping), that models association between enhancer chromatin accessibility and gene expression in single-cell or nucleus multimodal RNA sequencing and ATAC sequencing data. We applied SCENT to 9 multimodal datasets including >120,000 single cells or nuclei and created 23 cell-type-specific enhancer-gene maps. These maps were highly enriched for causal variants in expression quantitative loci and GWAS for 1,143 diseases and traits. We identified likely causal genes for both common and rare diseases and linked somatic mutation hotspots to target genes. We demonstrate that application of SCENT to multimodal data from disease-relevant human tissue enables the scalable construction of accurate cell-type-specific enhancer-gene maps, essential for defining noncoding variant function.
DOI
10.1038/s41588-024-01682-1
ARROW_SUMMARY
Extract chromatin accessibility (ATAC-seq) & gene expression (RNA-seq) from single cells → Group cells by type → For each gene, define candidate enhancers within 1 Mb → Use Poisson regression to model enhancer–gene associations → Assess significance via nonparametric bootstrapping → Build enhancer–gene links per cell type
TCSC
PUBMED_LINK
FULL NAME
Tissue co-regulation score regression
DESCRIPTION
TCSC is a statistical genetics method to identify causal tissues in diseases and complex traits. We leverage TWAS and GWAS summary statistics while explicitly modeling the genetic co-regulation of genes across tissues.
URL
TITLE
Modeling tissue co-regulation estimates tissue-specific contributions to disease.
Main citation
Amariuta T, Siewert-Rocks K, Price AL. (2023) Modeling tissue co-regulation estimates tissue-specific contributions to disease. Nat Genet, 55 (9) 1503-1511. doi:10.1038/s41588-023-01474-z. PMID 37580597
ABSTRACT
Integrative analyses of genome-wide association studies and gene expression data have implicated many disease-critical tissues. However, co-regulation of genetic effects on gene expression across tissues impedes distinguishing biologically causal tissues from tagging tissues. In the present study, we introduce tissue co-regulation score regression (TCSC), which disentangles causal tissues from tagging tissues by regressing gene-disease association statistics (from transcriptome-wide association studies) on tissue co-regulation scores, reflecting correlations of predicted gene expression across genes and tissues. We applied TCSC to 78 diseases/traits (average n = 302,000) and gene expression prediction models for 48 GTEx tissues. TCSC identified 21 causal tissue-trait pairs at a 5% false discovery rate (FDR), including well-established findings, biologically plausible new findings (for example, aorta artery and glaucoma) and increased specificity of known tissue-trait associations (for example, subcutaneous adipose, but not visceral adipose, and high-density lipoprotein). TCSC also identified 17 causal tissue-trait covariance pairs at 5% FDR. In conclusion, TCSC is a precise method for distinguishing causal tissues from tagging tissues.
DOI
10.1038/s41588-023-01474-z
cellAdmix
PUBMED_LINK
DESCRIPTION
cellAdmix detects and corrects segmentation errors in imaging-based spatial transcriptomics by factorizing local molecular neighborhoods—analogous to doublet removal in scRNA-seq—to reassign transcripts that spill across cell boundaries.
URL
KEYWORDS
spatial transcriptomics, segmentation, matrix factorization, imaging-based ST
TITLE
Impact and correction of segmentation errors in spatial transcriptomics.
Main citation
Mitchel J, Gao T, Petukhov V, Cole E, ...&, Kharchenko PV. (2026) Impact and correction of segmentation errors in spatial transcriptomics. Nat Genet, 58 (2) 434-444. doi:10.1038/s41588-025-02497-4. PMID 41559218
ABSTRACT
Spatial transcriptomics aims to elucidate how cells coordinate within tissues by connecting cellular states to their native microenvironments. Imaging-based assays are especially promising, capturing molecular and cellular features at subcellular resolution in three dimensions. Interpretation of such data, however, hinges on accurate cell segmentation. Assigning individual molecules to the correct cells remains challenging. Here we re-analyze data from multiple tissues and platforms to find that segmentation errors currently confound most downstream analysis of cellular state, including differential expression, neighbor influence and ligand-receptor interactions. The extent to which misassigned molecules impact the results can be striking, frequently dominating the results. Thus, we show that matrix factorization of local molecular neighborhoods can effectively identify and isolate such molecular admixtures, thereby reducing their impact on downstream analyses, in a manner analogous to doublet filtering in single-cell RNA sequencing. As the applications of spatial transcriptomics assays become more widespread, accounting for segmentation errors will be important for resolving molecular mechanisms of tissue biology.
DOI
10.1038/s41588-025-02497-4
gsMap
PUBMED_LINK
FULL NAME
genetically informed spatial mapping of cells for complex traits
DESCRIPTION
gsMap (genetically informed spatial mapping of cells for complex traits) integrates spatial transcriptomics (ST) data with genome-wide association study (GWAS) summary statistics to map cells to human complex traits, including diseases, in a spatially resolved manner.
URL
KEYWORDS
spatial transcriptomics
TITLE
Spatially resolved mapping of cells associated with human complex traits.
Main citation
Song L, Chen W, Hou J, Guo M, ...&, Yang J. (2025) Spatially resolved mapping of cells associated with human complex traits. Nature, 641 (8064) 932-941. doi:10.1038/s41586-025-08757-x. PMID 40108460
ABSTRACT
Depicting spatial distributions of disease-relevant cells is crucial for understanding disease pathology. Here we present genetically informed spatial mapping of cells for complex traits (gsMap), a method that integrates spatial transcriptomics data with summary statistics from genome-wide association studies to map cells to human complex traits, including diseases, in a spatially resolved manner. Using embryonic spatial transcriptomics datasets covering 25 organs, we benchmarked gsMap through simulation and by corroborating known trait-associated cells or regions in various organs. Applying gsMap to brain spatial transcriptomics data, we reveal that the spatial distribution of glutamatergic neurons associated with schizophrenia more closely resembles that for cognitive traits than that for mood traits such as depression. The schizophrenia-associated glutamatergic neurons were distributed near the dorsal hippocampus, with upregulated expression of calcium signalling and regulation genes, whereas depression-associated glutamatergic neurons were distributed near the deep medial prefrontal cortex, with upregulated expression of neuroplasticity and psychiatric drug target genes. Our study provides a method for spatially resolved mapping of trait-associated cells and demonstrates the gain of biological insights (such as the spatial distribution of trait-relevant cells and related signature genes) through these maps.
DOI
10.1038/s41586-025-08757-x
ARROW_SUMMARY
Spatial transcriptomics data + GWAS summary statistics → Graph Neural Network identifies homogeneous spatial domains → Compute Gene Specificity Scores (GSS) for each spot → Map GSS to nearby SNPs → Perform Stratified LD Score Regression (S-LDSC) to assess trait heritability enrichment → Aggregate spot-level p-values using the Cauchy Combination Test to identify trait-associated spatial regions
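The final aggregation step in the summary above uses the Cauchy combination test, which has a simple closed form. The sketch below implements that standard formula; the function name, the equal default weights, and the example p-values are assumptions of this sketch rather than details of the gsMap code.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values (each strictly between 0 and 1) into one p-value.

    Each p-value is mapped to a standard Cauchy variate via tan((0.5 - p) * pi),
    the variates are averaged with the given weights, and the average is mapped
    back through the Cauchy survival function.
    """
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t / sum(weights)) / math.pi


# Hypothetical spot-level p-values for one spatial region.
region_p = cauchy_combination([0.01, 0.02, 0.5])
print(region_p)
```

The test is attractive for spot-level aggregation because it stays approximately valid when the combined p-values are correlated, as neighbouring spots typically are.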
pgBoost
DESCRIPTION
pgBoost is an integrative modeling framework that trains a non-linear combination of existing linking strategies (including genomic distance) on fine-mapped eQTL data to assign a probabilistic score to each candidate SNP-gene link.
URL
KEYWORDS
eQTL-informed gradient boosting
PREPRINT_DOI
10.1101/2024.05.24.24307813
Main citation
Dorans, E. R., Jagadeesh, K., Dey, K., & Price, A. L. (2024). Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance. medRxiv, 2024-05.
sc-linker
PUBMED_LINK
DESCRIPTION
A framework for integrating single-cell RNA-sequencing, epigenomic SNP-to-gene maps and genome-wide association study summary statistics to infer the underlying cell types and processes by which genetic variants influence disease.
URL
KEYWORDS
GWAS, scRNA-seq
TITLE
Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics.
Main citation
Jagadeesh KA, Dey KK, Montoro DT, Mohan R, ...&, Regev A. (2022) Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat Genet, 54 (10) 1479-1492. doi:10.1038/s41588-022-01187-9. PMID 36175791
ABSTRACT
Genome-wide association studies provide a powerful means of identifying loci and genes contributing to disease, but in many cases, the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. In the present study, we introduce sc-linker, a framework for integrating single-cell RNA-sequencing, epigenomic SNP-to-gene maps and genome-wide association study summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. The inferred disease enrichments recapitulated known biology and highlighted notable cell-disease relationships, including γ-aminobutyric acid-ergic neurons in major depressive disorder, a disease-dependent M-cell program in ulcerative colitis and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease-dependent immune cell-type programs were associated, whereas only disease-dependent epithelial cell programs were prominent, suggesting a role in disease response rather than initiation. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.
DOI
10.1038/s41588-022-01187-9
ARROW_SUMMARY
scRNA-seq data → Derive cell-type-specific gene programs → Map SNPs to genes using epigenomic data → Integrate with GWAS summary statistics → Identify disease-critical cell types and processes
scDRS
PUBMED_LINK
FULL NAME
single-cell Disease Relevance Score
DESCRIPTION
An approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types.
URL
KEYWORDS
GWAS, scRNA-seq
TITLE
Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data.
Main citation
Zhang MJ, Hou K, Dey KK, Sakaue S, ...&, Price AL. (2022) Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat Genet, 54 (10) 1572-1580. doi:10.1038/s41588-022-01167-z. PMID 36050550
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) provides unique insights into the pathology and cellular origin of disease. We introduce single-cell disease relevance score (scDRS), an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWASs). We applied scDRS to 74 diseases/traits and 1.3 million single-cell gene-expression profiles across 31 tissues/organs. Cell-type-level results broadly recapitulated known cell-type-disease associations. Individual-cell-level results identified subpopulations of disease-associated cells not captured by existing cell-type labels, including T cell subpopulations associated with inflammatory bowel disease, partially characterized by their effector-like states; neuron subpopulations associated with schizophrenia, partially characterized by their spatial locations; and hepatocyte subpopulations associated with triglyceride levels, partially characterized by their higher ploidy levels. Genes whose expression was correlated with the scDRS score across cells (reflecting coexpression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.
DOI
10.1038/s41588-022-01167-z
ARROW_SUMMARY
GWAS summary statistics → Select putative disease genes via MAGMA → Compute scDRS using Monte Carlo-based score aggregation → Normalize with control gene sets → Rank cells by disease relevance → Identify enriched subpopulations and co-expressed gene networks
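The scoring and Monte Carlo steps in the summary above can be illustrated with a stripped-down sketch: each cell gets a raw score from its expression of the putative disease genes, and that score is compared with scores from randomly drawn control gene sets. Everything here is a simplification assumed for illustration: the unweighted mean score, control sets drawn uniformly from the non-disease genes, the toy expression values, and the function names. It omits the normalization step listed in the summary.

```python
import random


def cell_score(expr_row, genes):
    """Mean expression of `genes` in one cell (expr_row maps gene -> expression)."""
    return sum(expr_row[g] for g in genes) / len(genes)


def mc_p_values(expr, disease_genes, n_ctrl=200, seed=0):
    """Per-cell Monte Carlo p-values against random control gene sets."""
    rng = random.Random(seed)
    disease = set(disease_genes)
    background = sorted(g for g in expr[0] if g not in disease)
    ctrl_sets = [rng.sample(background, len(disease_genes)) for _ in range(n_ctrl)]
    p_vals = []
    for row in expr:
        raw = cell_score(row, disease_genes)
        n_ge = sum(cell_score(row, ctrl) >= raw for ctrl in ctrl_sets)
        # Add-one correction keeps the p-value strictly above zero.
        p_vals.append((1 + n_ge) / (1 + n_ctrl))
    return p_vals


genes = [f"g{i}" for i in range(20)]
disease_genes = ["g0", "g1", "g2"]
# Cell 0 over-expresses the disease genes; cell 1 does not express them.
cell_hi = {g: (5.0 if g in disease_genes else 1.0) for g in genes}
cell_lo = {g: (0.0 if g in disease_genes else 1.0) for g in genes}
p_vals = mc_p_values([cell_hi, cell_lo], disease_genes)
print(p_vals)
```

Because the p-value is computed per cell, no cell-type labels are needed, which matches the description of the method as independent of annotated cell types.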
scGWAS
PUBMED_LINK
FULL NAME
scRNA-seq assisted GWAS analysis
DESCRIPTION
scGWAS leverages scRNA-seq data to identify the genetically mediated associations between traits and cell types.
URL
TITLE
scGWAS: landscape of trait-cell type associations by integrating single-cell transcriptomics-wide and genome-wide association studies.
Main citation
Jia P, Hu R, Yan F, Dai Y, ...&, Zhao Z. (2022) scGWAS: landscape of trait-cell type associations by integrating single-cell transcriptomics-wide and genome-wide association studies. Genome Biol, 23 (1) 220. doi:10.1186/s13059-022-02785-w. PMID 36253801
ABSTRACT
BACKGROUND: The rapid accumulation of single-cell RNA sequencing (scRNA-seq) data presents unique opportunities to decode the genetically mediated cell-type specificity in complex diseases. Here, we develop a new method, scGWAS, which effectively leverages scRNA-seq data to achieve two goals: (1) to infer the cell types in which the disease-associated genes manifest and (2) to construct cellular modules which imply disease-specific activation of different processes. RESULTS: scGWAS only utilizes the average gene expression for each cell type followed by virtual search processes to construct the null distributions of module scores, making it scalable to large scRNA-seq datasets. We demonstrated scGWAS in 40 genome-wide association studies (GWAS) datasets (average sample size N ≈ 154,000) using 18 scRNA-seq datasets from nine major human/mouse tissues (totaling 1.08 million cells) and identified 2533 trait and cell-type associations, each with significant modules for further investigation. The module genes were validated using disease or clinically annotated references from ClinVar, OMIM, and pLI variants. CONCLUSIONS: We showed that the trait-cell type associations identified by scGWAS, while generally constrained to trait-tissue associations, could recapitulate many well-studied relationships and also reveal novel relationships, providing insights into the unsolved trait-tissue associations. Moreover, in each specific cell type, the associations with different traits were often mediated by different sets of risk genes, implying disease-specific activation of driving processes. In summary, scGWAS is a powerful tool for exploring the genetic basis of complex diseases at the cell type level using single-cell expression data.
DOI
10.1186/s13059-022-02785-w
seismic
PUBMED_LINK
FULL NAME
Single-cell Expression Integration System for Mapping genetically Implicated Cell types
DESCRIPTION
R framework that links GWAS signals to single-cell-defined cell types via a cell-type gene specificity score (expression magnitude and consistency) and regression on gene-level association statistics, with influential-gene follow-up for interpretability.
URL
KEYWORDS
GWAS, scRNA-seq, cell type, MAGMA, post-GWAS interpretation
TITLE
Disentangling associations between complex traits and cell types with seismic.
Main citation
Lai Q, Dannenfelser R, Roussarie JP, Yao V. (2025) Disentangling associations between complex traits and cell types with seismic. Nat Commun, 16 (1) 8744. doi:10.1038/s41467-025-63753-z. PMID 41034207
ABSTRACT
Integrating single-cell RNA sequencing with Genome-Wide Association Studies (GWAS) can uncover cell types involved in complex traits and disease. However, current methods often lack scalability, interpretability, and robustness. We present seismic, a framework that computes a novel specificity score capturing both expression magnitude and consistency across cell types and introduces influential gene analysis, an approach to identify genes driving each cell type-trait association. Across over 1000 cell-type characterizations at different granularities and 28 polygenic traits, seismic corroborates known associations and uncovers trait-relevant cell groups not apparent through other methodologies. In Parkinson's and Alzheimer's, seismic unveils both cell- and brain-region-specific differences in pathology. Analyzing a pathology-based Alzheimer's GWAS with seismic enables the identification of vulnerable neuron populations and molecular pathways implicated in their neurodegeneration. In general, seismic is a computationally efficient, powerful, and interpretable approach for mapping the relationships between polygenic traits and cell-type-specific expression, offering new insights into disease mechanisms.
DOI
10.1038/s41467-025-63753-z