Curated listing of Selection tools within Population Genetics, under the GWAS Tools tab.

Summary Table

NAME | Main citation | YEAR
AGES | Akbari A et al., Nature, 2026 | 2026
CMS | Grossman SR et al., Science, 2010 | 2010
EHH | Klassmann A et al., PLoS One, 2022 | 2022
GeneBayes | Zeng T et al., Nat Genet, 2024 | 2024
HWE | Wigginton JE et al., Am J Hum Genet, 2005 | 2005
Review-Fst | Holsinger KE et al., Nat Rev Genet, 2009 | 2009
SDS | Field Y et al., Science, 2016 | 2016
XP-EHH | Klassmann A et al., PLoS One, 2022 | 2022
f | Moon S et al., Genome Res, 2016 | 2016
iHS | Voight BF et al., PLoS Biol, 2006 | 2006

AGES

Tool
FULL NAME
Ancient GEnome Selection
DESCRIPTION
Akbari, A., Perry, A., Barton, A. R., Kariminejad, M., Gazal, S., Li, Z., ... & Reich, D. (2026). Ancient DNA reveals pervasive directional selection across West Eurasia. Nature.
URL
http://reich-ages.rc.hms.harvard.edu
KEYWORDS
ancient DNA, directional selection, time series, allele frequency
USE
AGES detects directional selection in ancient DNA time-series data by testing whether allele frequencies show consistent temporal trends after accounting for structure and migration-related confounding.
TITLE
Ancient DNA reveals pervasive directional selection across West Eurasia.
Main citation
Akbari A, Perry A, Barton AR, Kariminejad M, Gazal S, Li Z, ... Reich D. (2026) Ancient DNA reveals pervasive directional selection across West Eurasia. Nature. doi:10.1038/s41586-026-10358-1.
ABSTRACT
The study introduces a method for detecting directional selection from ancient DNA time-series by testing for consistent allele-frequency changes over time across 15,836 West Eurasians. The framework estimates selection coefficients genome-wide and helps distinguish sustained adaptive change from shifts caused by migration, structure, or non-adaptive forces.
DOI
10.1038/s41586-026-10358-1

CMS

Tool
PUBMED_LINK
20056855
FULL NAME
Composite of multiple signals
DESCRIPTION
Grossman, S. R., Shlyakhter, I., Karlsson, E. K., Byrne, E. H., Morales, S., Frieden, G., ... & Sabeti, P. C. (2010). A composite of multiple signals distinguishes causal variants in regions of positive selection. Science, 327(5967), 883-886.
TITLE
A composite of multiple signals distinguishes causal variants in regions of positive selection.
Main citation
Grossman SR, Shlyakhter I, Karlsson EK, Byrne EH, ... Sabeti PC. (2010) A composite of multiple signals distinguishes causal variants in regions of positive selection. Science, 327 (5967) 883-6. doi:10.1126/science.1183863. PMID 20056855
ABSTRACT
The human genome contains hundreds of regions whose patterns of genetic variation indicate recent positive natural selection, yet for most the underlying gene and the advantageous mutation remain unknown. We developed a method, composite of multiple signals (CMS), that combines tests for multiple signals of selection and increases resolution by up to 100-fold. By applying CMS to candidate regions from the International Haplotype Map, we localized population-specific selective signals to 55 kilobases (median), identifying known and novel causal variants. CMS can not just identify individual loci but implicates precise variants selected by evolution.
DOI
10.1126/science.1183863

EHH

Tool
PUBMED_LINK
35041674
FULL NAME
Extended haplotype homozygosity
DESCRIPTION
Sabeti, P. C., Reich, D. E., Higgins, J. M., Levine, H. Z., Richter, D. J., Schaffner, S. F., ... & Lander, E. S. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature, 419(6909), 832-837.
TITLE
Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data.
Main citation
Klassmann A, Gautier M. (2022) Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data. PLoS One, 17 (1) e0262024. doi:10.1371/journal.pone.0262024. PMID 35041674
ABSTRACT
Analysis of population genetic data often includes a search for genomic regions with signs of recent positive selection. One of such approaches involves the concept of extended haplotype homozygosity (EHH) and its associated statistics. These statistics typically require phased haplotypes, and some of them necessitate polarized variants. Here, we unify and extend previously proposed modifications to loosen these requirements. We compare the modified versions with the original ones by measuring the false discovery rate in simulated whole-genome scans and by quantifying the overlap of inferred candidate regions in empirical data. We find that phasing information is indispensable for accurate estimation of within-population statistics (for all but very large samples) and of cross-population statistics for small samples. Ancestry information, in contrast, is of lesser importance for both types of statistic. Our publicly available R package rehh incorporates the modified statistics presented here.
DOI
10.1371/journal.pone.0262024
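
As a rough illustration of the statistic this entry is named after, the sketch below computes EHH for one core allele using the pairwise definition from Sabeti et al. (2002): the fraction of pairs of core-allele carriers whose haplotypes are identical from the core marker out to a given marker. It is not the rehh implementation; the function name `ehh`, its arguments, and the toy haplotypes are assumptions made up for this example, and the data are assumed to be phased and biallelic.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_idx, core_allele, upto_idx):
    """EHH of `core_allele` at marker `core_idx`, extended to `upto_idx` (inclusive).

    `haplotypes` is a list of equal-length strings of '0'/'1' alleles
    (hypothetical toy format, phased data assumed).
    """
    lo, hi = sorted((core_idx, upto_idx))
    # Keep only chromosomes carrying the core allele.
    carriers = [h for h in haplotypes if h[core_idx] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # no pairs to compare
    # Group carriers by their extended haplotype over the interval.
    groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    # Fraction of carrier pairs that are identical over the interval.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Toy phased haplotypes (invented for illustration only).
haps = ["10110", "10110", "10100", "11100", "00101", "01101"]

for marker in range(4):
    print(marker, ehh(haps, core_idx=0, core_allele="1", upto_idx=marker))
```

EHH starts at 1 at the core marker and can only stay level or decay as the interval widens, which is the property the EHH-based scans exploit.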

GeneBayes

Tool
FULL NAME
Bayesian estimation of gene constraint from an evolutionary model with gene features
DESCRIPTION
Zeng, T., Spence, J. P., Mostafavi, H., & Pritchard, J. K. (2024). Bayesian estimation of gene constraint from an evolutionary model with gene features. Nature Genetics, 56, 1632-1643.
URL
https://github.com/tkzeng/GeneBayes
KEYWORDS
gene constraint, Bayesian inference, selection coefficient, loss-of-function, s_het
USE
GeneBayes estimates gene-level selective constraint (s_het) by combining an evolutionary population genetics model with machine learning on gene features, improving constraint inference for short genes.
TITLE
Bayesian estimation of gene constraint from an evolutionary model with gene features.
Main citation
Zeng T, Spence JP, Mostafavi H, Pritchard JK. (2024) Bayesian estimation of gene constraint from an evolutionary model with gene features. Nature Genetics, 56, 1632-1643. doi:10.1038/s41588-024-01820-9.
ABSTRACT
This study introduces GeneBayes, a framework that integrates an evolutionary model with gene features to estimate gene-level selective constraint. The method improves inference of the interpretable constraint metric s_het, especially for short genes, and outperforms existing metrics for prioritizing genes relevant to essentiality and human disease.
DOI
10.1038/s41588-024-01820-9

HWE

Tool
PUBMED_LINK
15789306
FULL NAME
Exact Tests of Hardy-Weinberg Equilibrium
DESCRIPTION
Wigginton, J. E., Cutler, D. J., & Abecasis, G. R. (2005). A note on exact tests of Hardy-Weinberg equilibrium. The American Journal of Human Genetics, 76(5), 887-893.
TITLE
A note on exact tests of Hardy-Weinberg equilibrium.
Main citation
Wigginton JE, Cutler DJ, Abecasis GR. (2005) A note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet, 76 (5) 887-93. doi:10.1086/429864. PMID 15789306
ABSTRACT
Deviations from Hardy-Weinberg equilibrium (HWE) can indicate inbreeding, population stratification, and even problems in genotyping. In samples of affected individuals, these deviations can also provide evidence for association. Tests of HWE are commonly performed using a simple χ² goodness-of-fit test. We show that this χ² test can have inflated type I error rates, even in relatively large samples (e.g., samples of 1,000 individuals that include approximately 100 copies of the minor allele). On the basis of previous work, we describe exact tests of HWE together with efficient computational methods for their implementation. Our methods adequately control type I error in large and small samples and are computationally efficient. They have been implemented in freely available code that will be useful for quality assessment of genotype data and for the detection of genetic association or population stratification in very large data sets.
DOI
10.1086/429864
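
To make the idea of an exact test concrete, here is a minimal sketch that enumerates every possible heterozygote count for the observed allele counts, computes its probability under HWE, and sums the probabilities that do not exceed that of the observed count. It uses the closed-form conditional probability with exact rational arithmetic, not the efficient recurrence described in the paper, so it only suits small samples. The function names `het_distribution` and `hwe_exact_pvalue` are assumptions for this example.

```python
from fractions import Fraction
from math import factorial


def het_distribution(n, n_a):
    """P(heterozygote count | n individuals, n_a copies of allele A) under HWE."""
    n_b = 2 * n - n_a
    dist = {}
    # The heterozygote count has the same parity as n_a.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        dist[het] = Fraction(
            2 ** het * factorial(n) * factorial(n_a) * factorial(n_b),
            factorial(hom_a) * factorial(het) * factorial(hom_b) * factorial(2 * n),
        )
    return dist


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Two-sided exact p-value: total probability of outcomes no more likely than observed."""
    n = n_aa + n_ab + n_bb
    dist = het_distribution(n, 2 * n_aa + n_ab)
    p_obs = dist[n_ab]
    return float(sum(p for p in dist.values() if p <= p_obs))


# Hypothetical genotype counts (AA, AB, BB), invented for illustration.
print(hwe_exact_pvalue(5, 10, 85))
```

Because the probabilities are exact fractions, they sum to exactly 1 over all admissible heterozygote counts, which is a convenient sanity check on the formula.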

Review-Fst

Tool
PUBMED_LINK
19687804
DESCRIPTION
Holsinger, K. E., & Weir, B. S. (2009). Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nature Reviews Genetics, 10(9), 639-650.
TITLE
Genetics in geographically structured populations: defining, estimating and interpreting F(ST).
Main citation
Holsinger KE, Weir BS. (2009) Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet, 10 (9) 639-50. doi:10.1038/nrg2611. PMID 19687804
ABSTRACT
Wright's F-statistics, and especially F(ST), provide important insights into the evolutionary processes that influence the structure of genetic variation within and among populations, and they are among the most widely used descriptive statistics in population and evolutionary genetics. Estimates of F(ST) can identify regions of the genome that have been the target of selection, and comparisons of F(ST) from different parts of the genome can provide insights into the demographic history of populations. For these reasons and others, F(ST) has a central role in population and evolutionary genetics and has wide applications in fields that range from disease association mapping to forensic science. This Review clarifies how F(ST) is defined, how it should be estimated, how it is related to similar statistics and how estimates of F(ST) should be interpreted.
DOI
10.1038/nrg2611
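
For orientation, the sketch below evaluates the textbook heterozygosity form of Wright's statistic, F(ST) = (H_T - H_S) / H_T, for a single biallelic locus from known subpopulation allele frequencies. This is the parametric definition only; it treats the frequencies as known, weights subpopulations equally, and is not one of the sample-based estimators the Review discusses. The function name `fst_from_frequencies` is an assumption for this example.

```python
def fst_from_frequencies(freqs):
    """F(ST) at one biallelic locus from subpopulation allele frequencies.

    `freqs` holds the frequency of one allele in each subpopulation,
    with all subpopulations weighted equally (a simplifying assumption).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    if h_t == 0:
        return float("nan")  # locus is monomorphic overall
    return (h_t - h_s) / h_t


print(fst_from_frequencies([0.2, 0.8]))  # strongly differentiated
print(fst_from_frequencies([0.3, 0.3]))  # no differentiation
```

Identical frequencies give 0 and fixed differences give 1, which brackets the range of the parametric quantity.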

SDS

Tool
PUBMED_LINK
27738015
FULL NAME
Singleton density score
DESCRIPTION
Field, Y., Boyle, E. A., Telis, N., Gao, Z., Gaulton, K. J., Golan, D., ... & Pritchard, J. K. (2016). Detection of human adaptation during the past 2000 years. Science, 354(6313), 760-764.
URL
https://github.com/yairf/SDS
KEYWORDS
singleton, recent selection
USE
SDS is a method to infer very recent changes in allele frequencies from contemporary genome sequences.
TITLE
Detection of human adaptation during the past 2000 years.
Main citation
Field Y, Boyle EA, Telis N, Gao Z, ... Pritchard JK. (2016) Detection of human adaptation during the past 2000 years. Science, 354 (6313) 760-764. doi:10.1126/science.aag0776. PMID 27738015
ABSTRACT
Detection of recent natural selection is a challenging problem in population genetics. Here we introduce the singleton density score (SDS), a method to infer very recent changes in allele frequencies from contemporary genome sequences. Applied to data from the UK10K Project, SDS reflects allele frequency changes in the ancestors of modern Britons during the past ~2000 to 3000 years. We see strong signals of selection at lactase and the major histocompatibility complex, and in favor of blond hair and blue eyes. For polygenic adaptation, we find that recent selection for increased height has driven allele frequency shifts across most of the genome. Moreover, we identify shifts associated with other complex traits, suggesting that polygenic adaptation has played a pervasive role in shaping genotypic and phenotypic variation in modern humans.
DOI
10.1126/science.aag0776

XP-EHH

Tool
PUBMED_LINK
35041674
FULL NAME
Cross-population extended haplotype homozygosity
DESCRIPTION
Klassmann, A., & Gautier, M. (2022). Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data. PLoS One, 17(1), e0262024.
TITLE
Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data.
Main citation
Klassmann A, Gautier M. (2022) Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data. PLoS One, 17 (1) e0262024. doi:10.1371/journal.pone.0262024. PMID 35041674
ABSTRACT
Analysis of population genetic data often includes a search for genomic regions with signs of recent positive selection. One of such approaches involves the concept of extended haplotype homozygosity (EHH) and its associated statistics. These statistics typically require phased haplotypes, and some of them necessitate polarized variants. Here, we unify and extend previously proposed modifications to loosen these requirements. We compare the modified versions with the original ones by measuring the false discovery rate in simulated whole-genome scans and by quantifying the overlap of inferred candidate regions in empirical data. We find that phasing information is indispensable for accurate estimation of within-population statistics (for all but very large samples) and of cross-population statistics for small samples. Ancestry information, in contrast, is of lesser importance for both types of statistic. Our publicly available R package rehh incorporates the modified statistics presented here.
DOI
10.1371/journal.pone.0262024

f

Tool
PUBMED_LINK
27197222
FULL NAME
Fraction of sites under selection
DESCRIPTION
Moon, S., & Akey, J. M. (2016). A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets. Genome Research, 26(6), 834-843.
TITLE
A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets.
Main citation
Moon S, Akey JM. (2016) A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets. Genome Res, 26 (6) 834-43. doi:10.1101/gr.203059.115. PMID 27197222
ABSTRACT
A continuing challenge in the analysis of massively large sequencing data sets is quantifying and interpreting non-neutrally evolving mutations. Here, we describe a flexible and robust approach based on the site frequency spectrum to estimate the fraction of deleterious and adaptive variants from large-scale sequencing data sets. We applied our method to approximately 1 million single nucleotide variants (SNVs) identified in high-coverage exome sequences of 6515 individuals. We estimate that the fraction of deleterious nonsynonymous SNVs is higher than previously reported; quantify the effects of genomic context, codon bias, chromatin accessibility, and number of protein-protein interactions on deleterious protein-coding SNVs; and identify pathways and networks that have likely been influenced by positive selection. Furthermore, we show that the fraction of deleterious nonsynonymous SNVs is significantly higher for Mendelian versus complex disease loci and in exons harboring dominant versus recessive Mendelian mutations. In summary, as genome-scale sequencing data accumulate in progressively larger sample sizes, our method will enable increasingly high-resolution inferences into the characteristics and determinants of non-neutral variation.
DOI
10.1101/gr.203059.115

iHS

Tool
PUBMED_LINK
16494531
FULL NAME
Integrated haplotype score
DESCRIPTION
Voight, B. F., Kudaravalli, S., Wen, X., & Pritchard, J. K. (2006). A map of recent positive selection in the human genome. PLoS Biology, 4(3), e72.
TITLE
A map of recent positive selection in the human genome.
Main citation
Voight BF, Kudaravalli S, Wen X, Pritchard JK. (2006) A map of recent positive selection in the human genome. PLoS Biol, 4 (3) e72. doi:10.1371/journal.pbio.0040072. PMID 16494531
ABSTRACT
The identification of signals of very recent positive selection provides information about the adaptation of modern humans to local conditions. We report here on a genome-wide scan for signals of very recent positive selection in favor of variants that have not yet reached fixation. We describe a new analytical method for scanning single nucleotide polymorphism (SNP) data for signals of recent selection, and apply this to data from the International HapMap Project. In all three continental groups we find widespread signals of recent positive selection. Most signals are region-specific, though a significant excess are shared across groups. Contrary to some earlier low resolution studies that suggested a paucity of recent selection in sub-Saharan Africans, we find that by some measures our strongest signals of selection are from the Yoruba population. Finally, since these signals indicate the existence of genetic variants that have substantially different fitnesses, they must indicate loci that are the source of significant phenotypic variation. Though the relevant phenotypes are generally not known, such loci should be of particular interest in mapping studies of complex traits. For this purpose we have developed a set of SNPs that can be used to tag the strongest approximately 250 signals of recent selection in each population.
DOI
10.1371/journal.pbio.0040072
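
As a small sketch of what "integrated" means in the score's name: the EHH decay curve around a core SNP is integrated separately for the ancestral and derived alleles, and the unstandardized score is the log ratio of the two areas, ln(iHH_A / iHH_D). The code below uses plain trapezoidal integration over precomputed EHH values and omits the EHH cutoff and the frequency-binned standardization applied in the published method. The function names and the toy EHH curves are assumptions for this example.

```python
from math import log


def integrate_ehh(positions, ehh_values):
    """Trapezoidal area under an EHH curve sampled at `positions`."""
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += width * (ehh_values[i] + ehh_values[i - 1]) / 2
    return area


def unstandardized_ihs(ihh_ancestral, ihh_derived):
    """Log ratio of integrated EHH for the ancestral vs. derived allele."""
    return log(ihh_ancestral / ihh_derived)


# Toy EHH curves on one side of a core SNP (invented for illustration).
positions = [0.0, 1.0, 2.0]
ehh_ancestral = [1.0, 0.5, 0.0]  # decays quickly
ehh_derived = [1.0, 1.0, 0.5]    # long haplotypes persist

ihh_a = integrate_ehh(positions, ehh_ancestral)
ihh_d = integrate_ehh(positions, ehh_derived)
print(unstandardized_ihs(ihh_a, ihh_d))
```

With this sign convention, a negative value means the derived allele sits on unusually long haplotypes relative to the ancestral allele.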