Genealogy
Summary Table
NAME | CITATION | YEAR |
---|---|---|
ARG-Needle | Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, ...&, Palamara PF. (2023) Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits Nat. Genet., 55 (5) 768-776. doi:10.1038/s41588-023-01379-x. PMID 37127670 | 2023 |
ARGinfer | Mahmoudi A, Koskela J, Kelleher J, Chan YB, ...&, Balding D. (2022) Bayesian inference of ancestral recombination graphs PLoS Comput. Biol., 18 (3) e1009960. doi:10.1371/journal.pcbi.1009960. PMID 35263345 | 2022 |
ARGweaver-D | Hubisz MJ, Williams AL, Siepel A. (2020) Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph PLoS Genet., 16 (8) e1008895. doi:10.1371/journal.pgen.1008895. PMID 32760067 | 2020 |
ARGweaver | Rasmussen MD, Hubisz MJ, Gronau I, Siepel A. (2014) Genome-wide inference of ancestral recombination graphs PLoS Genet., 10 (5) e1004342. doi:10.1371/journal.pgen.1004342. PMID 24831947 | 2014 |
ASMC | Palamara PF, Terhorst J, Song YS, Price AL. (2018) High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability Nat. Genet., 50 (9) 1311-1317. doi:10.1038/s41588-018-0177-x. PMID 30104759 | 2018 |
Arbores | Heine K, Beskos A, Jasra A, Balding D, ...&, De Iorio M. (2018) Bridging trees for posterior inference on ancestral recombination graphs Proc. Math. Phys. Eng. Sci., 474 (2220) 20180568. doi:10.1098/rspa.2018.0568. PMID 30602937 | 2018 |
KwARG | Ignatieva A, Lyngsø RB, Jenkins PA, Hein J. (2021) KwARG: parsimonious reconstruction of ancestral recombination graphs with recurrent mutation Bioinformatics, 37 (19) 3277-3284. doi:10.1093/bioinformatics/btab351. PMID 33970217 | 2021 |
PSMC | Li H, Durbin R. (2011) Inference of human population history from individual whole-genome sequences Nature, 475 (7357) 493-496. doi:10.1038/nature10231. PMID 21753753 | 2011 |
RENT+ | Mirzaei S, Wu Y. (2017) RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination Bioinformatics, 33 (7) 1021-1030. doi:10.1093/bioinformatics/btw735. PMID 28065901 | 2017 |
Relate | Speidel L, Forest M, Shi S, Myers SR. (2019) A method for genome-wide genealogy estimation for thousands of samples Nat. Genet., 51 (9) 1321-1329. doi:10.1038/s41588-019-0484-x. PMID 31477933 | 2019 |
SARGE | Schaefer NK, Shapiro B, Green RE. (2021) An ancestral recombination graph of human, Neanderthal, and Denisovan genomes Sci. Adv., 7 (29) eabc0776. doi:10.1126/sciadv.abc0776. PMID 34272242 | 2021 |
tsdate | Wohns AW, Wong Y, Jeffery B, Akbari A, ...&, McVean G. (2022) A unified genealogy of modern and ancient genomes Science, 375 (6583) eabi8264. doi:10.1126/science.abi8264. PMID 35201891 | 2022 |
tsinfer | Kelleher J, Wong Y, Wohns AW, Fadil C, ...&, McVean G. (2019) Inferring whole-genome histories in large population datasets Nat. Genet., 51 (9) 1330-1338. doi:10.1038/s41588-019-0483-y. PMID 31477934 | 2019 |
ARG-Needle
- NAME : ARG-Needle
- SHORT NAME : ARG-Needle
- URL : https://palamaralab.github.io/software/argneedle/
- TITLE : Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits
- DOI : 10.1038/s41588-023-01379-x
- ABSTRACT : Genome-wide genealogies compactly represent the evolutionary history of a set of genomes and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007-0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.
- CITATION : Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, ...&, Palamara PF. (2023) Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits Nat. Genet., 55 (5) 768-776. doi:10.1038/s41588-023-01379-x. PMID 37127670
- JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2023 ; 55 ; 5 ; 768-776
- PUBMED_LINK : 37127670
ARGinfer
- NAME : ARGinfer
- SHORT NAME : ARGinfer
- TITLE : Bayesian inference of ancestral recombination graphs
- DOI : 10.1371/journal.pcbi.1009960
- ABSTRACT : We present a novel algorithm, implemented in the software ARGinfer, for probabilistic inference of the Ancestral Recombination Graph under the Coalescent with Recombination. Our Markov Chain Monte Carlo algorithm takes advantage of the Succinct Tree Sequence data structure that has allowed great advances in simulation and point estimation, but not yet probabilistic inference. Unlike previous methods, which employ the Sequentially Markov Coalescent approximation, ARGinfer uses the Coalescent with Recombination, allowing more accurate inference of key evolutionary parameters. We show using simulations that ARGinfer can accurately estimate many properties of the evolutionary history of the sample, including the topology and branch lengths of the genealogical tree at each sequence site, and the times and locations of mutation and recombination events. ARGinfer approximates posterior probability distributions for these and other quantities, providing interpretable assessments of uncertainty that we show to be well calibrated. ARGinfer is currently limited to tens of DNA sequences of several hundreds of kilobases, but has scope for further computational improvements to increase its applicability.
- COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
- CITATION : Mahmoudi A, Koskela J, Kelleher J, Chan YB, ...&, Balding D. (2022) Bayesian inference of ancestral recombination graphs PLoS Comput. Biol., 18 (3) e1009960. doi:10.1371/journal.pcbi.1009960. PMID 35263345
- JOURNAL_INFO : PLoS computational biology ; PLoS Comput. Biol. ; 2022 ; 18 ; 3 ; e1009960
- PUBMED_LINK : 35263345
ARGweaver
- NAME : ARGweaver
- SHORT NAME : ARGweaver
- DESCRIPTION : The ARGweaver software package contains programs and libraries for sampling and manipulating ancestral recombination graphs (ARGs). An ARG is a rich data structure for representing the ancestry of DNA sequences undergoing coalescence and recombination.
- URL : https://github.com/mdrasmus/argweaver
- TITLE : Genome-wide inference of ancestral recombination graphs
- DOI : 10.1371/journal.pgen.1004342
- ABSTRACT : The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.
- CITATION : Rasmussen MD, Hubisz MJ, Gronau I, Siepel A. (2014) Genome-wide inference of ancestral recombination graphs PLoS Genet., 10 (5) e1004342. doi:10.1371/journal.pgen.1004342. PMID 24831947
- JOURNAL_INFO : PLoS genetics ; PLoS Genet. ; 2014 ; 10 ; 5 ; e1004342
- PUBMED_LINK : 24831947
ARGweaver-D
- NAME : ARGweaver-D
- SHORT NAME : ARGweaver-D
- TITLE : Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph
- DOI : 10.1371/journal.pgen.1008895
- ABSTRACT : The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present a major extension of the ARGweaver algorithm, called ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topologies and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200-300kya. We find no convincing evidence that negative selection acted against these regions. Finally, we predict that 1% of the Denisovan genome was introgressed from an unsequenced, but highly diverged, archaic hominin ancestor. About 15% of these "super-archaic" regions (comprising at least about 4Mb) were, in turn, introgressed into modern humans and continue to exist in the genomes of people alive today.
- COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
- CITATION : Hubisz MJ, Williams AL, Siepel A. (2020) Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph PLoS Genet., 16 (8) e1008895. doi:10.1371/journal.pgen.1008895. PMID 32760067
- JOURNAL_INFO : PLoS genetics ; PLoS Genet. ; 2020 ; 16 ; 8 ; e1008895
- PUBMED_LINK : 32760067
ASMC
- NAME : ASMC
- SHORT NAME : ASMC
- FULL NAME : Ascertained Sequentially Markovian Coalescent
- DESCRIPTION : The Ascertained Sequentially Markovian Coalescent is a method to efficiently estimate pairwise coalescence time along the genome. It can be run using SNP array or whole-genome sequencing (WGS) data.
- URL : https://github.com/pierpal/ASMC
- TITLE : High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability
- DOI : 10.1038/s41588-018-0177-x
- ABSTRACT : Interest in reconstructing demographic histories has motivated the development of methods to estimate locus-specific pairwise coalescence times from whole-genome sequencing data. Here we introduce a powerful new method, ASMC, that can estimate coalescence times using only SNP array data, and is orders of magnitude faster than previous approaches. We applied ASMC to detect recent positive selection in 113,851 phased British samples from the UK Biobank, and detected 12 genome-wide significant signals, including 6 novel loci. We also applied ASMC to sequencing data from 498 Dutch individuals to detect background selection at deeper time scales. We detected strong heritability enrichment in regions of high background selection in an analysis of 20 independent diseases and complex traits using stratified linkage disequilibrium score regression, conditioned on a broad set of functional annotations (including other background selection annotations). These results underscore the widespread effects of background selection on the genetic architecture of complex traits.
- CITATION : Palamara PF, Terhorst J, Song YS, Price AL. (2018) High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability Nat. Genet., 50 (9) 1311-1317. doi:10.1038/s41588-018-0177-x. PMID 30104759
- JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2018 ; 50 ; 9 ; 1311-1317
- PUBMED_LINK : 30104759
Arbores
- NAME : Arbores
- SHORT NAME : Arbores
- TITLE : Bridging trees for posterior inference on ancestral recombination graphs
- DOI : 10.1098/rspa.2018.0568
- ABSTRACT : We present a new Markov chain Monte Carlo algorithm, implemented in the software Arbores, for inferring the history of a sample of DNA sequences. Our principal innovation is a bridging procedure, previously applied only for simple stochastic processes, in which the local computations within a bridge can proceed independently of the rest of the DNA sequence, facilitating large-scale parallelization.
- COPYRIGHT : https://royalsociety.org/-/media/journals/author/Licence-to-Publish-20062019-final.pdf
- CITATION : Heine K, Beskos A, Jasra A, Balding D, ...&, De Iorio M. (2018) Bridging trees for posterior inference on ancestral recombination graphs Proc. Math. Phys. Eng. Sci., 474 (2220) 20180568. doi:10.1098/rspa.2018.0568. PMID 30602937
- JOURNAL_INFO : Proceedings. Mathematical, physical, and engineering sciences ; Proc. Math. Phys. Eng. Sci. ; 2018 ; 474 ; 2220 ; 20180568
- PUBMED_LINK : 30602937
KwARG
- NAME : KwARG
- SHORT NAME : KwARG
- TITLE : KwARG: parsimonious reconstruction of ancestral recombination graphs with recurrent mutation
- DOI : 10.1093/bioinformatics/btab351
- ABSTRACT : MOTIVATION: The reconstruction of possible histories given a sample of genetic data in the presence of recombination and recurrent mutation is a challenging problem, but can provide key insights into the evolution of a population. We present KwARG, which implements a parsimony-based greedy heuristic algorithm for finding plausible genealogical histories (ancestral recombination graphs) that are minimal or near-minimal in the number of posited recombination and mutation events. RESULTS: Given an input dataset of aligned sequences, KwARG outputs a list of possible candidate solutions, each comprising a list of mutation and recombination events that could have generated the dataset; the relative proportion of recombinations and recurrent mutations in a solution can be controlled via specifying a set of 'cost' parameters. We demonstrate that the algorithm performs well when compared against existing methods. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/a-ignatieva/kwarg. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
- COPYRIGHT : https://creativecommons.org/licenses/by/4.0/
- CITATION : Ignatieva A, Lyngsø RB, Jenkins PA, Hein J. (2021) KwARG: parsimonious reconstruction of ancestral recombination graphs with recurrent mutation Bioinformatics, 37 (19) 3277-3284. doi:10.1093/bioinformatics/btab351. PMID 33970217
- JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2021 ; 37 ; 19 ; 3277-3284
- PUBMED_LINK : 33970217
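
The KwARG abstract refers to histories that are minimal in the number of recombination and recurrent mutation events. A standard way to see why any such events are needed at all is the four-gamete test: under the infinite-sites model, if two biallelic sites show all four allele combinations, at least one recombination or recurrent mutation must have occurred between them. The sketch below illustrates that test only. It is not KwARG's algorithm, and the function names and example haplotypes are hypothetical.

```python
from itertools import combinations


def four_gamete_incompatible(haplotypes, i, j):
    """Return True if sites i and j carry all four gametes (00, 01, 10, 11)."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def incompatible_pairs(haplotypes):
    """List every pair of sites that fails the four-gamete test."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gamete_incompatible(haplotypes, i, j)
    ]


# Hypothetical aligned binary haplotypes (one string per sequence).
example_haplotypes = ["000", "010", "100", "110"]
pairs = incompatible_pairs(example_haplotypes)
print(pairs)  # [(0, 1)]
```

Each reported pair signals that a history of the sample needs at least one recombination or recurrent mutation; parsimony methods then search for histories that explain all such conflicts with few events.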
PSMC
- NAME : PSMC
- SHORT NAME : PSMC
- DESCRIPTION : This software package infers population size history from a diploid sequence using the Pairwise Sequentially Markovian Coalescent (PSMC) model.
- URL : https://github.com/lh3/psmc
- TITLE : Inference of human population history from individual whole-genome sequences
- DOI : 10.1038/nature10231
- ABSTRACT : The history of human population size is important for understanding human evolution. Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH), a Korean male (SJK), three European individuals (J. C. Venter, NA12891 and NA12878) and two Yoruba males (NA18507 and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10-20 kyr ago. Both populations experienced a severe bottleneck 10-60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100-120 kyr ago, but considerable genetic exchanges may still have occurred until 20-40 kyr ago.
- CITATION : Li H, Durbin R. (2011) Inference of human population history from individual whole-genome sequences Nature, 475 (7357) 493-496. doi:10.1038/nature10231. PMID 21753753
- JOURNAL_INFO : Nature ; Nature ; 2011 ; 475 ; 7357 ; 493-496
- PUBMED_LINK : 21753753
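
PSMC's signal comes from heterozygous sites along a single diploid genome. As a back-of-envelope illustration of that link, the sketch below computes per-site heterozygosity and converts it to an effective population size using the standard neutral expectation for a constant-size diploid population, heterozygosity ≈ 4·Ne·μ. This is not the PSMC model, which infers a size history that changes over time. The function names, the genotype encoding, and the mutation rate are assumptions made for illustration.

```python
def heterozygosity(genotypes):
    """Fraction of callable sites at which the two alleles differ.

    `genotypes` is a list of (allele_a, allele_b) tuples, one per callable site.
    """
    het_sites = sum(1 for a, b in genotypes if a != b)
    return het_sites / len(genotypes)


def constant_size_ne(pi, mu):
    """Effective size implied by heterozygosity pi under constant size: pi = 4 * Ne * mu."""
    return pi / (4 * mu)


# Hypothetical diploid calls: 1 heterozygous site out of 1,000 callable sites.
calls = [("A", "A")] * 999 + [("A", "G")]
pi = heterozygosity(calls)

# Assumed per-base, per-generation mutation rate (illustrative value only).
ASSUMED_MU = 1.25e-8
print(pi, constant_size_ne(pi, ASSUMED_MU))  # heterozygosity 0.001, Ne of about 20,000
```

A single number like this averages over all of history; PSMC instead uses how heterozygosity varies along the genome to recover how the size changed through time.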
RENT+
- NAME : RENT+
- SHORT NAME : RENT+
- TITLE : RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination
- DOI : 10.1093/bioinformatics/btw735
- ABSTRACT : Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting information contained in the haplotype data about the underlying genealogy than RENT. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, which outperforms several existing methods. Availability and Implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contacts: sajad@engr.uconn.edu or ywu@engr.uconn.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
- CITATION : Mirzaei S, Wu Y. (2017) RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination Bioinformatics, 33 (7) 1021-1030. doi:10.1093/bioinformatics/btw735. PMID 28065901
- JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2017 ; 33 ; 7 ; 1021-1030
- PUBMED_LINK : 28065901
Relate
- NAME : Relate
- SHORT NAME : Relate
- DESCRIPTION : Relate estimates genome-wide genealogies in the form of trees that adapt to changes in local ancestry caused by recombination. The method, which is scalable to thousands of samples, is described in the following paper. Please cite this paper if you use our software in your study.
- URL : https://myersgroup.github.io/relate/
- TITLE : A method for genome-wide genealogy estimation for thousands of samples
- DOI : 10.1038/s41588-019-0484-x
- ABSTRACT : Knowledge of genome-wide genealogies for thousands of individuals would simplify most evolutionary analyses for humans and other species, but has remained computationally infeasible. We have developed a method, Relate, scaling to >10,000 sequences while simultaneously estimating branch lengths, mutational ages and variable historical population sizes, as well as allowing for data errors. Application to 1,000 Genomes Project haplotypes produces joint genealogical histories for 26 human populations. Highly diverged lineages are present in all groups, but most frequent in Africa. Outside Africa, these mainly reflect ancient introgression from groups related to Neanderthals and Denisovans, while African signals instead reflect unknown events unique to that continent. Our approach allows more powerful inferences of natural selection than has previously been possible. We identify multiple regions under strong positive selection, and multi-allelic traits including hair color, body mass index and blood pressure, showing strong evidence of directional selection, varying among human groups.
- CITATION : Speidel L, Forest M, Shi S, Myers SR. (2019) A method for genome-wide genealogy estimation for thousands of samples Nat. Genet., 51 (9) 1321-1329. doi:10.1038/s41588-019-0484-x. PMID 31477933
- JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2019 ; 51 ; 9 ; 1321-1329
- PUBMED_LINK : 31477933
SARGE
- NAME : SARGE
- SHORT NAME : SARGE
- TITLE : An ancestral recombination graph of human, Neanderthal, and Denisovan genomes
- DOI : 10.1126/sciadv.abc0776
- ABSTRACT : Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Neanderthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral recombination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function.
- COPYRIGHT : https://creativecommons.org/licenses/by-nc/4.0/
- CITATION : Schaefer NK, Shapiro B, Green RE. (2021) An ancestral recombination graph of human, Neanderthal, and Denisovan genomes Sci. Adv., 7 (29) eabc0776. doi:10.1126/sciadv.abc0776. PMID 34272242
- JOURNAL_INFO : Science advances ; Sci. Adv. ; 2021 ; 7 ; 29 ; eabc0776
- PUBMED_LINK : 34272242
tsdate
- NAME : tsdate
- SHORT NAME : tsdate
- DESCRIPTION : The tsdate program [Wohns et al., 2022] infers dates for nodes in a genetic genealogy, sometimes loosely known as an ancestral recombination graph or ARG [Wong et al., 2023]. More precisely, it takes a genealogy in tree sequence format as an input and returns a copy of that tree sequence with altered node and mutation times. These times have been estimated on the basis of the number of mutations along the edges connecting genomes in the genealogy (i.e. using the “molecular clock”).
- URL : https://tskit.dev/tsdate/docs/latest/introduction.html
- TITLE : A unified genealogy of modern and ancient genomes
- DOI : 10.1126/science.abi8264
- ABSTRACT : The sequencing of modern and ancient genomes from around the world has revolutionized our understanding of human history and evolution. However, the problem of how best to characterize ancestral relationships from the totality of human genomic variation remains unsolved. Here, we address this challenge with nonparametric methods that enable us to infer a unified genealogy of modern and ancient humans. This compact representation of multiple datasets explores the challenges of missing and erroneous data and uses ancient samples to constrain and date relationships. We demonstrate the power of the method to recover relationships between individuals and populations as well as to identify descendants of ancient samples. Finally, we introduce a simple nonparametric estimator of the geographical location of ancestors that recapitulates key events in human history.
- CITATION : Wohns AW, Wong Y, Jeffery B, Akbari A, ...&, McVean G. (2022) A unified genealogy of modern and ancient genomes Science, 375 (6583) eabi8264. doi:10.1126/science.abi8264. PMID 35201891
- JOURNAL_INFO : Science (New York, N.Y.) ; Science ; 2022 ; 375 ; 6583 ; eabi8264
- PUBMED_LINK : 35201891
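
The tsdate description says node times are estimated from the number of mutations on the edges of the genealogy (the "molecular clock"). The arithmetic behind that idea can be sketched as follows: an edge spanning L base pairs for t generations is expected to carry μ·L·t mutations, so m observed mutations suggest t ≈ m / (μ·L). This is a point-estimate illustration only, not tsdate's estimator or API. The function names and the numbers are hypothetical.

```python
def edge_length_estimate(n_mutations, span_bp, mu):
    """Molecular-clock estimate of an edge's length in generations.

    Expected mutations on an edge = mu * span_bp * length, so
    length is estimated as n_mutations / (mu * span_bp).
    """
    return n_mutations / (mu * span_bp)


def parent_time_estimate(child_edges, mu):
    """Average, over child edges, of child time plus estimated edge length.

    `child_edges` is a list of (child_time, n_mutations, span_bp) tuples.
    """
    estimates = [
        child_time + edge_length_estimate(n_mut, span, mu)
        for child_time, n_mut, span in child_edges
    ]
    return sum(estimates) / len(estimates)


# Hypothetical values: two sample children (time 0) below one ancestor.
ASSUMED_MU = 1e-8  # assumed per-base, per-generation mutation rate
edges = [(0.0, 5, 100_000), (0.0, 3, 100_000)]
print(parent_time_estimate(edges, ASSUMED_MU))  # about 4,000 generations
```

With few mutations per edge such point estimates are very noisy, which is one reason a dating method has to combine information across the whole genealogy rather than treat each edge in isolation.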
tsinfer
- NAME : tsinfer
- SHORT NAME : tsinfer
- DESCRIPTION : Infer a tree sequence from genetic variation data
- URL : https://github.com/tskit-dev/tsinfer
- TITLE : Inferring whole-genome histories in large population datasets
- DOI : 10.1038/s41588-019-0483-y
- ABSTRACT : Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that is able to not only infer whole-genome histories with comparable accuracy to the state-of-the-art but also process four orders of magnitude more sequences. The approach also provides an 'evolutionary encoding' of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.
- CITATION : Kelleher J, Wong Y, Wohns AW, Fadil C, ...&, McVean G. (2019) Inferring whole-genome histories in large population datasets Nat. Genet., 51 (9) 1330-1338. doi:10.1038/s41588-019-0483-y. PMID 31477934
- JOURNAL_INFO : Nature genetics ; Nat. Genet. ; 2019 ; 51 ; 9 ; 1330-1338
- PUBMED_LINK : 31477934
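
tsinfer outputs a tree sequence, which several of the tools above also consume. The core of that data structure can be sketched as a table of edges, each stating that a child node inherits from a parent node over a half-open genomic interval [left, right). The local tree at any position is simply the set of edges covering it. The code below is a minimal pure-Python illustration with hypothetical names and data; it does not use or reproduce the tskit or tsinfer APIs.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    left: int    # inclusive start of the genomic interval
    right: int   # exclusive end of the genomic interval
    parent: int  # parent node ID
    child: int   # child node ID


def local_tree(edges, position):
    """Return the child -> parent map of the tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def breakpoints(edges):
    """Sorted genomic coordinates at which the local tree can change."""
    return sorted({e.left for e in edges} | {e.right for e in edges})


# Hypothetical genealogy: samples 0, 1, 2 and ancestors 3, 4, 5 on a 10 bp genome.
# Nodes 0 and 1 always join at node 3; the root changes from 4 to 5 at position 5.
edges = [
    Edge(0, 10, 3, 0),
    Edge(0, 10, 3, 1),
    Edge(0, 5, 4, 3),
    Edge(0, 5, 4, 2),
    Edge(5, 10, 5, 3),
    Edge(5, 10, 5, 2),
]
print(breakpoints(edges))    # [0, 5, 10]
print(local_tree(edges, 2))  # {0: 3, 1: 3, 3: 4, 2: 4}
print(local_tree(edges, 7))  # {0: 3, 1: 3, 3: 5, 2: 5}
```

Because the edges for nodes 0 and 1 span the whole genome, they are stored once rather than once per tree, which is where the compactness of the representation comes from.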