Tools Simulation
Curation of Simulation — listings under the GWAS Tools tab.
Summary Table
Click a column header to sort the table.
| NAME | Main citation | YEAR |
|---|---|---|
| G2P | Tang Y et al., Bioinformatics, 2019 |
2019 |
| GCTA | Yang J et al., Am J Hum Genet, 2011 |
2011 |
| HAPGEN2 | Su Z et al., Bioinformatics, 2011 |
2011 |
| SIMER | NA |
NA |
| ms | Hudson RR, Bioinformatics, 2002 |
2002 |
| sim1000G | Dimitromanolakis A et al., BMC Bioinformatics, 2019 |
2019 |
| simGWAS | Fortune MD et al., Bioinformatics, 2019 |
2019 |
| twas_sim | Wang X et al., Bioinformatics, 2023 |
2023 |
G2P
PUBMED_LINK
FULL NAME
A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation
DESCRIPTION
a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation
URL
TITLE
G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation.
Main citation
Tang Y, Liu X. (2019) G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation. Bioinformatics, 35 (19) 3852-3854. doi:10.1093/bioinformatics/btz126. PMID 30848784
ABSTRACT
MOTIVATION: Plenty of Genome-Wide-Association-Study (GWAS) methods have been developed for mapping genetic markers that associated with human diseases and agricultural economic traits. Computer simulation is a nice tool to test the performances of various GWAS methods under certain scenarios. Existing tools are either inefficient in terms of computation and memory efficiency or inconvenient to use to simulate big, realistic genotype data and phenotype data to evaluate available GWAS methods. RESULTS: Here, we present a GWAS simulation tool named G2P that can be used to simulate genotype data, phenotype data and perform power evaluation of GWAS methods. G2P is a user-friendly tool with all functions is provided in both graphical user interface and pipeline manners and it is available for Windows, Mac and Linux environments. Furthermore, G2P achieves maximum efficiency in terms of both memory usage and simulation speed; with G2P, the simulation of genotype data that includes 1 000 000 samples and 2 000 000 markers can be accomplished in 5 h. AVAILABILITY AND IMPLEMENTATION: The G2P software, user manual, and example datasets are freely available at GitHub: https://github.com/XiaoleiLiuBio/G2P. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
DOI
10.1093/bioinformatics/btz126
GCTA
PUBMED_LINK
FULL NAME
Genome-wide complex trait analysis (GCTA)
DESCRIPTION
GCTA-GREML analysis:GCTA can simulate a GWAS based on real genotype data.
URL
TITLE
GCTA: a tool for genome-wide complex trait analysis.
Main citation
Yang J, Lee SH, Goddard ME, Visscher PM. (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet, 88 (1) 76-82. doi:10.1016/j.ajhg.2010.11.011. PMID 21167468
ABSTRACT
For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.
DOI
10.1016/j.ajhg.2010.11.011
HAPGEN2
PUBMED_LINK
DESCRIPTION
HAPGEN2 is a an updated version of the program HAPGEN, which simulates case control datasets at SNP markers. The new version can now simulate multiple disease SNPs on a single chromosome, on the assumption that each disease SNP acts independently and are in Hardy-Weinberg equilibrium. We also supply a R package that can simulate interaction between the disease SNPs.
URL
TITLE
HAPGEN2: simulation of multiple disease SNPs.
Main citation
Su Z, Marchini J, Donnelly P. (2011) HAPGEN2: simulation of multiple disease SNPs. Bioinformatics, 27 (16) 2304-5. doi:10.1093/bioinformatics/btr341. PMID 21653516
ABSTRACT
MOTIVATION: Performing experiments with simulated data is an inexpensive approach to evaluating competing experimental designs and analysis methods in genome-wide association studies. Simulation based on resampling known haplotypes is fast and efficient and can produce samples with patterns of linkage disequilibrium (LD), which mimic those in real data. However, the inability of current methods to simulate multiple nearby disease SNPs on the same chromosome can limit their application. RESULTS: We introduce a new simulation algorithm based on a successful resampling method, HAPGEN, that can simulate multiple nearby disease SNPs on the same chromosome. The new method, HAPGEN2, retains many advantages of resampling methods and expands the range of disease models that current simulators offer. AVAILABILITY: HAPGEN2 is freely available from http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html. CONTACT: zhan@well.ox.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
DOI
10.1093/bioinformatics/btr341
SIMER
FULL NAME
Data Simulation for Life Science and Breeding
DESCRIPTION
Data Simulation for Life Science and Breeding
URL
ms
PUBMED_LINK
TITLE
Generating samples under a Wright-Fisher neutral model of genetic variation.
Main citation
Hudson RR. (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18 (2) 337-8. doi:10.1093/bioinformatics/18.2.337. PMID 11847089
ABSTRACT
A Monte Carlo computer program is available to generate samples drawn from a population evolving according to a Wright-Fisher neutral model. The program assumes an infinite-sites model of mutation, and allows recombination, gene conversion, symmetric migration among subpopulations, and a variety of demographic histories. The samples produced can be used to investigate the sampling properties of any sample statistic under these neutral models.
DOI
10.1093/bioinformatics/18.2.337
sim1000G
PUBMED_LINK
DESCRIPTION
a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs
URL
TITLE
sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs.
Main citation
Dimitromanolakis A, Xu J, Krol A, Briollais L. (2019) sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs. BMC Bioinformatics, 20 (1) 26. doi:10.1186/s12859-019-2611-1. PMID 30646839
ABSTRACT
BACKGROUND: Simulation of genetic variants data is frequently required for the evaluation of statistical methods in the fields of human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge in population genetics or in computation to be used effectively. In addition, generating simulated data in the context of family-based studies demands sophisticated methods and advanced computer programming. RESULTS: To address these issues, we propose a new user-friendly and integrated R package, sim1000G, which simulates variants in genomic regions among unrelated individuals or among families. The only input needed is a raw phased Variant Call Format (VCF) file. Haplotypes are extracted to compute linkage disequilibrium (LD) in the simulated genomic regions and for the generation of new genotype data among unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary sizes are generated by modeling recombination events with sim1000G. To illustrate the application of sim1000G, various scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively for three-generation pedigree data. Sim1000G can capture allele frequency diversity, short and long-range linkage disequilibrium (LD) patterns and subtle population differences in LD structure without the need of any tuning parameters. CONCLUSION: Sim1000G fills a gap in the vast area of genetic variants simulators by its simplicity and independence from external tools. Currently, it is one of the few simulation packages completely integrated into R and able to simulate multiple genetic variants among unrelated individuals and within families. Its implementation will facilitate the application and development of computational methods for association studies with both rare and common variants.
DOI
10.1186/s12859-019-2611-1
simGWAS
PUBMED_LINK
DESCRIPTION
a fast method for simulation of large scale case–control GWAS summary statistics
URL
TITLE
simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics.
Main citation
Fortune MD, Wallace C. (2019) simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics. Bioinformatics, 35 (11) 1901-1906. doi:10.1093/bioinformatics/bty898. PMID 30371734
ABSTRACT
MOTIVATION: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some 'truth' is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. RESULTS: We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method, produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. AVAILABILITY AND IMPLEMENTATION: Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
DOI
10.1093/bioinformatics/bty898
twas_sim
PUBMED_LINK
DESCRIPTION
A python software leveraging real genotype data to simulate a complex trait as a function of latent expression, fit eQTL weights in independent data, and perform GWAS/TWAS on the complex trait.
URL
TITLE
twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis.
Main citation
Wang X, Lu Z, Bhattacharya A, Pasaniuc B, ...&, Mancuso N. (2023) twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis. Bioinformatics, 39 (5) . doi:10.1093/bioinformatics/btad288. PMID 37099718
ABSTRACT
SUMMARY: Genome-wide association studies (GWASs) have identified numerous genetic variants associated with complex disease risk; however, most of these associations are non-coding, complicating identifying their proximal target gene. Transcriptome-wide association studies (TWASs) have been proposed to mitigate this gap by integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous methodological advancements have been made for TWAS, yet each approach requires ad hoc simulations to demonstrate feasibility. Here, we present twas_sim, a computationally scalable and easily extendable tool for simplified performance evaluation and power analysis for TWAS methods. AVAILABILITY AND IMPLEMENTATION: Software and documentation are available at https://github.com/mancusolab/twas_sim.
DOI
10.1093/bioinformatics/btad288