Simulation

Summary Table

| NAME | CITATION | YEAR |
|---|---|---|
| G2P | Tang Y, Liu X. (2019) G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation Bioinformatics, 35 (19) 3852-3854. doi:10.1093/bioinformatics/btz126. PMID 30848784 | 2019 |
| GCTA | Yang J, Lee SH, Goddard ME, Visscher PM. (2011) GCTA: a tool for genome-wide complex trait analysis Am. J. Hum. Genet., 88 (1) 76-82. doi:10.1016/j.ajhg.2010.11.011. PMID 21167468 | 2011 |
| HAPGEN2 | Su Z, Marchini J, Donnelly P. (2011) HAPGEN2: simulation of multiple disease SNPs Bioinformatics, 27 (16) 2304-2305. doi:10.1093/bioinformatics/btr341. PMID 21653516 | 2011 |
| SIMER | NA | NA |
| sim1000G | Dimitromanolakis A, Xu J, Krol A, Briollais L. (2019) sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs BMC Bioinformatics, 20 (1) 26. doi:10.1186/s12859-019-2611-1. PMID 30646839 | 2019 |
| simGWAS | Fortune MD, Wallace C. (2019) simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics Bioinformatics, 35 (11) 1901-1906. doi:10.1093/bioinformatics/bty898. PMID 30371734 | 2019 |
| twas_sim | Wang X, Lu Z, Bhattacharya A, Pasaniuc B, ...&, Mancuso N. (2023) twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis Bioinformatics, 39 (5) . doi:10.1093/bioinformatics/btad288. PMID 37099718 | 2023 |

G2P

  • NAME : G2P
  • SHORT NAME : G2P
  • FULL NAME : A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation
  • DESCRIPTION : a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation
  • URL : https://github.com/XiaoleiLiuBio/G2P
  • TITLE : G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation
  • DOI : 10.1093/bioinformatics/btz126
  • ABSTRACT : MOTIVATION: Plenty of Genome-Wide-Association-Study (GWAS) methods have been developed for mapping genetic markers that are associated with human diseases and agricultural economic traits. Computer simulation is a nice tool to test the performance of various GWAS methods under certain scenarios. Existing tools are either inefficient in terms of computation and memory or inconvenient to use to simulate big, realistic genotype data and phenotype data to evaluate available GWAS methods. RESULTS: Here, we present a GWAS simulation tool named G2P that can be used to simulate genotype data and phenotype data and to perform power evaluation of GWAS methods. G2P is a user-friendly tool with all functions provided in both graphical user interface and pipeline modes, and it is available for Windows, Mac and Linux environments. Furthermore, G2P achieves maximum efficiency in terms of both memory usage and simulation speed; with G2P, the simulation of genotype data that includes 1 000 000 samples and 2 000 000 markers can be accomplished in 5 h. AVAILABILITY AND IMPLEMENTATION: The G2P software, user manual, and example datasets are freely available at GitHub: https://github.com/XiaoleiLiuBio/G2P. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • COPYRIGHT : https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
  • CITATION : Tang Y, Liu X. (2019) G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation Bioinformatics, 35 (19) 3852-3854. doi:10.1093/bioinformatics/btz126. PMID 30848784
  • JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2019 ; 35 ; 19 ; 3852-3854
  • PUBMED_LINK : 30848784

GCTA

  • NAME : GCTA
  • SHORT NAME : GCTA
  • FULL NAME : Genome-wide complex trait analysis (GCTA)
  • DESCRIPTION : GCTA-GREML analysis: GCTA can simulate a GWAS based on real genotype data.
  • URL : https://yanglab.westlake.edu.cn/software/gcta/#GWASSimulation
  • TITLE : GCTA: a tool for genome-wide complex trait analysis
  • DOI : 10.1016/j.ajhg.2010.11.011
  • ABSTRACT : For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.
  • CITATION : Yang J, Lee SH, Goddard ME, Visscher PM. (2011) GCTA: a tool for genome-wide complex trait analysis Am. J. Hum. Genet., 88 (1) 76-82. doi:10.1016/j.ajhg.2010.11.011. PMID 21167468
  • JOURNAL_INFO : American journal of human genetics ; Am. J. Hum. Genet. ; 2011 ; 88 ; 1 ; 76-82
  • PUBMED_LINK : 21167468
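
The description above says GCTA can simulate a GWAS from real genotype data. As a rough illustration of what such a simulation involves, the following is a minimal Python sketch of a standard additive model: each causal SNP is standardized by its allele frequency, the genetic value is the weighted sum of standardized genotypes, and Gaussian residual noise is scaled so the genetic value explains a chosen heritability. The function name `simulate_quantitative_trait`, its parameters, and the toy genotypes are assumptions made for this example; this is not GCTA's code or command-line interface.

```python
import math
import random
import statistics


def simulate_quantitative_trait(genotypes, causal_effects, heritability, seed=0):
    """Simulate y = g + e from allele counts (0/1/2) under an additive model.

    genotypes: one list of allele counts per individual.
    causal_effects: hypothetical mapping {SNP column index: effect size}.
    heritability: target share of phenotypic variance explained by g.
    """
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    rng = random.Random(seed)
    genetic = [0.0] * len(genotypes)
    for snp, effect in causal_effects.items():
        column = [row[snp] for row in genotypes]
        freq = statistics.fmean(column) / 2.0  # sample allele frequency
        if not 0.0 < freq < 1.0:
            raise ValueError("causal SNPs must be polymorphic")
        scale = math.sqrt(2.0 * freq * (1.0 - freq))
        for i, count in enumerate(column):
            # Standardized genotype times effect size.
            genetic[i] += effect * (count - 2.0 * freq) / scale
    # Residual variance chosen so that var(g) / (var(g) + var(e)) = heritability.
    residual_sd = math.sqrt(statistics.pvariance(genetic) * (1.0 / heritability - 1.0))
    phenotype = [g + rng.gauss(0.0, residual_sd) for g in genetic]
    return phenotype, genetic


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy genotypes stand in for the real genotype data a user would supply.
    toy = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(5)]
           for _ in range(1000)]
    y, g = simulate_quantitative_trait(toy, {0: 1.0, 3: -0.5}, heritability=0.5)
    print("realized heritability:", statistics.pvariance(g) / statistics.pvariance(y))
```

The realized heritability printed by the demo varies around the target because the residuals are a finite random sample.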

HAPGEN2

  • NAME : HAPGEN2
  • SHORT NAME : HAPGEN2
  • FULL NAME : HAPGEN2
  • DESCRIPTION : HAPGEN2 is an updated version of the program HAPGEN, which simulates case-control datasets at SNP markers. The new version can now simulate multiple disease SNPs on a single chromosome, on the assumption that each disease SNP acts independently and is in Hardy-Weinberg equilibrium. We also supply an R package that can simulate interaction between the disease SNPs.
  • URL : https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html
  • TITLE : HAPGEN2: simulation of multiple disease SNPs
  • DOI : 10.1093/bioinformatics/btr341
  • ABSTRACT : MOTIVATION: Performing experiments with simulated data is an inexpensive approach to evaluating competing experimental designs and analysis methods in genome-wide association studies. Simulation based on resampling known haplotypes is fast and efficient and can produce samples with patterns of linkage disequilibrium (LD), which mimic those in real data. However, the inability of current methods to simulate multiple nearby disease SNPs on the same chromosome can limit their application. RESULTS: We introduce a new simulation algorithm based on a successful resampling method, HAPGEN, that can simulate multiple nearby disease SNPs on the same chromosome. The new method, HAPGEN2, retains many advantages of resampling methods and expands the range of disease models that current simulators offer. AVAILABILITY: HAPGEN2 is freely available from http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html. CONTACT: zhan@well.ox.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • CITATION : Su Z, Marchini J, Donnelly P. (2011) HAPGEN2: simulation of multiple disease SNPs Bioinformatics, 27 (16) 2304-2305. doi:10.1093/bioinformatics/btr341. PMID 21653516
  • JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2011 ; 27 ; 16 ; 2304-2305
  • PUBMED_LINK : 21653516

SIMER

sim1000G

  • NAME : sim1000G
  • SHORT NAME : sim1000G
  • FULL NAME : sim1000G
  • DESCRIPTION : a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs
  • URL : https://github.com/adimitromanolakis/sim1000G
  • TITLE : sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs
  • DOI : 10.1186/s12859-019-2611-1
  • ABSTRACT : BACKGROUND: Simulation of genetic variants data is frequently required for the evaluation of statistical methods in the fields of human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge in population genetics or in computation to be used effectively. In addition, generating simulated data in the context of family-based studies demands sophisticated methods and advanced computer programming. RESULTS: To address these issues, we propose a new user-friendly and integrated R package, sim1000G, which simulates variants in genomic regions among unrelated individuals or among families. The only input needed is a raw phased Variant Call Format (VCF) file. Haplotypes are extracted to compute linkage disequilibrium (LD) in the simulated genomic regions and for the generation of new genotype data among unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary sizes are generated by modeling recombination events with sim1000G. To illustrate the application of sim1000G, various scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively for three-generation pedigree data. Sim1000G can capture allele frequency diversity, short and long-range linkage disequilibrium (LD) patterns and subtle population differences in LD structure without the need of any tuning parameters. CONCLUSION: Sim1000G fills a gap in the vast area of genetic variants simulators by its simplicity and independence from external tools. Currently, it is one of the few simulation packages completely integrated into R and able to simulate multiple genetic variants among unrelated individuals and within families. Its implementation will facilitate the application and development of computational methods for association studies with both rare and common variants.
  • CITATION : Dimitromanolakis A, Xu J, Krol A, Briollais L. (2019) sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs BMC Bioinformatics, 20 (1) 26. doi:10.1186/s12859-019-2611-1. PMID 30646839
  • JOURNAL_INFO : BMC bioinformatics ; BMC Bioinformatics ; 2019 ; 20 ; 1 ; 26
  • PUBMED_LINK : 30646839

simGWAS

  • NAME : simGWAS
  • SHORT NAME : simGWAS
  • FULL NAME : simGWAS
  • DESCRIPTION : a fast method for simulation of large scale case–control GWAS summary statistics
  • URL : https://github.com/chr1swallace/simGWAS
  • TITLE : simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics
  • DOI : 10.1093/bioinformatics/bty898
  • ABSTRACT : MOTIVATION: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some 'truth' is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. RESULTS: We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. AVAILABILITY AND IMPLEMENTATION: Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
  • CITATION : Fortune MD, Wallace C. (2019) simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics Bioinformatics, 35 (11) 1901-1906. doi:10.1093/bioinformatics/bty898. PMID 30371734
  • JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2019 ; 35 ; 11 ; 1901-1906
  • PUBMED_LINK : 30371734
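
The abstract above describes simulating summary statistics as random variates about their expected values, with no individual-level genotypes. The Python sketch below illustrates only that final sampling step, under two assumptions that are not taken from the source: the expected Z scores are already known (simGWAS derives them from haplotype frequencies, which is not reproduced here), and the Z scores are multivariate normal with the LD correlation matrix as covariance. The function names are hypothetical and are not the simGWAS R API.

```python
import math
import random


def cholesky(matrix):
    """Return the lower-triangular factor L with L * L^T = matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diagonal = matrix[i][i] - partial
                if diagonal <= 0.0:
                    raise ValueError("LD matrix must be positive definite")
                lower[i][j] = math.sqrt(diagonal)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_z_scores(expected_z, ld, n_rep, seed=0):
    """Draw n_rep Z-score vectors from N(expected_z, ld) (assumed model)."""
    rng = random.Random(seed)
    lower = cholesky(ld)
    n = len(expected_z)
    draws = []
    for _ in range(n_rep):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent noise with the Cholesky factor,
        # then shift by the expected Z scores.
        draws.append([
            expected_z[i] + sum(lower[i][k] * noise[k] for k in range(i + 1))
            for i in range(n)
        ])
    return draws


if __name__ == "__main__":
    # Toy inputs: one causal SNP and one neighbour in LD (r = 0.8) with it.
    ld = [[1.0, 0.8], [0.8, 1.0]]
    expected = [5.0, 4.0]
    for z in sample_z_scores(expected, ld, n_rep=3):
        print([round(value, 2) for value in z])
```

Because sampling works on the SNP-by-SNP LD matrix alone, the cost per replicate does not grow with the number of individuals in the hypothetical study.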

twas_sim

  • NAME : twas_sim
  • SHORT NAME : twas_sim
  • FULL NAME : twas_sim
  • DESCRIPTION : Python software that leverages real genotype data to simulate a complex trait as a function of latent expression, fit eQTL weights in independent data, and perform GWAS/TWAS on the complex trait.
  • URL : https://github.com/mancusolab/twas_sim
  • TITLE : twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis
  • DOI : 10.1093/bioinformatics/btad288
  • ABSTRACT : SUMMARY: Genome-wide association studies (GWASs) have identified numerous genetic variants associated with complex disease risk; however, most of these associations are non-coding, complicating identifying their proximal target gene. Transcriptome-wide association studies (TWASs) have been proposed to mitigate this gap by integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous methodological advancements have been made for TWAS, yet each approach requires ad hoc simulations to demonstrate feasibility. Here, we present twas_sim, a computationally scalable and easily extendable tool for simplified performance evaluation and power analysis for TWAS methods. AVAILABILITY AND IMPLEMENTATION: Software and documentation are available at https://github.com/mancusolab/twas_sim.
  • CITATION : Wang X, Lu Z, Bhattacharya A, Pasaniuc B, ...&, Mancuso N. (2023) twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis Bioinformatics, 39 (5) . doi:10.1093/bioinformatics/btad288. PMID 37099718
  • JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2023 ; 39 ; 5 ;
  • PUBMED_LINK : 37099718
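
The description above outlines three steps: simulate a trait as a function of latent expression, fit eQTL weights in independent data, and test predicted expression against the trait. The Python sketch below walks through those steps on toy data. Everything in it is an assumption made for illustration: genotypes are simulated as independent SNPs rather than taken from real data, weights are fitted by per-SNP least squares, and all function names and parameter values are hypothetical. It does not use or reproduce the twas_sim code.

```python
import math
import random
import statistics


def simulate_genotypes(n, n_snps, maf, rng):
    """Independent SNPs: each genotype is the sum of two Bernoulli(maf) alleles."""
    return [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
            for _ in range(n)]


def linear_score(genotypes, weights):
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def add_noise(signal, variance_explained, rng):
    """Add Gaussian noise so that `signal` explains the given variance share."""
    noise_sd = math.sqrt(statistics.pvariance(signal) * (1.0 / variance_explained - 1.0))
    return [s + rng.gauss(0.0, noise_sd) for s in signal]


def slope_and_t(x, y):
    """Simple-regression slope of y on x and its t statistic."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return sxy / sxx, r * math.sqrt((n - 2) / (1.0 - r * r))


def run_simulation(seed=1):
    rng = random.Random(seed)
    eqtl_effects = [0.5, -0.3, 0.4]  # hypothetical true eQTL effects

    # Step 1: eQTL reference panel, used only to fit expression weights.
    g_ref = simulate_genotypes(500, 3, 0.3, rng)
    expr_ref = add_noise(linear_score(g_ref, eqtl_effects), 0.3, rng)
    weights = [slope_and_t([row[j] for row in g_ref], expr_ref)[0] for j in range(3)]

    # Step 2: independent GWAS sample; the trait depends on latent expression.
    g_gwas = simulate_genotypes(2000, 3, 0.3, rng)
    latent_expr = add_noise(linear_score(g_gwas, eqtl_effects), 0.3, rng)
    trait = add_noise([0.5 * e for e in latent_expr], 0.1, rng)

    # Step 3: TWAS-style test of genetically predicted expression vs. the trait.
    predicted_expr = linear_score(g_gwas, weights)
    return weights, slope_and_t(predicted_expr, trait)[1]


if __name__ == "__main__":
    fitted_weights, twas_t = run_simulation()
    print("fitted eQTL weights:", [round(w, 3) for w in fitted_weights])
    print("TWAS t statistic:", round(twas_t, 2))
```

Repeating `run_simulation` over many seeds and counting how often the statistic passes a significance threshold gives an empirical power estimate for the chosen sample sizes and effect sizes.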