Simulation

Summary Table

NAME	CITATION	YEAR
G2P	Tang Y, Liu X. (2019) G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation Bioinformatics, 35 (19) 3852-3854. doi:10.1093/bioinformatics/btz126. PMID 30848784	2019
GCTA	Yang J, Lee SH, Goddard ME, Visscher PM. (2011) GCTA: a tool for genome-wide complex trait analysis Am. J. Hum. Genet., 88 (1) 76-82. doi:10.1016/j.ajhg.2010.11.011. PMID 21167468	2011
HAPGEN2	Su Z, Marchini J, Donnelly P. (2011) HAPGEN2: simulation of multiple disease SNPs Bioinformatics, 27 (16) 2304-2305. doi:10.1093/bioinformatics/btr341. PMID 21653516	2011
SIMER	NA	NA
ms	Hudson, R. R. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002).	2002
sim1000G	Dimitromanolakis A, Xu J, Krol A, Briollais L. (2019) sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs BMC Bioinformatics, 20 (1) 26. doi:10.1186/s12859-019-2611-1. PMID 30646839	2019
simGWAS	Fortune MD, Wallace C. (2019) simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics Bioinformatics, 35 (11) 1901-1906. doi:10.1093/bioinformatics/bty898. PMID 30371734	2019
twas_sim	Wang X, Lu Z, Bhattacharya A, Pasaniuc B, ...&, Mancuso N. (2023) twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis Bioinformatics, 39 (5). doi:10.1093/bioinformatics/btad288. PMID 37099718	2023

G2P

NAME : G2P
SHORT NAME : G2P
FULL NAME : A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation
DESCRIPTION : a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation
URL : https://github.com/XiaoleiLiuBio/G2P
TITLE : G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation
DOI : 10.1093/bioinformatics/btz126
ABSTRACT : MOTIVATION: Plenty of Genome-Wide-Association-Study (GWAS) methods have been developed for mapping genetic markers that associated with human diseases and agricultural economic traits. Computer simulation is a nice tool to test the performances of various GWAS methods under certain scenarios. Existing tools are either inefficient in terms of computation and memory efficiency or inconvenient to use to simulate big, realistic genotype data and phenotype data to evaluate available GWAS methods. RESULTS: Here, we present a GWAS simulation tool named G2P that can be used to simulate genotype data, phenotype data and perform power evaluation of GWAS methods. G2P is a user-friendly tool with all functions is provided in both graphical user interface and pipeline manners and it is available for Windows, Mac and Linux environments. Furthermore, G2P achieves maximum efficiency in terms of both memory usage and simulation speed; with G2P, the simulation of genotype data that includes 1 000 000 samples and 2 000 000 markers can be accomplished in 5 h. AVAILABILITY AND IMPLEMENTATION: The G2P software, user manual, and example datasets are freely available at GitHub: https://github.com/XiaoleiLiuBio/G2P. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
COPYRIGHT : https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
CITATION : Tang Y, Liu X. (2019) G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation Bioinformatics, 35 (19) 3852-3854. doi:10.1093/bioinformatics/btz126. PMID 30848784
JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2019 ; 35 ; 19 ; 3852-3854
PUBMED_LINK : 30848784

GCTA

NAME : GCTA
SHORT NAME : GCTA
FULL NAME : Genome-wide complex trait analysis (GCTA)
DESCRIPTION : In addition to GCTA-GREML analysis, GCTA can simulate a GWAS based on real genotype data.
URL : https://yanglab.westlake.edu.cn/software/gcta/#GWASSimulation
TITLE : GCTA: a tool for genome-wide complex trait analysis
DOI : 10.1016/j.ajhg.2010.11.011
ABSTRACT : For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.
CITATION : Yang J, Lee SH, Goddard ME, Visscher PM. (2011) GCTA: a tool for genome-wide complex trait analysis Am. J. Hum. Genet., 88 (1) 76-82. doi:10.1016/j.ajhg.2010.11.011. PMID 21167468
JOURNAL_INFO : American journal of human genetics ; Am. J. Hum. Genet. ; 2011 ; 88 ; 1 ; 76-82
PUBMED_LINK : 21167468
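
EXAMPLE : The idea of simulating a GWAS phenotype on top of existing genotypes can be illustrated with a simple additive model. The sketch below is not GCTA's code. It assumes a quantitative trait y = g + e, where g is a weighted sum of standardized causal genotypes and the residual variance is var(g) * (1/h2 - 1), so that var(g) / var(y) is close to a target heritability h2. The function names, parameters, and the Hardy-Weinberg genotype generator (a stand-in for real genotype data) are hypothetical.

```python
import random
import statistics


def simulate_genotypes(n_individuals, allele_freqs, rng):
    """Draw 0/1/2 genotypes per SNP, assuming Hardy-Weinberg equilibrium.

    Stand-in for real genotype data; purely illustrative.
    """
    return [
        [sum(rng.random() < p for _ in range(2)) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def simulate_quantitative_trait(genotypes, effects, h2, rng):
    """Return (phenotypes, genetic_values) under y = g + e."""
    columns = list(zip(*genotypes))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    genetic = []
    for row in genotypes:
        g = 0.0
        for j, effect in enumerate(effects):
            if sds[j] > 0:  # skip monomorphic SNPs
                g += effect * (row[j] - means[j]) / sds[j]
        genetic.append(g)
    var_g = statistics.pvariance(genetic)
    # Residual variance chosen so that var(g) / (var(g) + var(e)) = h2
    resid_sd = (var_g * (1.0 / h2 - 1.0)) ** 0.5
    phenotypes = [g + rng.gauss(0.0, resid_sd) for g in genetic]
    return phenotypes, genetic


rng = random.Random(1)
geno = simulate_genotypes(2000, [0.1, 0.3, 0.5], rng)
y, g = simulate_quantitative_trait(geno, [0.5, -0.3, 0.2], h2=0.5, rng=rng)
realized_h2 = statistics.pvariance(g) / statistics.pvariance(y)
print(f"realized heritability: {realized_h2:.3f}")
```

The realized heritability fluctuates around the target because the residuals are sampled; larger samples bring it closer.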

HAPGEN2

NAME : HAPGEN2
SHORT NAME : HAPGEN2
FULL NAME : HAPGEN2
DESCRIPTION : HAPGEN2 is an updated version of the program HAPGEN, which simulates case-control datasets at SNP markers. The new version can simulate multiple disease SNPs on a single chromosome, on the assumption that the disease SNPs act independently and are in Hardy-Weinberg equilibrium. An R package that can simulate interaction between the disease SNPs is also supplied.
URL : https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html
TITLE : HAPGEN2: simulation of multiple disease SNPs
DOI : 10.1093/bioinformatics/btr341
ABSTRACT : MOTIVATION: Performing experiments with simulated data is an inexpensive approach to evaluating competing experimental designs and analysis methods in genome-wide association studies. Simulation based on resampling known haplotypes is fast and efficient and can produce samples with patterns of linkage disequilibrium (LD), which mimic those in real data. However, the inability of current methods to simulate multiple nearby disease SNPs on the same chromosome can limit their application. RESULTS: We introduce a new simulation algorithm based on a successful resampling method, HAPGEN, that can simulate multiple nearby disease SNPs on the same chromosome. The new method, HAPGEN2, retains many advantages of resampling methods and expands the range of disease models that current simulators offer. AVAILABILITY: HAPGEN2 is freely available from http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html. CONTACT: zhan@well.ox.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CITATION : Su Z, Marchini J, Donnelly P. (2011) HAPGEN2: simulation of multiple disease SNPs Bioinformatics, 27 (16) 2304-2305. doi:10.1093/bioinformatics/btr341. PMID 21653516
JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2011 ; 27 ; 16 ; 2304-2305
PUBMED_LINK : 21653516
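
EXAMPLE : The Hardy-Weinberg assumption at a disease SNP can be made concrete with a small calculation. The sketch below is not HAPGEN2's algorithm (which resamples reference haplotypes); it only shows, under assumed heterozygote and homozygote relative risks, how genotype frequencies among cases follow from Hardy-Weinberg population frequencies by weighting each genotype by its relative risk. The function name and parameters are hypothetical.

```python
def case_genotype_freqs(risk_allele_freq, rr_het, rr_hom):
    """Return (population_freqs, case_freqs) for genotypes 0, 1, 2.

    Population frequencies follow Hardy-Weinberg equilibrium; case
    frequencies are proportional to population frequency times the
    genotype's relative risk (baseline genotype has relative risk 1).
    """
    p = risk_allele_freq
    q = 1.0 - p
    population = [q * q, 2.0 * p * q, p * p]
    weights = [population[0], population[1] * rr_het, population[2] * rr_hom]
    total = sum(weights)
    cases = [w / total for w in weights]
    return population, cases


population, cases = case_genotype_freqs(0.2, rr_het=1.5, rr_hom=2.25)
case_allele_freq = cases[1] / 2.0 + cases[2]
print("population:", [round(f, 4) for f in population])
print("cases:     ", [round(f, 4) for f in cases])
print("risk allele frequency in cases:", round(case_allele_freq, 4))
```

With relative risks above 1 the risk allele is enriched among cases, which is the signal a case-control GWAS is designed to detect.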

SIMER

NAME : SIMER
SHORT NAME : SIMER
FULL NAME : Data Simulation for Life Science and Breeding
DESCRIPTION : Data Simulation for Life Science and Breeding
URL : https://github.com/xiaolei-lab/SIMER#genotype-data

ms

NAME : ms
SHORT NAME : ms
FULL NAME : ms
CITATION : Hudson, R. R. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002).
PUBMED_LINK : 11847089

sim1000G

NAME : sim1000G
SHORT NAME : sim1000G
FULL NAME : sim1000G
DESCRIPTION : a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs
URL : https://github.com/adimitromanolakis/sim1000G
TITLE : sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs
DOI : 10.1186/s12859-019-2611-1
ABSTRACT : BACKGROUND: Simulation of genetic variants data is frequently required for the evaluation of statistical methods in the fields of human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge in population genetics or in computation to be used effectively. In addition, generating simulated data in the context of family-based studies demands sophisticated methods and advanced computer programming. RESULTS: To address these issues, we propose a new user-friendly and integrated R package, sim1000G, which simulates variants in genomic regions among unrelated individuals or among families. The only input needed is a raw phased Variant Call Format (VCF) file. Haplotypes are extracted to compute linkage disequilibrium (LD) in the simulated genomic regions and for the generation of new genotype data among unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary sizes are generated by modeling recombination events with sim1000G. To illustrate the application of sim1000G, various scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively for three-generation pedigree data. Sim1000G can capture allele frequency diversity, short and long-range linkage disequilibrium (LD) patterns and subtle population differences in LD structure without the need of any tuning parameters. CONCLUSION: Sim1000G fills a gap in the vast area of genetic variants simulators by its simplicity and independence from external tools. Currently, it is one of the few simulation packages completely integrated into R and able to simulate multiple genetic variants among unrelated individuals and within families. Its implementation will facilitate the application and development of computational methods for association studies with both rare and common variants.
CITATION : Dimitromanolakis A, Xu J, Krol A, Briollais L. (2019) sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs BMC Bioinformatics, 20 (1) 26. doi:10.1186/s12859-019-2611-1. PMID 30646839
JOURNAL_INFO : BMC bioinformatics ; BMC Bioinformatics ; 2019 ; 20 ; 1 ; 26
PUBMED_LINK : 30646839

simGWAS

NAME : simGWAS
SHORT NAME : simGWAS
FULL NAME : simGWAS
DESCRIPTION : a fast method for simulation of large scale case–control GWAS summary statistics
URL : https://github.com/chr1swallace/simGWAS
TITLE : simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics
DOI : 10.1093/bioinformatics/bty898
ABSTRACT : MOTIVATION: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some 'truth' is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. RESULTS: We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method, produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. AVAILABILITY AND IMPLEMENTATION: Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
COPYRIGHT : http://creativecommons.org/licenses/by/4.0/
CITATION : Fortune MD, Wallace C. (2019) simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics Bioinformatics, 35 (11) 1901-1906. doi:10.1093/bioinformatics/bty898. PMID 30371734
JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2019 ; 35 ; 11 ; 1901-1906
PUBMED_LINK : 30371734
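
EXAMPLE : The abstract's step of "simulating random variates about these expected values" can be sketched as a draw from a multivariate normal distribution. This is a minimal illustration, not the simGWAS R API: the expected Z scores are taken as given inputs (the paper's derivation is not reproduced), and using the SNP correlation (LD) matrix as the covariance is an assumption of this sketch. All function names are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_z_scores(expected_z, ld_matrix, rng):
    """Draw one vector of Z scores with mean expected_z and covariance ld_matrix."""
    lower = cholesky(ld_matrix)
    noise = [rng.gauss(0.0, 1.0) for _ in expected_z]
    return [
        expected_z[i] + sum(lower[i][k] * noise[k] for k in range(i + 1))
        for i in range(len(expected_z))
    ]


rng = random.Random(42)
ld = [[1.0, 0.8], [0.8, 1.0]]   # hypothetical LD between two SNPs
expected = [5.0, 4.0]           # hypothetical expected Z scores
replicates = [simulate_z_scores(expected, ld, rng) for _ in range(3)]
for z in replicates:
    print([round(v, 2) for v in z])
```

Because no individual-level genotypes are generated, the cost of each replicate depends on the number of SNPs rather than on the study's sample size.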

twas_sim

NAME : twas_sim
SHORT NAME : twas_sim
FULL NAME : twas_sim
DESCRIPTION : Python software that leverages real genotype data to simulate a complex trait as a function of latent expression, fit eQTL weights in independent data, and perform GWAS/TWAS on the complex trait.
URL : https://github.com/mancusolab/twas_sim
TITLE : twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis
DOI : 10.1093/bioinformatics/btad288
ABSTRACT : SUMMARY: Genome-wide association studies (GWASs) have identified numerous genetic variants associated with complex disease risk; however, most of these associations are non-coding, complicating identifying their proximal target gene. Transcriptome-wide association studies (TWASs) have been proposed to mitigate this gap by integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous methodological advancements have been made for TWAS, yet each approach requires ad hoc simulations to demonstrate feasibility. Here, we present twas_sim, a computationally scalable and easily extendable tool for simplified performance evaluation and power analysis for TWAS methods. AVAILABILITY AND IMPLEMENTATION: Software and documentation are available at https://github.com/mancusolab/twas_sim.
CITATION : Wang X, Lu Z, Bhattacharya A, Pasaniuc B, ...&, Mancuso N. (2023) twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis Bioinformatics, 39 (5). doi:10.1093/bioinformatics/btad288. PMID 37099718
JOURNAL_INFO : Bioinformatics (Oxford, England) ; Bioinformatics ; 2023 ; 39 ; 5 ;
PUBMED_LINK : 37099718
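
EXAMPLE : The phrase "simulate a complex trait as a function of latent expression" can be illustrated with a two-layer linear model. The sketch below is not the twas_sim implementation or its command-line interface. It assumes that the genetic component of expression is a weighted sum of genotypes, that observed expression adds noise to reach a chosen expression heritability, and that the trait is the genetic component of expression scaled by a causal effect alpha plus independent noise. All names and parameters are hypothetical, and the eQTL fitting and association testing steps are omitted.

```python
import random
import statistics


def simulate_twas_trait(genotypes, eqtl_weights, alpha, h2_expr, h2_trait, rng):
    """Return (genetic_expression, expression, trait) for each individual."""
    genetic_expr = [
        sum(w * x for w, x in zip(eqtl_weights, row)) for row in genotypes
    ]
    var_ge = statistics.pvariance(genetic_expr)
    # Noise scaled so that var(genetic_expr) / var(expression) is near h2_expr
    expr_sd = (var_ge * (1.0 / h2_expr - 1.0)) ** 0.5
    expression = [g + rng.gauss(0.0, expr_sd) for g in genetic_expr]

    if alpha == 0:
        # Null model: the trait is pure noise, unrelated to expression
        trait = [rng.gauss(0.0, 1.0) for _ in genetic_expr]
    else:
        mediated = [alpha * g for g in genetic_expr]
        var_m = statistics.pvariance(mediated)
        trait_sd = (var_m * (1.0 / h2_trait - 1.0)) ** 0.5
        trait = [m + rng.gauss(0.0, trait_sd) for m in mediated]
    return genetic_expr, expression, trait


rng = random.Random(3)
# Hypothetical genotypes: 1000 individuals, 3 SNPs, allele frequency 0.3
genotypes = [
    [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(3)]
    for _ in range(1000)
]
ge, expr, trait = simulate_twas_trait(
    genotypes, eqtl_weights=[0.4, -0.2, 0.3], alpha=0.5,
    h2_expr=0.2, h2_trait=0.05, rng=rng,
)
print(len(expr), len(trait))
```

Setting alpha to 0 gives a null trait, which is useful for checking the false positive rate of a TWAS method alongside its power.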