Tools Simulation

Curated simulation tools listed under the GWAS Tools tab.

Summary Table

NAME	Main citation	YEAR
G2P	Tang Y et al., Bioinformatics, 2019	2019
GCTA	Yang J et al., Am J Hum Genet, 2011	2011
HAPGEN2	Su Z et al., Bioinformatics, 2011	2011
SIMER	NA	NA
ms	Hudson RR, Bioinformatics, 2002	2002
sim1000G	Dimitromanolakis A et al., BMC Bioinformatics, 2019	2019
simGWAS	Fortune MD et al., Bioinformatics, 2019	2019
twas_sim	Wang X et al., Bioinformatics, 2023	2023

G2P

Tool

PUBMED_LINK

30848784

FULL NAME

A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation

DESCRIPTION

A Genome-Wide-Association-Study (GWAS) simulation tool for genotype simulation, phenotype simulation, and power evaluation.

URL

https://github.com/XiaoleiLiuBio/G2P

TITLE

G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation.

Main citation

Tang Y, Liu X. (2019) G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation. Bioinformatics, 35 (19) 3852-3854. doi:10.1093/bioinformatics/btz126. PMID 30848784

ABSTRACT

MOTIVATION: Plenty of Genome-Wide-Association-Study (GWAS) methods have been developed for mapping genetic markers that associated with human diseases and agricultural economic traits. Computer simulation is a nice tool to test the performances of various GWAS methods under certain scenarios. Existing tools are either inefficient in terms of computation and memory efficiency or inconvenient to use to simulate big, realistic genotype data and phenotype data to evaluate available GWAS methods. RESULTS: Here, we present a GWAS simulation tool named G2P that can be used to simulate genotype data, phenotype data and perform power evaluation of GWAS methods. G2P is a user-friendly tool with all functions is provided in both graphical user interface and pipeline manners and it is available for Windows, Mac and Linux environments. Furthermore, G2P achieves maximum efficiency in terms of both memory usage and simulation speed; with G2P, the simulation of genotype data that includes 1 000 000 samples and 2 000 000 markers can be accomplished in 5 h. AVAILABILITY AND IMPLEMENTATION: The G2P software, user manual, and example datasets are freely available at GitHub: https://github.com/XiaoleiLiuBio/G2P. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

DOI

10.1093/bioinformatics/btz126
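
Simulation-based power evaluation, as offered by tools such as G2P, generally follows a simple loop: simulate genotypes, simulate a phenotype with a known causal effect, run an association test, and count how often the causal marker passes a significance threshold. The Python sketch below illustrates that general idea for a single SNP. It is not G2P's implementation; the function names are hypothetical, and it assumes Hardy-Weinberg genotypes, an additive linear model with unit-variance noise, and a normal approximation to the t statistic (reasonable for large samples).

```python
import math
import random


def simulate_genotypes(n, maf, rng):
    """Draw additive genotype codes (0, 1, 2) under Hardy-Weinberg equilibrium."""
    return [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]


def simulate_phenotype(genotypes, beta, rng):
    """Assumed model: phenotype = beta * genotype + standard normal noise."""
    return [beta * g + rng.gauss(0.0, 1.0) for g in genotypes]


def association_pvalue(x, y):
    """Two-sided p-value for the slope of y on x (normal approximation to the t statistic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic marker or constant phenotype: nothing to test
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / max(1e-12, 1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def estimate_power(n, maf, beta, alpha=5e-8, reps=200, seed=1):
    """Fraction of simulated replicates in which the causal SNP reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        g = simulate_genotypes(n, maf, rng)
        y = simulate_phenotype(g, beta, rng)
        if association_pvalue(g, y) < alpha:
            hits += 1
    return hits / reps
```

For example, `estimate_power(2000, 0.3, 0.3)` estimates the power to detect a SNP with minor allele frequency 0.3 and a per-allele effect of 0.3 residual standard deviations at the genome-wide threshold 5e-8; under these assumptions it should be close to 1, while `beta=0` gives a power near 0.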

GCTA

Tool

PUBMED_LINK

21167468

FULL NAME

Genome-wide complex trait analysis (GCTA)

DESCRIPTION

GWAS simulation: GCTA can simulate a GWAS based on real genotype data.

URL

https://yanglab.westlake.edu.cn/software/gcta/#GWASSimulation

TITLE

GCTA: a tool for genome-wide complex trait analysis.

Main citation

Yang J, Lee SH, Goddard ME, Visscher PM. (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet, 88 (1) 76-82. doi:10.1016/j.ajhg.2010.11.011. PMID 21167468

ABSTRACT

For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.

DOI

10.1016/j.ajhg.2010.11.011
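
Simulating a quantitative trait on top of real genotypes usually means summing effects of chosen causal SNPs and adding noise scaled to a target heritability. The Python sketch below shows one such additive model (genotypes standardized by allele frequency, residual variance set so that var(g) / var(y) equals h2). It is written in the spirit of GCTA's GWAS simulation but is not GCTA's code; the GCTA manual is the authority on its exact model and options, and the function names here are hypothetical.

```python
import math
import random
import statistics


def standardize_genotypes(column):
    """Center and scale 0/1/2 genotype codes using the sample allele frequency p."""
    n = len(column)
    p = sum(column) / (2 * n)
    if not 0 < p < 1:
        raise ValueError("monomorphic SNP cannot be standardized")
    sd = math.sqrt(2 * p * (1 - p))
    return [(x - 2 * p) / sd for x in column]


def simulate_trait(genotypes, causal, effects, h2, rng):
    """
    genotypes: list of SNP columns (each a list of 0/1/2 codes, one per individual),
               for example taken from real genotype data.
    causal:    indices of the causal SNP columns.
    effects:   effect size for each causal SNP.
    h2:        target heritability, 0 < h2 < 1.
    rng:       a random.Random instance.
    Returns (genetic_values, phenotypes).
    """
    n = len(genotypes[0])
    g = [0.0] * n
    for idx, u in zip(causal, effects):
        for j, w in enumerate(standardize_genotypes(genotypes[idx])):
            g[j] += w * u
    var_g = statistics.pvariance(g)
    # Residual variance chosen so that var(g) / (var(g) + var(e)) equals h2.
    resid_sd = math.sqrt(var_g * (1 / h2 - 1))
    y = [gj + rng.gauss(0.0, resid_sd) for gj in g]
    return g, y


if __name__ == "__main__":
    rng = random.Random(7)
    freqs = [0.1, 0.25, 0.4, 0.5, 0.3]
    geno = [[sum(rng.random() < p for _ in range(2)) for _ in range(5000)] for p in freqs]
    g, y = simulate_trait(geno, [0, 2, 4], [0.5, -0.3, 0.2], 0.5, rng)
    print("realized h2:", statistics.pvariance(g) / statistics.pvariance(y))
```

With 5,000 individuals the realized heritability printed by the demo should land close to the 0.5 target; the residual-scaling step is what ties the simulated trait to a chosen heritability regardless of the effect sizes supplied.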

HAPGEN2

Tool

PUBMED_LINK

21653516

DESCRIPTION

HAPGEN2 is an updated version of the program HAPGEN, which simulates case-control datasets at SNP markers. The new version can simulate multiple disease SNPs on a single chromosome, on the assumption that each disease SNP acts independently and is in Hardy-Weinberg equilibrium. An R package that can simulate interactions between the disease SNPs is also supplied.

URL

https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html

TITLE

HAPGEN2: simulation of multiple disease SNPs.

Main citation

Su Z, Marchini J, Donnelly P. (2011) HAPGEN2: simulation of multiple disease SNPs. Bioinformatics, 27 (16) 2304-5. doi:10.1093/bioinformatics/btr341. PMID 21653516

ABSTRACT

MOTIVATION: Performing experiments with simulated data is an inexpensive approach to evaluating competing experimental designs and analysis methods in genome-wide association studies. Simulation based on resampling known haplotypes is fast and efficient and can produce samples with patterns of linkage disequilibrium (LD), which mimic those in real data. However, the inability of current methods to simulate multiple nearby disease SNPs on the same chromosome can limit their application. RESULTS: We introduce a new simulation algorithm based on a successful resampling method, HAPGEN, that can simulate multiple nearby disease SNPs on the same chromosome. The new method, HAPGEN2, retains many advantages of resampling methods and expands the range of disease models that current simulators offer. AVAILABILITY: HAPGEN2 is freely available from http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html. CONTACT: zhan@well.ox.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

DOI

10.1093/bioinformatics/btr341
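
The HAPGEN2 description assumes that each disease SNP acts independently and is in Hardy-Weinberg equilibrium. Under one common reading of that assumption, a multiplicative risk model across unlinked SNPs (an assumption here), Bayes' rule gives P(genotype | case) proportional to P(genotype) times the relative risk, separately at each SNP, so case genotypes can be drawn SNP by SNP. The sketch below shows only that step. HAPGEN2 itself resamples reference haplotypes to reproduce LD, which this sketch ignores; drawing controls from population frequencies assumes a rare disease; and the function names are hypothetical.

```python
import random


def hwe_genotype_probs(p):
    """Frequencies of 0, 1, 2 risk alleles under Hardy-Weinberg equilibrium."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def case_genotype_probs(p, het_rr, hom_rr):
    """P(genotype | case) is proportional to P(genotype) * relative risk (Bayes' rule)."""
    weights = [w * rr for w, rr in zip(hwe_genotype_probs(p), (1.0, het_rr, hom_rr))]
    total = sum(weights)
    return [w / total for w in weights]


def sample_individual(snps, is_case, rng):
    """
    snps: list of (risk_allele_freq, het_rr, hom_rr) tuples for unlinked disease SNPs.
    With a multiplicative model and independent SNPs, each SNP is drawn separately.
    Controls use population (HWE) frequencies, which assumes a rare disease.
    """
    genotype = []
    for p, het_rr, hom_rr in snps:
        probs = case_genotype_probs(p, het_rr, hom_rr) if is_case else hwe_genotype_probs(p)
        genotype.append(rng.choices((0, 1, 2), weights=probs)[0])
    return genotype


if __name__ == "__main__":
    rng = random.Random(3)
    snps = [(0.2, 1.5, 2.25), (0.4, 1.3, 1.69)]
    print(case_genotype_probs(0.5, 2.0, 4.0))
    print([sample_individual(snps, True, rng) for _ in range(5)])
```

For a risk allele at frequency 0.5 with relative risks 2 and 4, the HWE weights 0.25, 0.5, 0.25 become 0.25, 1.0, 1.0 after multiplying by risk, so cases carry genotypes 0, 1, 2 with probabilities 1/9, 4/9, 4/9.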

SIMER

Tool

FULL NAME

Data Simulation for Life Science and Breeding

DESCRIPTION

Data Simulation for Life Science and Breeding

URL

https://github.com/xiaolei-lab/SIMER#genotype-data

ms

Tool

PUBMED_LINK

11847089

TITLE

Generating samples under a Wright-Fisher neutral model of genetic variation.

Main citation

Hudson RR. (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18 (2) 337-8. doi:10.1093/bioinformatics/18.2.337. PMID 11847089

ABSTRACT

A Monte Carlo computer program is available to generate samples drawn from a population evolving according to a Wright-Fisher neutral model. The program assumes an infinite-sites model of mutation, and allows recombination, gene conversion, symmetric migration among subpopulations, and a variety of demographic histories. The samples produced can be used to investigate the sampling properties of any sample statistic under these neutral models.

DOI

10.1093/bioinformatics/18.2.337
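
ms generates samples under the Wright-Fisher neutral model using the coalescent. The Python sketch below covers only the simplest case (constant population size, no recombination, gene conversion, or migration): with time measured in units of 2N generations, k lineages merge at rate k(k-1)/2, and infinite-sites mutations fall on every branch at rate θ/2, where θ = 4Nμ. It is an illustration of the model, not ms's code, and `simulate_haplotypes` is a hypothetical name.

```python
import random


def simulate_haplotypes(n, theta, rng):
    """
    Standard neutral coalescent with infinite-sites mutation (theta > 0).
    Returns n haplotypes, each a list of 0/1 alleles at the segregating sites.
    """
    lineages = [frozenset([i]) for i in range(n)]  # each lineage = set of samples below it
    sites = []  # one set of derived-allele carriers per segregating site
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)  # waiting time to the next coalescence
        for lineage in lineages:
            # Poisson process of mutations along this branch segment of length t.
            elapsed = rng.expovariate(theta / 2)
            while elapsed < t:
                sites.append(lineage)
                elapsed += rng.expovariate(theta / 2)
        a, b = rng.sample(range(k), 2)  # two random lineages coalesce
        merged = lineages[a] | lineages[b]
        lineages = [x for i, x in enumerate(lineages) if i not in (a, b)] + [merged]
    return [[1 if i in site else 0 for site in sites] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(11)
    for hap in simulate_haplotypes(5, 2.0, rng):
        print("".join(map(str, hap)))
```

Under these assumptions the mean number of segregating sites approaches Watterson's expectation θ times the sum of 1/i for i = 1 to n-1, about 14.1 for n = 10 and θ = 5, which is a quick sanity check on the sketch.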

sim1000G

Tool

PUBMED_LINK

30646839

DESCRIPTION

A user-friendly genetic variant simulator in R for unrelated individuals and family-based designs.

URL

https://github.com/adimitromanolakis/sim1000G

TITLE

sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs.

Main citation

Dimitromanolakis A, Xu J, Krol A, Briollais L. (2019) sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs. BMC Bioinformatics, 20 (1) 26. doi:10.1186/s12859-019-2611-1. PMID 30646839

ABSTRACT

BACKGROUND: Simulation of genetic variants data is frequently required for the evaluation of statistical methods in the fields of human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge in population genetics or in computation to be used effectively. In addition, generating simulated data in the context of family-based studies demands sophisticated methods and advanced computer programming. RESULTS: To address these issues, we propose a new user-friendly and integrated R package, sim1000G, which simulates variants in genomic regions among unrelated individuals or among families. The only input needed is a raw phased Variant Call Format (VCF) file. Haplotypes are extracted to compute linkage disequilibrium (LD) in the simulated genomic regions and for the generation of new genotype data among unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary sizes are generated by modeling recombination events with sim1000G. To illustrate the application of sim1000G, various scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively for three-generation pedigree data. Sim1000G can capture allele frequency diversity, short and long-range linkage disequilibrium (LD) patterns and subtle population differences in LD structure without the need of any tuning parameters. CONCLUSION: Sim1000G fills a gap in the vast area of genetic variants simulators by its simplicity and independence from external tools. Currently, it is one of the few simulation packages completely integrated into R and able to simulate multiple genetic variants among unrelated individuals and within families. Its implementation will facilitate the application and development of computational methods for association studies with both rare and common variants.

DOI

10.1186/s12859-019-2611-1
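
The sim1000G abstract says the covariance across variants is used to preserve the LD structure of the original population. One generic way to use such a matrix is a Gaussian-threshold (copula) construction: draw correlated latent normals with the reference correlation matrix and cut each one at the quantile matching its allele frequency. The sketch below assumes that approach for illustration only; it is not sim1000G's algorithm, the function names are hypothetical, and thresholding weakens correlations somewhat, so simulated LD comes out attenuated relative to the reference.

```python
import math
import random
from statistics import NormalDist


def correlation_and_freqs(haplotypes):
    """Allele frequencies and pairwise correlations of 0/1 alleles (rows = haplotypes)."""
    h, m = len(haplotypes), len(haplotypes[0])
    freqs = [sum(row[j] for row in haplotypes) / h for j in range(m)]
    cov = [[sum((row[i] - freqs[i]) * (row[j] - freqs[j]) for row in haplotypes) / h
            for j in range(m)] for i in range(m)]
    corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(m)] for i in range(m)]
    return corr, freqs


def cholesky(a, jitter=1e-6):
    """Lower-triangular L with L L^T = a + jitter * I (jitter guards against perfect LD)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] + jitter - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_haplotypes(reference, n_new, rng):
    """Draw correlated latent normals and threshold them to match reference frequencies."""
    corr, freqs = correlation_and_freqs(reference)
    L = cholesky(corr)
    # Allele 1 is emitted when the latent value exceeds the (1 - p) normal quantile,
    # so each variant keeps its reference frequency p (requires 0 < p < 1).
    cut = [NormalDist().inv_cdf(1 - p) for p in freqs]
    m = len(freqs)
    out = []
    for _ in range(n_new):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        out.append([1 if z[i] > cut[i] else 0 for i in range(m)])
    return out


if __name__ == "__main__":
    rng = random.Random(5)
    reference = []
    for _ in range(2000):
        a = 1 if rng.random() < 0.3 else 0
        reference.append([a, a if rng.random() < 0.9 else 1 - a])
    sim = simulate_haplotypes(reference, 5000, rng)
    print("reference r:", correlation_and_freqs(reference)[0][0][1])
    print("simulated r:", correlation_and_freqs(sim)[0][0][1])
```

The design choice to threshold latent normals is what keeps each variant's allele frequency exact in expectation; a real simulator would also need to handle monomorphic sites and large variant panels more efficiently.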

simGWAS

Tool

PUBMED_LINK

30371734

DESCRIPTION

A fast method for simulation of large-scale case–control GWAS summary statistics.

URL

https://github.com/chr1swallace/simGWAS

TITLE

simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics.

Main citation

Fortune MD, Wallace C. (2019) simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics. Bioinformatics, 35 (11) 1901-1906. doi:10.1093/bioinformatics/bty898. PMID 30371734

ABSTRACT

MOTIVATION: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some 'truth' is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. RESULTS: We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method, produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. AVAILABILITY AND IMPLEMENTATION: Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

DOI

10.1093/bioinformatics/bty898
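
simGWAS simulates summary statistics directly, by drawing random variates about their expected values, instead of simulating individual genotypes. Its own derivation conditions on control haplotype frequencies and case-control effect sizes; the Python sketch below substitutes the simpler, widely used multivariate-normal approximation Z ~ N(Rλ, R), where R is an LD correlation matrix and λ holds non-centrality values that are nonzero only at causal SNPs. Both inputs and all function names are assumptions for illustration, not the simGWAS API.

```python
import math
import random


def cholesky(a, jitter=1e-9):
    """Lower-triangular L with L L^T = a + jitter * I."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] + jitter - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def expected_z(ld, causal_ncp):
    """E[Z] = R @ lambda: each SNP inherits signal from causal SNPs in proportion to LD."""
    return [sum(r * lam for r, lam in zip(row, causal_ncp)) for row in ld]


def simulate_z(ld, causal_ncp, n_rep, rng):
    """Draw Z ~ N(R @ lambda, R) directly, without any individual-level genotypes."""
    mu = expected_z(ld, causal_ncp)
    L = cholesky(ld)
    m = len(mu)
    reps = []
    for _ in range(n_rep):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        reps.append([mu[i] + sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(m)])
    return reps


if __name__ == "__main__":
    ld = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.3], [0.2, 0.3, 1.0]]
    print(expected_z(ld, [5.0, 0.0, 0.0]))
    print(simulate_z(ld, [5.0, 0.0, 0.0], 3, random.Random(2)))
```

With a single causal SNP of non-centrality 5 and LD r = 0.8 and 0.2 to two neighbours, `expected_z` returns values equal to 5.0, 4.0, and 1.0; each replicate then costs one small matrix-vector product, independent of sample size, which is where this style of simulation gains its speed.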

twas_sim

Tool

PUBMED_LINK

37099718

DESCRIPTION

A Python tool that leverages real genotype data to simulate a complex trait as a function of latent expression, fit eQTL weights in independent data, and perform GWAS/TWAS on the complex trait.

URL

https://github.com/mancusolab/twas_sim

TITLE

twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis.

Main citation

Wang X, Lu Z, Bhattacharya A, Pasaniuc B, ..., Mancuso N. (2023) twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis. Bioinformatics, 39 (5). doi:10.1093/bioinformatics/btad288. PMID 37099718

ABSTRACT

SUMMARY: Genome-wide association studies (GWASs) have identified numerous genetic variants associated with complex disease risk; however, most of these associations are non-coding, complicating identifying their proximal target gene. Transcriptome-wide association studies (TWASs) have been proposed to mitigate this gap by integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous methodological advancements have been made for TWAS, yet each approach requires ad hoc simulations to demonstrate feasibility. Here, we present twas_sim, a computationally scalable and easily extendable tool for simplified performance evaluation and power analysis for TWAS methods. AVAILABILITY AND IMPLEMENTATION: Software and documentation are available at https://github.com/mancusolab/twas_sim.

DOI

10.1093/bioinformatics/btad288
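
The twas_sim description outlines three steps: simulate a complex trait as a function of latent expression, fit eQTL weights in independent data, and test the trait with GWAS/TWAS. The Python sketch below walks through those steps in miniature. It is not twas_sim's code: it draws unlinked SNPs instead of using real genotype data, uses simple per-SNP (marginal) regression weights rather than the weight models a real TWAS pipeline would fit, and all names are hypothetical, including `h2_cis` (cis-heritability of expression) and `h2ge` (trait variance explained by expression).

```python
import math
import random
import statistics


def draw_genotypes(n, freqs, rng):
    """n individuals x len(freqs) unlinked SNPs, 0/1/2 codes under HWE (a simplification)."""
    return [[sum(rng.random() < p for _ in range(2)) for p in freqs] for _ in range(n)]


def standardize(values):
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def corr(a, b):
    return sum(x * y for x, y in zip(standardize(a), standardize(b))) / len(a)


def simulate_expression(X, eqtl_effects, h2_cis, rng):
    """Expression = cis-genetic component + noise, scaled so the cis-heritability is h2_cis."""
    g = [sum(b * x for b, x in zip(eqtl_effects, row)) for row in X]
    noise_sd = math.sqrt(statistics.pvariance(g) * (1 / h2_cis - 1))
    return [gi + rng.gauss(0.0, noise_sd) for gi in g]


def marginal_weights(X, expr):
    """Per-SNP least-squares slopes of expression on genotype (deliberately simple weights)."""
    weights = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        weights.append(corr(col, expr) * statistics.pstdev(expr) / statistics.pstdev(col))
    return weights


def twas_z(n_eqtl, n_gwas, freqs, eqtl_effects, h2_cis, h2ge, seed=1):
    rng = random.Random(seed)
    # 1) Independent eQTL sample: fit weights from genotypes and measured expression.
    X_ref = draw_genotypes(n_eqtl, freqs, rng)
    w = marginal_weights(X_ref, simulate_expression(X_ref, eqtl_effects, h2_cis, rng))
    # 2) GWAS sample: latent expression drives the trait; only genotypes and trait are kept.
    X = draw_genotypes(n_gwas, freqs, rng)
    expr = standardize(simulate_expression(X, eqtl_effects, h2_cis, rng))
    y = [math.sqrt(h2ge) * e + rng.gauss(0.0, math.sqrt(1 - h2ge)) for e in expr]
    # 3) TWAS: predict expression from genotypes with the fitted weights, then test against y.
    pred = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    r = corr(pred, y)
    return r * math.sqrt((n_gwas - 2) / (1 - r * r))


if __name__ == "__main__":
    print(twas_z(500, 5000, [0.3, 0.4, 0.2, 0.5], [0.5, 0.0, 0.0, -0.3], 0.3, 0.05))
```

With a 500-person eQTL sample, a 5,000-person GWAS sample, and `h2ge = 0.05`, the returned TWAS z-score is typically far from zero, while `h2ge = 0` yields a null z-score; repeating the call over many seeds turns this into a rough power estimate.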