GWAsimulator: A rapid whole genome simulation program

Version 2.1: the Linux version was updated to conform to the C++ standards enforced by newer versions of gcc. (July 24, 2012)

Version 2.0: allows simulation of non-multiplicative effects and two-way interaction effects! (November 19, 2007)

GWAsimulator is a C++ program that simulates genotype data for the SNP chips used in genome-wide association (GWA) studies. It implements a rapid moving-window algorithm (Durrant et al. 2004. AJHG 75:35-43) to simulate whole-genome case-control or population samples, and it can also simulate specific regions if desired. For case-control data, the program retrospectively samples cases and controls according to a user-specified multi-locus disease model. The program requires phased data as input, and the simulated data will have LD patterns similar to those of the input data.
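The two ideas in the paragraph above can be illustrated with a minimal Python sketch. This is not the GWAsimulator source code and does not reproduce the exact procedure of Durrant et al.; it is a simplified reading under stated assumptions. The function names `simulate_haplotype` and `case_control_genotype_probs`, the `window` parameter, the string encoding of haplotypes, the single disease locus, and the Hardy-Weinberg assumption are all hypothetical choices made for illustration.

```python
import random


def simulate_haplotype(haps, window, rng):
    """Simulate one haplotype from phased input haplotypes.

    haps   : list of equal-length strings such as "0110" (one character per SNP)
    window : number of SNPs in the moving window (1 <= window <= number of SNPs)
    rng    : a random.Random instance
    """
    n_snps = len(haps[0])
    # Draw the first full window according to the observed haplotype frequencies.
    new = list(rng.choice(haps)[:window])
    for pos in range(window, n_snps):
        start = pos - window + 1
        context = "".join(new[start:pos])
        # Keep the input haplotypes that agree on the preceding window-1 SNPs,
        # then copy the allele at the current SNP from one of them at random.
        matches = [h for h in haps if h[start:pos] == context]
        new.append(rng.choice(matches)[pos])
    return "".join(new)


def case_control_genotype_probs(risk_allele_freq, penetrance):
    """Genotype distributions at one disease locus, conditional on status.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    penetrance[g] is P(disease | g copies of the risk allele), g = 0, 1, 2.
    Returns (P(g | case), P(g | control)) as two lists.
    """
    p = risk_allele_freq
    prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    prevalence = sum(f * w for f, w in zip(penetrance, prior))
    cases = [f * w / prevalence for f, w in zip(penetrance, prior)]
    controls = [(1 - f) * w / (1 - prevalence) for f, w in zip(penetrance, prior)]
    return cases, controls


if __name__ == "__main__":
    rng = random.Random(1)
    phased = ["0101", "0110", "1001"]
    print(simulate_haplotype(phased, window=2, rng=rng))
    print(case_control_genotype_probs(0.5, [0.1, 0.2, 0.4]))
```

Because each new allele is copied from an input haplotype that matches the preceding alleles, every window-sized stretch of a simulated haplotype also occurs in the input, which is how local LD is carried over in this sketch. The second function shows the retrospective step: genotypes are drawn conditional on case or control status rather than drawing status conditional on genotype.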

The program can use HapMap phased data as input and has the flexibility to simulate genotypes for different populations and different SNP chips. As large-scale GWA data sets become available, they can be used as input instead of the HapMap data, provided that phase information has been generated. Because of their much larger sample sizes, these data may represent the population under study better and give more accurate LD information than the HapMap. See the manual for instructions and a detailed description of the program.

  • Linux package: GWAsimulator_v2.1_linux.tar.gz
  • Windows package: (The executable file is standalone. However, output file compression relies on an external program, gzip, which may not be available on many Windows systems. See the file README.txt for details.)
  • Mac OS X package: GWAsimulator_v2.0_mac.tar.gz (The executable file is a universal program for x86/ppc/ppc64.)
  • Other OS: Download any package and recompile.
  • Note: You can always recompile the program. Recompiling can often take advantage of the latest compiler technology and optimize the program for your local hardware/software configuration. Depending on the version of g++, the program may need a single-line modification to compile. See the manual for details.

Paper: Li C, Li M (2008) GWAsimulator: A rapid whole genome simulation program. Bioinformatics 24:140-142

Supplementary materials: (1) Supplementary Figure 1: Comparison with the HapMap data using LDU maps. (2) Result comparison with the program hapgen.

HapMap CEU phased data are provided for the Illumina HumanHap300 and HumanHap550 chips. The program requires that disease loci be known and included in the input phased data. If you want to specify disease loci that are not on the chip, you need to generate the input files yourself. See the manual for details.

Other papers of relevance

Please send your comments and suggestions to

Topic revision: r30 - 24 Jul 2012, ChunLi
