Bioinformatics Advance Access published online on September 4, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm386
Simulating association studies: a data-based resampling method for candidate regions or whole genome scans
1Department of Biostatistics, 2Center for Genome Sciences and 3Center for Environmental Bioinformatics, University of North Carolina, Chapel Hill, 27599, USA; 4Renaissance Computing Institute, Europa Drive, Chapel Hill, North Carolina; 5School of Pharmacy and 6Department of Genetics, UNC Chapel Hill
*To whom correspondence should be addressed: Dr. Fred A. Wright, E-mail: fwright{at}bios.unc.edu
Abstract
Motivation: Reductions in genotyping costs have heightened interest in performing whole genome association scans and in the fine mapping of candidate regions. Improvements in study design and analytic techniques will require the simulation of datasets with realistic patterns of linkage disequilibrium and allele frequencies for typed SNPs.
Methods: We describe a general approach to simulate genotyped datasets for standard case-control or affected child trio data, by resampling from existing phased datasets. The approach allows for considerable flexibility in disease models, potentially involving a large number of interacting loci. The method is most applicable for diseases caused by common variants that have not been under strong selection, a class specifically targeted by the International HapMap project.
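As a rough illustration of the resampling idea in the Methods, the sketch below draws pairs of haplotypes from a small phased panel and assigns case or control status from a penetrance model at a single causal SNP. It is not the HAP-SAMPLE implementation: the function name, the toy panel, the penetrance values and the single-locus disease model are all assumptions made for the example, and it resamples whole haplotypes without modeling recombination.

```python
import random


def simulate_case_control(haplotypes, causal_index, risk, n_cases, n_controls, seed=0):
    """Resample phased haplotypes into case and control genotypes.

    Hypothetical helper, not part of HAP-SAMPLE.
    haplotypes:   list of equal-length lists of 0/1 alleles (a phased panel)
    causal_index: position of the assumed disease SNP
    risk:         dict mapping risk-allele count (0, 1, 2) to disease probability;
                  at least one value must lie strictly between 0 and 1 so that
                  both quotas can be filled
    """
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # Draw two haplotypes with replacement to form one diploid individual.
        h1, h2 = rng.choice(haplotypes), rng.choice(haplotypes)
        genotype = [a + b for a, b in zip(h1, h2)]
        # Assign disease status from the penetrance at the causal SNP.
        affected = rng.random() < risk[genotype[causal_index]]
        target, quota = (cases, n_cases) if affected else (controls, n_controls)
        # Keep the individual only if its group still needs members.
        if len(target) < quota:
            target.append(genotype)
    return cases, controls


# Toy phased panel: 4 haplotypes over 4 SNPs (illustrative values only).
panel = [
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
# Assumed penetrances for 0, 1 and 2 copies of the risk allele.
penetrance = {0: 0.05, 1: 0.10, 2: 0.20}

cases, controls = simulate_case_control(
    panel, causal_index=0, risk=penetrance, n_cases=50, n_controls=50
)
```

Because individuals are built from observed haplotypes, the simulated genotypes inherit the allele frequencies and linkage disequilibrium of the source panel. A multi-locus or interacting disease model would replace the single lookup in `risk` with a function of several SNPs.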
Results: Using the three-population Phase I/II HapMap data as a testbed, we have implemented the approach in HAP-SAMPLE, a web-based simulation tool.
Availability: The web-based tool is available at http://www.hapsample.org
login name: guest
password: Hap!123
Contact: fwright{at}bios.unc.edu
Associate Editor: Prof. Keith Crandall
Received on August 25, 2007; revised on June 21, 2007; accepted on July 20, 2007