Bioinformatics Advance Access originally published online on June 19, 2008
Bioinformatics 2008 24(16):1821-1822; doi:10.1093/bioinformatics/btn317

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ForSim: a tool for exploring the genetic architecture of complex traits with controlled truth

Brian W. Lambert 1,*, Joseph D. Terwilliger 2,3 and Kenneth M. Weiss 1,*

1Department of Anthropology, Penn State University, University Park, PA, 2Department of Psychiatry, Genetics and Development, Columbia Genome Center, Columbia University and 3Division of Medical Genetics, New York State Psychiatric Institute, New York, NY, USA

*To whom correspondence should be addressed.


    ABSTRACT
 

Many important problems in biology involve complex traits affected by multiple coding or regulatory parts of the genome. How well the underlying genetic architecture can be inferred by statistical methods such as mapping and association studies is an active area of research. ForSim is a flexible forward evolutionary simulation tool for exploring the consequences of evolution by phenotype, whereby demographic, chance, behavioral and selective effects mold genetic architecture. Simulation is useful for exploring these issues as well as the choice of study design and inferential methods.

Contact: bwl1{at}psu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 
We have today, at best, a generic understanding of the distributional characteristics even of the most important parameters that generate the genetic architecture of complex traits, such as allelic effects on phenotypes, or their effects on evolutionary fitness. ForSim can simulate a wide range of user-constructed plausible models to test consistency with empirical data, and help optimize study designs to infer the underlying architecture from samples.

The most important simulation tool of the last 20 years, for evolutionary processes as well as genetic inference, has been backward, or coalescent, simulation (Hudson and Kaplan, 1988). Coalescent approaches are fast but limited in how fully they can handle complicating factors such as selection, complex genetic architecture, penetrance, environmental effects, recombination and population structure, and they typically assume evolutionary equilibrium.

A more realistic approach is forward simulation. A starting ancestral population is simulated forward in time from some starting time to the present (Hey, 2004; Hoggart et al., 2007; Peng et al., 2007). Forward simulation substantially raises the demand for memory and CPU time, but hardware is rapidly improving and can handle increasingly flexible simulation of the evolution of complex traits to which many genes contribute. Neutrally evolving and selected traits can be compared, and the effects of demographic complexity and changing environments can be examined. It is the core aspects of genetic architecture that are deeply conserved in nature: the number of genes, the nature of gene pathways, etc. While these are stable, trait diversification is controlled by the allelic effects that are laid upon this underlying structure.
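
As a minimal sketch of what simulating forward in time means (this is not ForSim's algorithm), the following Python fragment follows the frequency of a single allele through discrete generations under drift and, optionally, selection. The function name, its parameters and the haploid single-locus setup are illustrative assumptions; ForSim itself handles many loci, phenotypes and populations.

```python
import random

def simulate_forward(pop_size, generations, init_freq, fitness_advantage=0.0, seed=None):
    """Track one allele's frequency forward in time (haploid Wright-Fisher model).

    Hypothetical helper for illustration only; all names are assumptions.
    """
    rng = random.Random(seed)
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: weight the allele by its relative fitness (1 + s).
        weighted = freq * (1.0 + fitness_advantage)
        p_sample = weighted / (weighted + (1.0 - freq))
        # Drift: each offspring draws its parent independently.
        carriers = sum(1 for _ in range(pop_size) if rng.random() < p_sample)
        freq = carriers / pop_size
        trajectory.append(freq)
        if freq in (0.0, 1.0):  # allele lost or fixed; nothing more can change
            break
    return trajectory

# Example: a neutral allele starting at 10% in a population of 200.
neutral_run = simulate_forward(pop_size=200, generations=100, init_freq=0.1, seed=42)
```

Every generation costs work proportional to the population size, which is why forward simulation demands more CPU time than coalescent methods, which only trace the ancestry of the sample.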


    2 METHODS
 
ForSim simulates evolution naturally. Almost every level of this process and many aspects of the genetic architecture of traits and populations can be controlled by the user (Table 1). ForSim allows users to define the number, lengths and location of genes and chromosomes, the genetic contributions and interactions, environmental effects and other conditions. Multigenic traits and simple networks of genetic contributions can be specified. An arbitrary number of populations, their local environments, phenotype-based natural selection, gene flow and mate choice, as well as time-variable changes in these processes, can all be defined by the user. Phenotypes are determined by user-specified quantitative effects of genes and environments, and relative fitness is based on user-specified criteria.



 
Table 1. Current features of ForSim

 
A mutation is stochastically assigned a phenotypic effect according to user-specified probabilities: it can be neutral, lethal, negative or positive. The size of the effect is drawn from a user-parameterized gamma distribution. The effect of a gene is the sum of the effects of the polymorphisms its haplotypes contain. Users define the number of phenotypes and the contributions of genes to these phenotypes through traditional algebraic syntax, such as ‘PhenA=(G1+G2)*G3^2’, meaning that the trait is the product of the squared contribution of gene G3 and the sum of the contributions of genes G1 and G2. Genes may contribute to any number of phenotypes, and when a gene is specified to affect more than one trait, the traits become correlated. For example, defining ‘PhenB=(G4+G2)*G3^2’ will cause PhenA and PhenB to be correlated. The user can specify random universal or family-specific environmental contributions that can be specific to each phenotype and can vary over populations and time to simulate epidemiologically or ecologically important effects.
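
The mapping from mutations to phenotypes described above can be sketched in Python as follows. This is an illustration under stated assumptions, not ForSim code: the function names, the effect-class probabilities and the gamma shape and scale values are all hypothetical, and lethal mutations are left out.

```python
import random

def draw_mutation_effect(rng, p_neutral=0.7, p_negative=0.2, shape=2.0, scale=0.05):
    """Return a signed phenotypic effect for one new mutation.

    Assumed, illustrative parameters: a mutation is neutral with
    probability p_neutral, negative with p_negative, otherwise positive.
    Lethal mutations are omitted from this sketch.
    """
    u = rng.random()
    if u < p_neutral:
        return 0.0
    magnitude = rng.gammavariate(shape, scale)  # effect size from a gamma distribution
    return -magnitude if u < p_neutral + p_negative else magnitude

def gene_value(effects):
    """A gene's contribution is the sum of the effects of its polymorphisms."""
    return sum(effects)

def phenotypes(g1, g2, g3, g4):
    """Evaluate the two example traits from the text."""
    phen_a = (g1 + g2) * g3 ** 2
    phen_b = (g4 + g2) * g3 ** 2
    return phen_a, phen_b

# Four hypothetical genes, each carrying five mutations.
rng = random.Random(1)
genes = [gene_value([draw_mutation_effect(rng) for _ in range(5)]) for _ in range(4)]
phen_a, phen_b = phenotypes(*genes)
```

Because G2 and G3 enter both expressions, any allele that changes either gene shifts PhenA and PhenB together, which is the source of the correlation between the two traits.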

Natural selection can be modeled as a deterministic truncation or probabilistic process, based on a user-specified function relating fitness of the metric phenotype of an individual to its distance from the population mean or some user-specified optimal phenotype value (which may be changed during the simulation). Phenotype means and variances can be traced through generations in the post-run output data, from which multivariate phenotype and genotype analysis can easily be done.
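
The two selection modes can be contrasted with a short Python sketch. The Gaussian fitness function, the function names and the parameters are assumptions made for illustration; in ForSim the function relating fitness to phenotype is specified by the user.

```python
import math
import random

def gaussian_fitness(phenotype, optimum, width):
    """Relative fitness falls off with distance from the optimum (assumed Gaussian form)."""
    return math.exp(-((phenotype - optimum) ** 2) / (2.0 * width ** 2))

def truncation_select(phenotypes, optimum, max_distance):
    """Deterministic truncation: keep only individuals within max_distance of the optimum."""
    return [p for p in phenotypes if abs(p - optimum) <= max_distance]

def probabilistic_select(phenotypes, optimum, width, rng):
    """Probabilistic selection: each individual survives with probability equal to its fitness."""
    return [p for p in phenotypes if rng.random() < gaussian_fitness(p, optimum, width)]

# Example: the same phenotypes under both modes.
rng = random.Random(0)
values = [rng.gauss(0.0, 1.0) for _ in range(1000)]
kept_by_truncation = truncation_select(values, optimum=0.0, max_distance=1.0)
kept_by_chance = probabilistic_select(values, optimum=0.0, width=1.0, rng=rng)
```

Passing the population mean as `optimum` gives selection relative to the mean, and changing `optimum` between generations corresponds to a shifting optimal phenotype.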

Probabilities of mate choice within and between populations are user specifiable and can be phenotype based to test selective migration and assortative mating. Specified population size is reached and maintained by logistic growth scaled by adjusting expected family size (sibship size is stochastic).
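
One way to picture logistic growth regulated through family size is the Python sketch below. It assumes a discrete logistic update, monogamous mating pairs and Poisson-distributed sibship sizes; none of these details are confirmed by the text, and all names are hypothetical.

```python
import math
import random

def expected_family_size(current_size, carrying_capacity, growth_rate):
    """Mean offspring per mating pair under discrete logistic growth.

    Assumed form: N' = N + r*N*(1 - N/K); each pair must then produce
    2*N'/N offspring on average.
    """
    target = current_size + growth_rate * current_size * (1.0 - current_size / carrying_capacity)
    return 2.0 * target / current_size

def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def next_generation_size(current_size, carrying_capacity, growth_rate, rng):
    """Sum stochastic sibship sizes over all mating pairs."""
    mean = expected_family_size(current_size, carrying_capacity, growth_rate)
    pairs = current_size // 2
    return sum(poisson(rng, mean) for _ in range(pairs))

# Example: grow from 100 individuals toward a specified size of 1000.
rng = random.Random(11)
size = 100
sizes = [size]
for _ in range(30):
    size = next_generation_size(size, carrying_capacity=1000, growth_rate=0.5, rng=rng)
    sizes.append(size)
```

At the specified size the expected family size is exactly two, so the population is maintained on average while individual sibships still vary.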

All these parameters are defined in a user-authored input file with a block-structured syntax, in which entities (populations, chromosomes, etc.) are named and their attributes defined. ForSim is distributed with several example input files as well as a syntax-highlighting EMACS mode to assist in editing and scripting. At the end of the simulation, complete genotype and phenotype data are saved for each individual in each population. LINKAGE-format pedigrees for each individual are also saved, along with quantitatively ascertained cases and controls for SNP association tests and parent-offspring trios for LD and haplotype analysis. The entire history of every allele can be saved so that analytic approaches can be developed on the basis of full information.

ForSim is not a model-fitting or empirical hypothesis-testing tool, but it can test whether models generate plausible results compared to empirical data. For example, using well-known human genetic parameters for mutation, recombination, effective population size and trait prevalence, ForSim produces data consistent with the observed and/or theoretically expected values of nucleotide diversity, number of segregating SNPs in samples, sibling relative risk, linkage disequilibrium structure and so on in human data (e.g. www.hapmap.org). Reasonable selection scenarios generate correspondingly reduced nucleotide and haplotype heterozygosity, increased haplotype length, etc. Statistical tests of results that include traits that were, and were not, subject to selection can determine whether the specified selection, migration and so on can be detected in the results. Examples are in the ForSim user manual and Supplementary Material.

Runtime depends on the complexity of simulated conditions, especially the population size(s) and the number of genes and generations. On a 2.8 GHz Intel Pentium D processor with 2 GB RAM, a ForSim simulation of a population of 10 000 for 10 000 generations (roughly the age and effective population size of the human species), for a chromosome of 10 Mb containing 10 neutrally evolving genes (five of 20 kb and five of 50 kb), a mutation rate of 2.5 × 10^-8/nt and 1 cM = 10^6 base pairs, with normally distributed environmental phenotypic noise, takes 28 min. Time scales up, sometimes non-linearly, with increases in these values, because SNP sojourn times are non-linear with respect to population size until equilibrium is established, and all current and previously fixed SNPs must be tracked. Small test runs can guide users in specifying the simplest, fastest running conditions that can adequately address their objectives.
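
To give a feel for the workload in this benchmark, the back-of-the-envelope Python calculation below derives a few quantities from the stated parameters. It assumes diploid individuals, a mutation rate expressed per nucleotide per generation, mutations arising only within the ten genes, and the standard neutral expectation of 4Nμ for per-site diversity; these are assumptions of the sketch, not statements about ForSim's internals.

```python
pop_size = 10_000                            # individuals, assumed diploid
mutation_rate = 2.5e-8                       # per nucleotide, assumed per generation
gene_lengths = [20_000] * 5 + [50_000] * 5   # five 20 kb and five 50 kb genes
chromosome_length = 10_000_000               # 10 Mb
bp_per_cM = 1_000_000                        # 1 cM = 10^6 base pairs

# Total sequence in which mutations are assumed to arise.
genic_bp = sum(gene_lengths)

# Expected new mutations entering the population each generation (2N gene copies).
new_mutations_per_generation = 2 * pop_size * mutation_rate * genic_bp

# Genetic map length of the chromosome in Morgans (100 cM = 1 Morgan).
map_length_morgans = chromosome_length / bp_per_cM / 100

# Neutral expectation for per-site nucleotide diversity, 4 * N * mu.
theta_per_site = 4 * pop_size * mutation_rate
```

Under these assumptions the ten genes span 350 kb, about 175 new mutations arise per generation, the chromosome is 0.1 Morgans long, and the expected diversity is about one difference per 1000 sites. Over 10 000 generations that amounts to well over a million mutation events to create and track, which helps explain the runtime.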


    3 DISCUSSION
 
ForSim can simulate simple controlled situations effectively and efficiently, and it can also simulate complex situations that require more memory and runtime. To our knowledge, no other current programs are as general and flexible. The program is written in portable C++, with optional wrapper scripts for the preparation and presentation of data. These wrapper scripts, written in Ruby, automate graphical output via R (www.r-project.org), as well as haplotype and SNP-tagging analysis using Haploview and haplotype alignments via ClustalW. ForSim source code and user manual are available free upon request (www.anthro.psu.edu/weiss_lab/research.shtml#ForSim), with agreement to an open source non-commercialization license. It builds and runs under Linux/Unix/MacOS and should run under Windows with Cygwin or MinGW. Updates and errata will be posted and sent to registered users.


    ACKNOWLEDGEMENTS
 
Funding: This work was supported by NIH grant number MH63749.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on March 10, 2008; revised on May 22, 2008; accepted on May 22, 2008

    REFERENCES
 

    Hey J. FPG–a computer program for forward population genetic simulation. (2004) Web document: available at http://lifesci.rutgers.edu/heylab/HeylabSoftware.htm#FPG (last accessed date July 14, 2008).

    Hoggart CJ, et al. Sequence-level population simulations over large genomic regions. Genetics (2007) 177:1725–1731.

    Hudson RR, Kaplan NL. The coalescent process in models with selection and recombination. Genetics (1988) 120:831–840.

    Peng B, et al. Forward-time simulations of human populations with complex diseases. PLoS Genet (2007) 3:e47.




