Bioinformatics Advance Access originally published online on April 21, 2006
Bioinformatics 2006 22(18):2317-2318; doi:10.1093/bioinformatics/btl153
GeneRecon: a coalescent based tool for fine-scale association mapping
1 Bioinformatics Research Center, University of Aarhus, Høegh-Guldbergs Gade 10, DK-8000 Århus C, Denmark
2 Bioinformatics ApS, Høegh-Guldbergs Gade 10, DK-8000 Århus C, Denmark
3 Department of Computer Science, University of Aarhus, IT-Parken, DK-8200 Århus N, Denmark
4 Department of Statistics, University of Oxford, Oxford, OX1 3TG, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: GeneRecon is a tool for fine-scale association mapping using a coalescent model. GeneRecon takes as input case-control data from phased or unphased SNP and microsatellite genotypes. The posterior distribution of the disease locus position is obtained by Metropolis-Hastings sampling in the state space of genealogies. Input format, search strategy and the sampled statistics can be configured through the Guile Scheme programming language embedded in GeneRecon, making GeneRecon highly configurable.
Availability: The source code for GeneRecon, written in C++ and Scheme, is available under the GNU General Public License (GPL) at http://www.birc.au.dk/~mailund/GeneRecon
Contact: mailund{at}birc.au.dk
| 1 INTRODUCTION |
|---|
We have implemented a software package, GeneRecon, based on extensions of the shattered coalescent model of Morris et al. (2002) for Bayesian Markov chain Monte Carlo (MCMC) fine-scale linkage disequilibrium (LD) gene mapping. GeneRecon uses the coalescent model (Hein et al., 2005) to explicitly model the genealogy of a sample of case chromosomes. The location of the mutation influencing the disease is inferred from the observed LD at multiple genetic markers. Given the computational complexity of the problem, a Metropolis-Hastings algorithm is used to integrate over the unknown population genetic parameters of the coalescent model and to sample the marginal posterior probability density of the parameter(s) of interest.
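As a rough illustration of the Metropolis-Hastings idea, the following minimal Python sketch samples the posterior of a single position parameter with a random-walk proposal. It is not GeneRecon's implementation: GeneRecon samples in the space of genealogies, whereas the target here is a made-up one-dimensional density, and the names `log_posterior`, `metropolis_hastings`, `peak`, `scale` and `proposal_sd` are hypothetical. The sketch also shows the three strategy settings of an MCMC run: number of iterations, burn-in period and proposal width.

```python
import math
import random


def log_posterior(x, lo=0.0, hi=1.0, peak=0.3, scale=0.05):
    """Toy unnormalised log-posterior for a locus position on [lo, hi].

    A uniform prior on the interval times a Gaussian-shaped likelihood
    centred on `peak`. All values are invented for illustration only.
    """
    if x < lo or x > hi:
        return -math.inf
    return -0.5 * ((x - peak) / scale) ** 2


def metropolis_hastings(log_target, x0, n_iter, burn_in, proposal_sd, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    Returns the post-burn-in samples and the overall acceptance rate.
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples, accepted = [], 0
    for i in range(n_iter):
        candidate = x + rng.gauss(0.0, proposal_sd)
        log_p_candidate = log_target(candidate)
        # The proposal is symmetric, so the acceptance ratio reduces to
        # the ratio of target densities.
        log_ratio = log_p_candidate - log_p
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x, log_p = candidate, log_p_candidate
            accepted += 1
        # Discard the burn-in period, keep everything after it.
        if i >= burn_in:
            samples.append(x)
    return samples, accepted / n_iter


if __name__ == "__main__":
    samples, rate = metropolis_hastings(
        log_posterior, x0=0.5, n_iter=20000, burn_in=2000, proposal_sd=0.05
    )
    print("posterior mean:", sum(samples) / len(samples))
    print("acceptance rate:", rate)
```

A very small `proposal_sd` makes the chain accept nearly every move but explore slowly, while a very large one causes most proposals to be rejected; both degrade mixing.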
| 2 THE MODEL |
|---|
GeneRecon handles both SNP and microsatellite marker data, as genotypes or haplotypes, from case/control studies. Phenocopies and locus and allele heterogeneity are modeled in two ways. First, the shattered coalescent allows genealogical independence of coalescent subtrees (Morris et al., 2002). Second, cases are partitioned into two clusters: a null cluster, which is not evaluated by the model and hence greatly reduces the search space, and a mutation cluster of cases, which is evaluated by the model (Liu et al., 2001).
| 3 IMPLEMENTATION |
|---|
GeneRecon can be obtained from its homepage, where installation instructions are provided. The MCMC engine of GeneRecon is written in C++ and is available as a command-line executable for Linux. A "Getting started" document introduces the functionality of GeneRecon, whereas a user's manual provides examples of more advanced uses, including examples of using Guile Scheme.
| 4 FLEXIBILITY OF USING SCHEME |
|---|
Using the Guile Scheme programming language as a front-end for input specification and execution control allows highly flexible interaction with the MCMC engine of GeneRecon. A collection of Guile modules allows functionality to be changed easily. Input file format, population genetic parameters, MCMC sampling strategy and output options can be configured. Prior knowledge of population genetic parameters, such as effective population size (Ne) or local recombination rates (ρ), may be explicitly defined if available from independent sources such as HapMap or the DeCODE genetic map. At present, sampling of the likelihood, disease location, effective population size, coalescent tree and cluster indices is supported. The MCMC sampling strategy is defined by the number of iterations, the burn-in period and the proposal densities of the sampled parameters. The choice of strategy strongly affects the mixing properties of the Markov chain and its convergence to a stationary distribution [for details on MCMC strategies see Gilks et al. (1995) or Liu (2001)].
| 5 PERFORMANCE |
|---|
To evaluate the predictive capabilities of GeneRecon, we conducted a large simulation study (Mailund et al., 2005a), in which sequence data were simulated under various parameters using the CoaSim tool (Mailund et al., 2005b) and then analyzed using GeneRecon, with four independent runs for each dataset. Results for a Mendelian scenario, i.e. all cases carry the disease-causing mutation and all controls are wild type, are shown in Table 1.
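Several independent runs per dataset make it possible to check whether the chains agree with each other. The source does not say which diagnostic, if any, was applied to the four runs; the sketch below uses the Gelman-Rubin potential scale reduction factor purely as one common choice, with the hypothetical helper `gelman_rubin` and synthetic chains standing in for sampled disease locus positions.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length scalar chains.

    Values close to 1 suggest the chains sample the same distribution;
    values well above 1 suggest they have not converged.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two equal-length chains with n >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


if __name__ == "__main__":
    rng = random.Random(1)
    # Four synthetic runs that agree on the position (illustrative only).
    agreeing = [[rng.gauss(0.3, 0.05) for _ in range(1000)] for _ in range(4)]
    # Same runs, but the fourth is stuck around a different position.
    stuck = agreeing[:3] + [[rng.gauss(0.6, 0.05) for _ in range(1000)]]
    print("agreeing runs:", gelman_rubin(agreeing))
    print("one stuck run:", gelman_rubin(stuck))
```

A diagnostic of this kind only flags disagreement between runs; agreement does not prove that the chains have explored the whole posterior.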
We have also tested GeneRecon on the ΔF508 mutation for cystic fibrosis, using the data from Kerem et al. (1989). The results from this analysis are shown in Figure 1. GeneRecon compares favorably with other fine-mapping tools (Table 2).
| 6 CONCLUSION |
|---|
GeneRecon is designed to allow flexible multimarker LD mapping using a coalescent model. Adaptations and extensions of the Scheme modules provided allow users to accommodate a wide range of scenarios, data types, sampling strategies and convergence diagnostics, without much loss of user friendliness compared with competing software.
| Acknowledgments |
|---|
T.M. is funded by the Danish Research Agency, FNU grant 272-05-0283 and FTP grant 274-05-0365. The project is supported by the ISIS project 123 to L.S. and C.N.S.P.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on February 8, 2006; revised on April 5, 2006; accepted on April 16, 2006
| REFERENCES |
|---|
Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (eds) (1995) Markov Chain Monte Carlo in Practice. Chapman & Hall, London, UK.
Hein, J., Schierup, M. and Wiuf, C. (2005) Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. Oxford University Press, Oxford, UK.
Kerem, B., et al. (1989) Identification of the cystic fibrosis gene: genetic analysis. Science, 245, 1073-1080.
Liu, J.S., et al. (2001) Bayesian analysis of haplotypes for linkage disequilibrium mapping. Genome Res., 11, 1716-1724.
Liu, J.S. (2001) Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York, NY.
Mailund, T., Pedersen, C.N.S., Bardino, J., Vinter, B. and Karlsen, H.H. (2005a) Initial experiences with GeneRecon on MiG. Proceedings of the 2005 International Conference on Grid Computing and Applications (GCA05), Monte Carlo Resort, Las Vegas, Nevada, USA.
Mailund, T., et al. (2005b) CoaSim: a flexible environment for simulating genetic data under coalescent models. BMC Bioinformatics, 6, 252.
Morris, A.P., et al. (2002) Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. Am. J. Hum. Genet., 70, 686-707.
