Bioinformatics Advance Access originally published online on April 21, 2006
Bioinformatics 2006 22(18):2317-2318; doi:10.1093/bioinformatics/btl153

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

GeneRecon—a coalescent based tool for fine-scale association mapping

Thomas Mailund 1,2,4,*, Mikkel H. Schierup 1,2, Christian N. S. Pedersen 1,2,3, Jesper N. Madsen 2, Jotun Hein 4 and Leif Schauser 1,2

1 Bioinformatics Research Center, University of Aarhus, Høegh-Guldbergs Gade 10, DK-8000 Århus C, Denmark
2 Bioinformatics ApS, Høegh-Guldbergs Gade 10, DK-8000 Århus C, Denmark
3 Department of Computer Science, University of Aarhus, IT-Parken, DK-8200 Århus N, Denmark
4 Department of Statistics, University of Oxford, Oxford OX1 3TG, UK

*To whom correspondence should be addressed.


    ABSTRACT

Summary: GeneRecon is a tool for fine-scale association mapping using a coalescence model. GeneRecon takes as input case–control data from phased or unphased SNP and microsatellite genotypes. The posterior distribution of disease locus position is obtained by Metropolis-Hastings sampling in the state space of genealogies. Input format, search strategy and the sampled statistics can be configured through the Guile Scheme programming language embedded in GeneRecon, making GeneRecon highly configurable.

Availability: The source code for GeneRecon, written in C++ and Scheme, is available under the GNU General Public License (GPL) at http://www.birc.au.dk/~mailund/GeneRecon

Contact: mailund{at}birc.au.dk


    1 INTRODUCTION
We have implemented a software package, GeneRecon, based on extensions of the shattered coalescence model of Morris et al. (2002) for Bayesian Markov chain Monte Carlo (MCMC) fine-scale linkage disequilibrium (LD) gene mapping. GeneRecon uses the coalescent model (Hein et al., 2005) to explicitly model the genealogy of a sample of case chromosomes. The location of the mutation influencing the disease is inferred based on the observed LD at multiple genetic markers. Given the computational complexity of the problem, a Metropolis-Hastings algorithm is deployed to integrate over unknown population genetic parameters of the coalescence model and sample the marginal posterior probability density for the parameter(s) of interest.
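As an illustration of the sampling idea only, the following minimal Python sketch runs a random-walk Metropolis-Hastings chain over a single parameter, the disease locus position on a unit interval. It is not GeneRecon's sampler, which explores the space of genealogies; the function names, the toy log-posterior (marker LD scores decaying exponentially with distance) and all numeric values are assumptions made for the example.

```python
import math
import random


def log_posterior(x, markers, ld_scores, scale=0.1):
    """Toy unnormalised log-posterior for a locus position x in [0, 1].

    Hypothetical stand-in for a real likelihood: each marker contributes its
    LD score, down-weighted exponentially by its distance from x. The prior
    is uniform on [0, 1], so positions outside get log-probability -inf.
    """
    if not 0.0 <= x <= 1.0:
        return -math.inf
    return sum(s * math.exp(-abs(x - m) / scale)
               for m, s in zip(markers, ld_scores))


def metropolis_hastings(log_target, start, n_iter, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x = start
    lp = log_target(x)
    chain = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_target(proposal)
        log_ratio = lp_new - lp
        # The proposal is symmetric, so the acceptance probability is
        # min(1, target(proposal) / target(x)).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x, lp = proposal, lp_new
        chain.append(x)
    return chain


if __name__ == "__main__":
    rng = random.Random(42)
    markers = [0.1, 0.3, 0.5, 0.7, 0.9]      # assumed marker positions
    ld_scores = [1.0, 2.0, 8.0, 2.0, 1.0]    # assumed LD signal per marker
    chain = metropolis_hastings(
        lambda x: log_posterior(x, markers, ld_scores),
        start=0.2, n_iter=20_000, step=0.05, rng=rng)
    kept = chain[2_000:]                     # discard burn-in
    print("posterior mean position:", sum(kept) / len(kept))
```

The chain visits positions in proportion to their posterior density, so a histogram of the retained samples approximates the marginal posterior of the locus position.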


    2 THE MODEL
GeneRecon handles both SNP and microsatellite marker genotype or haplotype data from case/control design studies. Phenocopies and locus- and allele-heterogeneity are modeled in two ways. First, the ‘shattered’ coalescent allows genealogical independence of coalescent subtrees (Morris et al., 2002). Second, cases are partitioned into two clusters (Liu et al., 2001): a ‘null’ cluster, which is not evaluated by the model and whose exclusion greatly reduces the search space, and a ‘mutation’ cluster of cases, which is evaluated by the model.


    3 IMPLEMENTATION
GeneRecon can be obtained from its homepage, where instructions for the installation are provided. The MCMC engine of GeneRecon is written in C++ and is available as a command-line executable for Linux. A ‘Getting started’ document provides an introduction to the functionality of GeneRecon, whereas a user’s manual provides examples of more advanced uses, including examples of using Guile Scheme.


    4 FLEXIBILITY OF USING SCHEME
Using the Guile Scheme programming language as a front-end for input specifications and execution control allows highly flexible interaction with the MCMC engine of GeneRecon. A collection of Guile modules allows easy changes to functionality specifications. Input file format, population genetic parameters, MCMC sampling strategy and output options can all be configured. Prior knowledge of population genetic parameters, such as effective population size (Ne) or local recombination rates (ρ), may be explicitly defined if available from independent sources such as HapMap or the DeCODE genetic map. At present, sampling of the likelihood, disease location, effective population size, coalescent tree and cluster indices is supported. The MCMC sampling strategy is defined by the number of iterations, the burn-in period and the proposal densities of the sampled parameters. The choice of strategy will strongly affect the mixing properties of the Markov chain and convergence to a stationary distribution [for details on MCMC strategies see Gilks et al. (1995) or Liu (2001)].
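To make the effect of the sampling strategy concrete, the sketch below groups the three settings named above (number of iterations, burn-in period, proposal density) in a small Python structure and measures how the proposal width changes the acceptance rate on a toy standard-normal target. The class `SamplingStrategy`, the function `run_chain` and the numeric values are hypothetical; they do not reflect GeneRecon's Guile interface, which is documented in its user's manual.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingStrategy:
    """Hypothetical container for the three settings named in the text."""
    n_iterations: int    # total MCMC iterations, including burn-in
    burn_in: int         # leading iterations discarded before sampling
    proposal_sd: float   # standard deviation of the Gaussian proposal


def run_chain(strategy, seed=0):
    """Sample a standard normal target with random-walk Metropolis-Hastings.

    Returns the samples kept after burn-in and the overall acceptance rate.
    """
    rng = random.Random(seed)
    x, accepted, kept = 0.0, 0, []
    for i in range(strategy.n_iterations):
        proposal = x + rng.gauss(0.0, strategy.proposal_sd)
        # Log target ratio for N(0, 1): (x^2 - proposal^2) / 2.
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
        if i >= strategy.burn_in:
            kept.append(x)
    return kept, accepted / strategy.n_iterations


if __name__ == "__main__":
    for sd in (0.05, 2.5, 50.0):
        samples, rate = run_chain(SamplingStrategy(20_000, 2_000, sd))
        print(f"proposal_sd={sd}: kept={len(samples)}, acceptance={rate:.2f}")
```

A very narrow proposal is accepted almost always but moves the chain slowly, while a very wide one is rejected most of the time; both extremes mix poorly, which is why the proposal densities are part of the strategy.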


    5 PERFORMANCE
To evaluate the prediction capabilities of GeneRecon, we have conducted a large simulation study (Mailund et al., 2005a), in which sequence data were simulated under various parameters using the CoaSim tool (Mailund et al., 2005b) and then analyzed using GeneRecon, with four independent runs for each dataset. Results for a Mendelian scenario, i.e. all cases carry the disease-causing mutation and all controls are wild types, are shown in Table 1.


Table 1 GeneRecon error, as measured by the distance from the inferred to the true position of the disease locus, for simulated haplotype SNP data from 100 cases and controls
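The error measure in Table 1 can be sketched in a few lines of Python. How GeneRecon derives a single inferred position from the posterior is not stated here, so the sketch assumes the posterior mode, taken as the midpoint of the fullest histogram bin; the function names and the bin count are likewise assumptions.

```python
from collections import Counter


def posterior_mode(samples, n_bins=50, lo=0.0, hi=1.0):
    """Midpoint of the fullest histogram bin over [lo, hi].

    Assumption: the mode is used as the point estimate of the locus.
    Ties are broken in favour of the leftmost bin.
    """
    width = (hi - lo) / n_bins
    counts = Counter(
        min(int((s - lo) / width), n_bins - 1)
        for s in samples if lo <= s <= hi
    )
    if not counts:
        raise ValueError("no samples inside [lo, hi]")
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return lo + (best_bin + 0.5) * width


def localisation_errors(runs, true_position, **kwargs):
    """Absolute distance from each run's point estimate to the true locus."""
    return [abs(posterior_mode(r, **kwargs) - true_position) for r in runs]


if __name__ == "__main__":
    # Two made-up runs of posterior samples for a locus truly at 0.5.
    runs = [[0.41, 0.42, 0.43, 0.90], [0.55, 0.56, 0.10]]
    print(localisation_errors(runs, true_position=0.5, n_bins=10))
```

Computing the error per run, rather than on pooled samples, keeps the independent runs for each dataset comparable with one another.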

 
We have also tested GeneRecon on the ΔF508 mutation using the cystic fibrosis data from Kerem et al. (1989). The results from this analysis are shown in Figure 1. GeneRecon compares favorably with other fine-mapping tools (Table 2).


Fig. 1 Example of GeneRecon inferring the position of the ΔF508 mutation for cystic fibrosis with data from Kerem et al. (1989). The MCMC was allowed to burn in for 20 000 iterations and the posterior was sampled from the following 180 000 iterations. In the posterior plot, the true position is indicated by the solid vertical line and the 95% credibility interval is indicated by dashed vertical lines. Ticks on the x-axis indicate the positions of SNP markers.
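The post-processing described in the caption, discarding the burn-in and summarising the remaining samples with a 95% interval, can be sketched as follows. The caption does not say which kind of interval is drawn, so an equal-tailed interval is assumed here; the function names are hypothetical and the chain in the demo is random filler data, not output from GeneRecon.

```python
import random


def discard_burn_in(chain, burn_in):
    """Return the part of the chain sampled after the burn-in period."""
    if burn_in < 0 or burn_in > len(chain):
        raise ValueError("burn_in must lie between 0 and len(chain)")
    return chain[burn_in:]


def credible_interval(samples, mass=0.95):
    """Equal-tailed credible interval from posterior samples.

    Assumption: equal tails; a highest-density interval would differ for
    skewed or multimodal posteriors.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Filler chain of 200 000 values standing in for sampled locus positions.
    chain = [rng.gauss(0.5, 0.05) for _ in range(200_000)]
    posterior = discard_burn_in(chain, 20_000)
    print(len(posterior), credible_interval(posterior))
```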

 


Table 2 Comparison of location estimates of the ΔF508 mutation for the cystic fibrosis data from Kerem et al. (1989) by GeneRecon and other coalescent-based fine-mapping tools

 

    6 CONCLUSION
GeneRecon is designed to allow flexible multimarker LD mapping using a coalescent model. Adaptations and extensions of the various Scheme modules provided allow users to accommodate a wide range of scenarios, data types, sampling strategies and convergence diagnostics, without much loss of user friendliness compared to competing software.


    Acknowledgments
 
T.M. is funded by the Danish Research Agency, FNU grant 272-05-0283 and FTP grant 274-05-0365. The project is supported by the ISIS project 123 to L.S. and C.N.S.P.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on February 8, 2006; revised on April 5, 2006; accepted on April 16, 2006

    REFERENCES

    Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (eds) (1995) Markov Chain Monte Carlo in Practice. Chapman & Hall, London, UK.

    Hein, J., Schierup, M. and Wiuf, C. (2005) Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. Oxford University Press, Oxford, UK.

    Kerem, B., et al. (1989) Identification of the cystic fibrosis gene: genetic analysis. Science, 245, 1073–1080.

    Liu, J.S., et al. (2001) Bayesian analysis of haplotypes for linkage disequilibrium mapping. Genome Res., 11, 1716–1724.

    Liu, J.S. (2001) Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York, NY.

    Mailund, T., Pedersen, C.N.S., Bardino, J., Vinter, B. and Karlsen, H.H. (2005a) Initial experiences with GeneRecon on MiG. In Proceedings of the 2005 International Conference on Grid Computing and Applications (GCA’05), Monte Carlo Resort, Las Vegas, Nevada, USA.

    Mailund, T., et al. (2005b) CoaSim: a flexible environment for simulating genetic data under coalescent models. BMC Bioinformatics, 6, 252.

    Morris, A.P., et al. (2002) Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. Am. J. Hum. Genet., 70, 686–707.

