Bioinformatics Advance Access originally published online on May 22, 2007
Bioinformatics 2007 23(14):1851-1853; doi:10.1093/bioinformatics/btm253
Recombination-filtered genomic datasets by information maximization
1Arizona Research Laboratories – Biotechnology, University of Arizona, Tucson, AZ 85721 and 2Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
*To whom correspondence should be addressed.
Abstract
Summary: With the increasing amount of DNA sequence data available from natural populations, new computational methods are needed to efficiently process raw sequences into formats that are applicable to a variety of analytical methods. One highly successful approach to inferring aspects of demographic history is grounded in coalescent theory. Many of these methods restrict themselves to perfectly tree-like genealogies (i.e. regions with no observed recombination), because theoretical difficulties prevent ready statistical evaluation of recombining regions. However, determining which recombination-filtered dataset to analyze from a larger recombination-rich genomic region is a non-trivial problem. Current applications primarily aim to quantify recombination rates (rather than produce optimal recombination-filtered blocks), require significant manual intervention, and are impractical for multiple genomic datasets in high-throughput, automated research environments. Here, we present a fast, simple and automatable command-line program that extracts optimal recombination-filtered blocks (no four-gamete violations) from recombination-rich genomic re-sequence data.
Availability: http://hammerlab.biosci.arizona.edu/software.html
Contact: mpcox{at}email.arizona.edu
Associate Editor: Martin Bishop
Received on February 28, 2007; revised on May 3, 2007; accepted on May 4, 2007