Skip Navigation


Bioinformatics Advance Access originally published online on November 16, 2006
Bioinformatics 2006 22(24):3096-3098; doi:10.1093/bioinformatics/btl474
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/24/3096    most recent
btl474v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Kosakovsky Pond, S. L.
Right arrow Articles by Frost, S. D.W.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kosakovsky Pond, S. L.
Right arrow Articles by Frost, S. D.W.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

GARD: a genetic algorithm for recombination detection

Sergei L. Kosakovsky Pond *, David Posada 1, Michael B. Gravenor 2, Christopher H. Woelk and Simon D.W. Frost

Department of Pathology, University of California San Diego La Jolla, CA 92093, USA
1 Departamento de Bioquímica, Genética e Inmunología Facultad de Biología, Universidad de Vigo, Vigo 36310, Spain
2 School of Medicine, University of Wales Swansea, UK

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS AND ALGORITHMS
 3 DISCUSSION
 REFERENCES
 

Motivation: Phylogenetic and evolutionary inference can be severely misled if recombination is not accounted for, hence screening for it should be an essential component of nearly every comparative study. The evolution of recombinant sequences can not be properly explained by a single phylogenetic tree, but several phylogenies may be used to correctly model the evolution of non-recombinant fragments.

Results: We developed a likelihood-based model selection procedure that uses a genetic algorithm to search multiple sequence alignments for evidence of recombination breakpoints and identify putative recombinant sequences. GARD is an extensible and intuitive method that can be run efficiently in parallel. Extensive simulation studies show that the method nearly always outperforms other available tools, both in terms of power and accuracy and that the use of GARD to screen sequences for recombination ensures good statistical properties for methods aimed at detecting positive selection.

Availability: Freely available http://www.datamonkey.org/GARD/

Contact: spond{at}ucsd.edu


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS AND ALGORITHMS
 3 DISCUSSION
 REFERENCES
 
Recombination can have a profound impact on the evolutionary process and is of interest in its own right. In HIV-1, for instance, recombination rates can rival mutation rates (Zhuang et al., 2002). Recombination can adversely affect the power and accuracy of fundamentally important tools of molecular evolutionary analyses: phylogenetic reconstruction (Posada and Crandall, 2002), molecular clock inference (Schierup and Hein, 2000) and the detection of positively selected sites (Shriner et al., 2003). Consequently, reliable tools for discovering recombination are a critical part of any phylogenetic analysis. A diverse array of algorithms and software tools for detection of recombination have been published. However, when benchmarked on simulated (Posada and Crandall, 2001) and biological (Posada, 2002) data, the methods often gave contradictory results, and no definitive recommendation on which approach should be considered the ‘gold standard' could be made. We developed a robust and extensible approach—Genetic Algorithm Recombination Detection (GARD)—to screen multiple sequence alignments for evidence of phylogenetic incongruence, identify the number and location of breakpoints and sequences involved in putative recombination events. Using simulated and biological datasets we have shown (Kosakovsky Pond et al., 2006) that GARD outperforms the best currently available tools in terms of power and accuracy in a wide-range of evolutionary scenarios.


    2 METHODS AND ALGORITHMS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS AND ALGORITHMS
 3 DISCUSSION
 REFERENCES
 
We model recombinant sequences by allowing S ≥ 1 non-recombinant alignment fragments, reconstructing a separate phylogenetic tree for each fragment and evaluating the goodness-of-fit for the model using the small sample Akaike's Information Criterion (Sugiura, 1978) computed with standard phylogenetic likelihood methods and point substitution models [see Kosakovsky Pond et al. (2006) for details]. The computationally challenging component of the model is the search for the locations of S – 1 breakpoints—a problem of O(LS) complexity (L denotes the length of the alignment). When S = 2, an exhaustive examination of all possible locations for the single breakpoint can be undertaken. This single breakpoint (SBP) method performs surprisingly well (Kosakovsky Pond et al., 2006) when a dichotomous classification of alignments into recombinant or non-recombinant is desired, and can be run quickly in a parallel computing environment.

When S > 2, we utilize an aggressive population based hill-climber—the CHC genetic algorithm (Eshelman, 1991)—to search the space of breakpoint locations, encoded as a binary vector of sorted concatenated breakpoint positions. CHC always retains the most fit individual from the previous generation and performs two basic operations on individuals currently in the population:

  1. When two individuals, b1 and b2 are picked to mate, their offspring is equally likely to inherit bit bi from either parent.
  2. If the diversity of the sample (measured by the range of AICc scores normalized by the score of the best individual) falls below a fixed threshold, then all individuals in the population, excluding the most fit one, have a proportion of randomly selected bits toggled.

For fixed S, the algorithm terminates if the best score remains unchanged over 100 consecutive generations. A typical GA run considers 103–104 possible models before converging. To infer S, we start with S = 1 segments and increase S by 1 for subsequent GA runs, until the AICc score of the best model fails to improve further. GARD and SBP have been implemented as HyPhy (Kosakovsky Pond et al., 2005) language scripts enabled to run in an MPI environment. Presently, GARD is hosted on our 80-node cluster and can be accessed via a Web front-end. Standalone scripts or cluster installation instructions can be obtained from the authors upon request and will be made available online if there is sufficient interest. The current implementation, shown schematically in Figure 1, allows the user to:

  1. Upload an alignment of sequences to screen. At present up to 50 aligned DNA/RNA sequences with up to 10 000 nt will be accepted. Both numbers will be increased periodically.
  2. Select an appropriate model of nucleotide evolution (Kosakovsky Pond and Frost, 2005) and specify the distribution used to model site-to-site variation in substitution rates.
  3. Run SBP or GARD screens for recombination.
  4. Visualize and download the results of recombination screens, including: (i) the number and best location of inferred breakpoints, and the improvement in AICc score achieved by the multiple breakpoint model (if any); (ii) model averaged supportfor the location of breakpoints, useful for assessing the degree of confidence; (iii) phylogenetic trees inferred from each non-recombinant breakpoint; (iv) a NEXUS file containing the alignment, inferred partitions and trees.
  5. Result files and HyPhy scripts needed for additional processing and inference (e.g. for further tests of phylogenetic incongruence) can be downloaded and run locally.


Figure 1
View larger version (30K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1 GARD and SBP server schematic flowchart and sample output.

 
We intend to add new features and analysis options (e.g. protein sequence analysis) with time.


    3 DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS AND ALGORITHMS
 3 DISCUSSION
 REFERENCES
 
In practice many widely-used molecular analyses may be confounded by the presence or absence of recombination. Hence, screening for recombination should be an integral part of phylogenetic analyses. We have developed an intuitive and powerful method for detecting evidence of recombination in alignments of DNA sequences. It is able to provide estimates for the number and location of breakpoints, and infer segment-specific phylogenetic trees. GARD does not require a non-recombinant reference alignment and recombination between ancestral sequences is also accommodated. Arbitrarily complex models of point substitution (e.g. those allowing site-to-site variation in substitution rates, or codon models) can be easily incorporated. GARD outperforms other methods and can be run in parallel on a cluster of computers, and so is well suited to screen for recombination in large datasets.


    Acknowledgments
 
This research was supported in part by the National Institutes of Health (AI43638, AI47745, and AI57167, R01-GM66276), the University of California Universitywide AIDS Research Program (IS 02-SD-701), and by a University of California, San Diego Center for AIDS Research/NIAID Developmental Award to SDWF and SLKP (AI36214). D.P. was also supported by grant BFU2004-02700 of the Spanish Ministry of Education and Science and by the ‘Ramón y Cajal’ program of the Spanish government.

Conflict of Interest statement. None declared.


    FOOTNOTES
 
Associate Editor: Christos Ouzounis

Received on July 3, 2006; accepted on September 2, 2006

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS AND ALGORITHMS
 3 DISCUSSION
 REFERENCES
 

    Eshelman, L.J. (1991) The CHC adaptive search algorithm: How to do safe search when engaging in nontraditional genetic recombination. In Spatz, B.M. (Ed.). Foundations of Genetic Algorithms, , San Mateo, CA Morgan Kaufmann, pp. 265–283.

    Kosakovsky Pond, S.L. and Frost, S.D.W. (2005) Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics, 21, 2531–2533[Abstract/Free Full Text].

    Kosakovsky Pond, S.L. (2005) HyPhy: Hypothesis testing using phylogenies. Bioinformatics, 21, 676–679[Abstract/Free Full Text].

    Kosakovsky Pond, S.L., et al. (2006) Automated phylogenetic detection of recombination using a genetic algorithm. Mol. Biol. Evol, . 23, 1891–1901[Abstract/Free Full Text].

    Posada, D. and Crandall, K.A. (2001) Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc. Natl. Acad. Sci. USA, 98, 13757–13762[Abstract/Free Full Text].

    Posada, D. and Crandall, K.A. (2002) The effect of recombination on the accuracy of phylogeny estimation. J. Mol. Evol, . 54, 396–402[Web of Science][Medline].

    Posada, D. (2002) Evaluation of methods for detecting recombination from DNA sequences: empirical data. Mol. Biol. Evol, . 19, 708–717[Abstract/Free Full Text].

    Schierup, M. and Hein, J. (2000) Recombination and the molecular clock. Mol. Biol. Evol, . 17, 1578–1579[Free Full Text].

    Shriner, D., et al. (2003) Potential impact of recombination on sitewise approaches for detecting positive natural selection. Genet. Res, . 81, 115–121[CrossRef][Web of Science][Medline].

    Sugiura, N. (1978) Further analysis of the data by Akaike's information criterion and the finite corrections. Comm. Stat. Theory Meth, . A7, 13–26.

    Zhuang, J., et al. (2002) Human immunodeficiency virus type 1 recombination: rate, fidelity, and putative hot spots. J. Virol, . 76, 11273–11282[Abstract/Free Full Text].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
J HeredHome page
S. S. Steiger, A. E. Fidler, J. C. Mueller, and B. Kempenaers
Evidence for Adaptive Evolution of Olfactory Receptor Genes in 9 Bird Species
J. Hered., December 4, 2009; (2009) esp105v1.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
T. Lefebure and M. J. Stanhope
Pervasive, genome-wide positive selection leading to functional divergence in the bacterial genus Campylobacter
Genome Res., July 1, 2009; 19(7): 1224 - 1232.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
L. C. J. Alcantara, S. Cassol, P. Libin, K. Deforche, O. G. Pybus, M. Van Ranst, B. Galvao-Castro, A.-M. Vandamme, and T. de Oliveira
A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W634 - W642.
[Abstract] [Full Text] [PDF]


Home page
JEMHome page
N. Goonetilleke, M. K.P. Liu, J. F. Salazar-Gonzalez, G. Ferrari, E. Giorgi, V. V. Ganusov, B. F. Keele, G. H. Learn, E. L. Turnbull, M. G. Salazar, et al.
The first T cell response to transmitted/founder virus contributes to the control of acute viremia in HIV-1 infection
J. Exp. Med., June 8, 2009; 206(6): 1253 - 1272.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
E. E. Smith and H. S. Malik
The apolipoprotein L family of programmed cell death and immunity genes rapidly evolved in primates at discrete sites of host-pathogen interactions
Genome Res., May 1, 2009; 19(5): 850 - 858.
[Abstract] [Full Text] [PDF]


Home page
J. Virol.Home page
M.-R. Abrahams, J. A. Anderson, E. E. Giorgi, C. Seoighe, K. Mlisana, L.-H. Ping, G. S. Athreya, F. K. Treurnicht, B. F. Keele, N. Wood, et al.
Quantitating the Multiplicity of Infection with Human Immunodeficiency Virus Type 1 Subtype C Reveals a Non-Poisson Distribution of Transmitted Variants
J. Virol., April 15, 2009; 83(8): 3556 - 3567.
[Abstract] [Full Text] [PDF]


Home page
J. Virol.Home page
J. Zoll, J. M. D. Galama, and F. J. M. van Kuppeveld
Identification of Potential Recombination Breakpoints in Human Parechoviruses
J. Virol., April 1, 2009; 83(7): 3379 - 3383.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. A. MacDuff, Z. L. Demorest, and R. S. Harris
AID can restrict L1 retrotransposition suggesting a dual role in innate and adaptive immunity
Nucleic Acids Res., April 1, 2009; 37(6): 1854 - 1867.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. Anisimova and C. Kosiol
Investigating Protein-Coding Sequence Evolution with Probabilistic Codon Substitution Models
Mol. Biol. Evol., February 1, 2009; 26(2): 255 - 271.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
B. Joos, M. Fischer, H. Kuster, S. K. Pillai, J. K. Wong, J. Boni, B. Hirschel, R. Weber, A. Trkola, H. F. Gunthard, et al.
HIV rebounds from latently infected cells, rather than from continuing low-level replication
PNAS, October 28, 2008; 105(43): 16725 - 16730.
[Abstract] [Full Text] [PDF]


Home page
J. Virol.Home page
G. Palacios, J. Hui, P. L. Quan, A. Kalkstein, K. S. Honkavuori, A. V. Bussetti, S. Conlan, J. Evans, Y. P. Chen, D. vanEngelsdorp, et al.
Genetic Analysis of Israel Acute Paralysis Virus: Distinct Clusters Are Circulating in the United States
J. Virol., July 1, 2008; 82(13): 6209 - 6217.
[Abstract] [Full Text] [PDF]


Home page
J. Virol.Home page
C.-C. Hon, T.-Y. Lam, Z.-L. Shi, A. J. Drummond, C.-W. Yip, F. Zeng, P.-Y. Lam, and F. C.-C. Leung
Evidence of the Recombinant Origin of a Bat Severe Acute Respiratory Syndrome (SARS)-Like Coronavirus and Its Implications on the Direct Ancestor of SARS Coronavirus
J. Virol., February 15, 2008; 82(4): 1819 - 1826.
[Abstract] [Full Text] [PDF]


Home page
J. Virol.Home page
C. M. Noviello, S. L. K. Pond, M. J. Lewis, D. D. Richman, S. K. Pillai, O. O. Yang, S. J. Little, D. M. Smith, and J. C. Guatelli
Maintenance of Nef-Mediated Modulation of Major Histocompatibility Complex Class I and CD4 after Sexual Transmission of Human Immunodeficiency Virus Type 1
J. Virol., May 1, 2007; 81(9): 4776 - 4786.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
22/24/3096    most recent
btl474v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Kosakovsky Pond, S. L.
Right arrow Articles by Frost, S. D.W.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kosakovsky Pond, S. L.
Right arrow Articles by Frost, S. D.W.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?