

Bioinformatics Advance Access originally published online on July 14, 2006
Bioinformatics 2006 22(18):2313-2314; doi:10.1093/bioinformatics/btl387

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Serial NetEvolve: a flexible utility for generating serially-sampled sequences along a tree or recombinant network

Patricia Buendia and Giri Narasimhan *

Bioinformatics Research Group (BioRG), School of Computing and Information Science, Florida International University, Miami, FL 33199, USA

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: Serial NetEvolve is a flexible simulation program that generates DNA sequences evolved along a tree or recombinant network. It offers a user-friendly Windows graphical interface and a Windows or Linux simulator with a diverse selection of parameters to control the evolutionary model. Serial NetEvolve is a modification of the Treevolve program with the following additional features: simulation of serially-sampled data, the choice of either a clock-like or a variable rate model of sequence evolution, sampling from the internal nodes and the output of the randomly generated tree or network in our newly proposed NeTwick format.

Availability: From website http://biorg.cis.fiu.edu/SNE

Contact: giri@cis.fiu.edu

Supplementary information: Manual and examples available from http://biorg.cis.fiu.edu/SNE


    1 INTRODUCTION
 
There has been considerable recent interest in understanding recombination and in developing new methods for recombination detection and phylogenetic network reconstruction (Martin et al., 2005; Nakhleh et al., 2003). Another area that has received increased attention is the analysis of serially-sampled sequence data derived from viruses sampled from a single infected patient (Drummond and Rodrigo, 2000; Nickle et al., 2003). The simulation program Seq-Gen (Rambaut and Grassly, 1997) assists in the evaluation of phylogenetic software by generating synthetic sequences evolved along a user-specified tree. For a specified set of parameters, Treevolve (Grassly et al., 1999) generates a coalescent tree (Kingman, 1982) or recombinant network (Hudson, 1983) and evolves a set of sequences along that structure. Neither program supports serial samples. Moreover, although Treevolve does not require a user-specified topology, it does not output the topology it generates. Here we present Serial NetEvolve, a modification of Treevolve, which generates serially-sampled sequences along a randomly generated reticulate network. It also outputs the network topology, which may be used to evaluate recombination detection programs. The option to generate a recombinant network is available neither in the recently published Serial SIMCOAL (Anderson et al., 2005) nor in the earlier Serial Coalescent Simulator (Drummond and Strimmer, 2001). Serial NetEvolve offers the flexible set of population parameters (migration rate, population growth rate, among others) present in the original Treevolve, together with additional options for the generation of both trees and recombinant networks.


    2 PROGRAM FEATURES
 
The new features of Serial NetEvolve are described in detail below. Additional details on how to use the features of Serial NetEvolve may be found at the software website mentioned above. The program GUI was implemented in Visual Basic 6 and the simulator was written in ANSI C. Source code and executables are available for Windows 2000/XP and Linux.

2.1 Serially-sampled sequences
A theoretical framework for generating serial samples in the coalescent was developed by Rodrigo and Felsenstein (1999) and implemented in Java in the Serial Coalescent Simulator (Drummond and Strimmer, 2001). In Serial NetEvolve, we implemented the technique by modifying Treevolve so that sequences are assigned to different time points rather than all to the zero-time baseline.
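
The idea of assigning sequences to different time points can be illustrated with a minimal sketch of a serial-sample coalescent without recombination. It is not the Serial NetEvolve or Treevolve code: the function name serial_coalescent, the constant haploid population size pop_size and the convention that sampling times are measured backwards from the most recent sample are all assumptions made for illustration.

```python
import random


def serial_coalescent(samples, pop_size=1.0, rng=None):
    """Simulate one genealogy for serially-sampled tips (no recombination).

    samples  : list of (name, time_before_present) pairs; time 0 is the
               most recent sample (hypothetical convention).
    pop_size : constant population size, in the same time units (assumption).
    Returns (newick_string, time_of_root).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rng = rng or random.Random()
    pending = sorted(samples, key=lambda s: s[1])  # not yet sampled
    active = []                                    # (subtree, node_time)
    t = pending[0][1]
    while pending or len(active) > 1:
        # Lineages enter the genealogy once their sampling time is reached.
        while pending and pending[0][1] <= t:
            active.append(pending.pop(0))
        k = len(active)
        if k < 2:
            t = pending[0][1]  # nothing can coalesce yet
            continue
        wait = rng.expovariate(k * (k - 1) / (2.0 * pop_size))
        if pending and t + wait > pending[0][1]:
            # The next sampling event comes first; the exponential wait is
            # memoryless, so it is simply redrawn with the new lineage count.
            t = pending[0][1]
            continue
        t += wait
        i, j = rng.sample(range(k), 2)
        (a, ta), (b, tb) = active[i], active[j]
        merged = "(%s:%.4f,%s:%.4f)" % (a, t - ta, b, t - tb)
        active = [x for idx, x in enumerate(active) if idx not in (i, j)]
        active.append((merged, t))
    return active[0][0] + ";", t


if __name__ == "__main__":
    tips = [("a", 0.0), ("b", 0.0), ("c", 1.5), ("d", 3.0)]
    print(serial_coalescent(tips, pop_size=2.0, rng=random.Random(1)))
```

Setting every sampling time to zero recovers the usual single-time-point coalescent, which corresponds to the zero-time baseline mentioned above.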

2.2 Molecular clock
Distance to the root and time are equivalent when enforcing a molecular clock. To simulate datasets with a constant rate of evolution (enforced molecular clock), sequences from the same sampling time were assigned to the same distance from the root, as implemented in Serial Coalescent Simulator. For the variable rates setting, all sequences (independent of the sampling time they belonged to) were assigned to randomly generated distances from the root. For consistency, distances of the nodes were constrained to decrease as one traverses towards the root.
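
The clock constraint described above can be expressed as a small check on a set of tips. This is only a sketch: the function name is_clocklike, the representation of a tip as a (sampling_time, root_distance) pair and the convention that later samples have larger sampling times are assumptions, not part of Serial NetEvolve.

```python
from collections import defaultdict


def is_clocklike(tips, tol=1e-9):
    """Check the enforced-clock property for a list of tips.

    tips : list of (sampling_time, root_distance) pairs, where a larger
           sampling_time means a later sample (hypothetical convention).
    """
    by_time = defaultdict(list)
    for sampling_time, root_distance in tips:
        by_time[sampling_time].append(root_distance)
    # Tips from the same sampling time must sit at the same root distance.
    for distances in by_time.values():
        if max(distances) - min(distances) > tol:
            return False
    # Later samples must lie farther from the root than earlier ones.
    ordered = [by_time[t][0] for t in sorted(by_time)]
    return all(a < b for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(is_clocklike([(0, 5.0), (0, 5.0), (1, 6.5)]))
    print(is_clocklike([(0, 5.0), (0, 5.2), (1, 6.5)]))
```

A dataset generated with the variable rates setting would generally fail this check, because tip distances are then drawn independently of the sampling time.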

2.3 Internal node sampling
Serial NetEvolve allows the user to set the probability of internal node sampling as a parameter. Sampling of internal nodes makes sense only if the effective population size is small (Drummond and Rodrigo, 2000), which in turn is more probable when the length of the sequences sampled is small. It is worth noting that fairly short sequences [~700 bp in Shankarappa et al. (1999)] were used in many large studies of viral populations, and that the effective population size for serially-sampled HIV-1 was estimated to be only 4232.2 and negatively correlated with the evolutionary rate (Seo et al., 2002).

2.4 Tree or network output
Treevolve was modified to output the coalescent tree or network. When the recombination rate is zero, the tree is written to a file in Newick format, which represents trees using nested parentheses (Felsenstein, 1999; http://evolution.genetics.washington.edu/phylip/newicktree.html). If internal nodes are sampled, they are assigned to zero-length branches.

In order to write a recombinant network to a file, we devised the ‘NeTwick’ format, a variant of the Newick format that incorporates additional information (breakpoint position, right and left parent) to represent recombinant nodes. Unlike tree nodes, recombinant nodes have more than one parental node. In Serial NetEvolve, we (arbitrarily) chose the left parental node of a recombinant sequence to appear twice in the NeTwick format to indicate the linking relationship. One of the two copies of the left parental node is followed by the symbol ‘#’ and the breakpoint position; this copy represents a link, not a taxon. If the left parent was not sampled, it also appears with a ‘~’ prefix.

Figure 1 shows a network with nine taxa and its tree equivalent. The left parent X was not sampled (indicated by the ~), but is present in the tree to indicate the linking relationship shown in the network. In the proposed network representation, a backward (forward, respectively) slash followed by the breakpoint number indicates whether the left parent is below (above, respectively) in a horizontally drawn network. The advantage of making two copies of the left parental node of every recombinant node is that the network can then be represented by an equivalent tree. The tree can be viewed using any tree-viewing program; a network viewer is currently being developed. NeTwick supports tree and network polytomies, but is restricted to one child per recombinant parent.


Fig. 1. The tree representation (a) and the network representation (b) of the network encoded by the NeTwick string: ((A:4,C:1):5,((((B:2,~X#234:0):4,((E:3,(~X:1,D:3):1):1,(F:2,G:3):4):1):1,C#535:0):2,H:1):4).
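
As an illustration of how the ‘#’ and ‘~’ markers might be read back from a NeTwick string, the following sketch extracts the leaf labels from the string in the Fig. 1 caption and separates taxa from link copies. It relies only on the description given in Section 2.4; the regular expression, the function names and the assumption that labels contain no parentheses, commas, colons, ‘#’ or ‘~’ are ours and are not taken from the Serial NetEvolve source.

```python
import re

# NeTwick string from the Fig. 1 caption.
NETWICK = ("((A:4,C:1):5,((((B:2,~X#234:0):4,((E:3,(~X:1,D:3):1):1,"
           "(F:2,G:3):4):1):1,C#535:0):2,H:1):4)")

# A leaf is a label that directly follows "(" or ",":
#   optional "~" (parent not sampled), name, optional "#breakpoint", ":length"
LEAF = re.compile(r"[(,](~?)([^(),:#~]+)(?:#(\d+))?:([0-9.]+)")


def parse_leaves(netwick):
    """Return one dict per leaf label found in the NeTwick string."""
    leaves = []
    for unsampled, name, breakpoint, length in LEAF.findall(netwick):
        leaves.append({
            "name": name,
            "sampled": unsampled == "",
            # A breakpoint marks a link copy of a left parent, not a taxon.
            "breakpoint": int(breakpoint) if breakpoint else None,
            "length": float(length),
        })
    return leaves


def recombination_links(netwick):
    """List (left_parent_name, breakpoint) for every link copy."""
    return [(leaf["name"], leaf["breakpoint"])
            for leaf in parse_leaves(netwick)
            if leaf["breakpoint"] is not None]


def taxa(netwick):
    """List the names of the leaves that are taxa (no '#breakpoint')."""
    return [leaf["name"] for leaf in parse_leaves(netwick)
            if leaf["breakpoint"] is None]


if __name__ == "__main__":
    print(recombination_links(NETWICK))
    print(taxa(NETWICK))
```

For the Fig. 1 string this yields two link copies, X at breakpoint 234 and C at breakpoint 535, and nine taxa, in agreement with the description of the figure.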

 

    3 DISCUSSION AND CONCLUSION
 
The power of Serial NetEvolve lies in the ease with which it is possible to generate a collection of datasets using a wide range of parameters with the goal of comparing different programs that analyze any aspect of serially-sampled sequence data. For instance, it is possible to compare with relative ease the topologies of the output of several programs. An example of such an experiment can be found in the supplemental website. Serial NetEvolve differs from the majority of simulation methods in that it incorporates both serial sampling and recombination along with additional features (heterogeneous evolution rate, sampling of internal nodes), while at the same time, maintaining the population parameters from Treevolve (migration rate, population growth rate, among others). Analysis of serially-sampled data has applications in the study of the evolution of fast-evolving DNA and RNA viruses, for which it is often the case that recombination is an integral part of their life cycle. The recombination rate in HIV-1 is one of the highest among all organisms (Rambaut et al., 2004). Thus coalescent simulators for serial samples need to incorporate a recombination feature in their sequence generation, as is the case with Serial NetEvolve. Finally, the option to output the tree or network to a file is critical for automated evaluations of the resulting tree or network. Note that network topologies can now be compared using a measure proposed by Nakhleh et al. (2003), who extended the well-known Robinson–Foulds (RF) score (Robinson and Foulds, 1981) for tree topologies.
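
For reference, the tree version of the RF score mentioned above can be sketched in a few lines. This is the standard Robinson–Foulds count of bipartitions present in exactly one of two trees, not the network extension of Nakhleh et al. (2003); the nested-tuple tree representation and the function names are assumptions made for illustration.

```python
def bipartitions(tree):
    """Non-trivial bipartitions of a tree given as nested tuples of leaf names."""
    clusters = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        clusters.add(below)
        return below

    all_leaves = leaves(tree)
    reference = min(all_leaves)
    splits = set()
    for cluster in clusters:
        # Represent each split by the side that excludes the reference leaf,
        # so that the result does not depend on where the tree is rooted.
        side = all_leaves - cluster if reference in cluster else cluster
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def rf_distance(tree1, tree2):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(bipartitions(tree1) ^ bipartitions(tree2))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(t1, t2))
```

Because the tree or network generated by the simulator is written to a file, a score of this kind can be computed automatically between the true topology and the topology reconstructed by each program under evaluation.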


    Acknowledgments
 
P.B. was supported by MBRS-RISE Fellowship (NIH/NIGMS R25GM61347). G.N. was supported in part by NIH Grant P01 DA15027-01. Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Keith A Crandall

Received on May 1, 2006; revised on July 5, 2006; accepted on July 6, 2006

    REFERENCES
 

    Anderson, C.N.K., et al. (2005) Serial SIMCOAL: a population genetics model for data from multiple populations and points in time. Bioinformatics, 21, 1733–1734.

    Drummond, A. and Rodrigo, A.G. (2000) Reconstructing genealogies of serial samples under the assumption of a molecular clock using serial-sample UPGMA (sUPGMA). Mol. Biol. Evol., 17, 1807–1815.

    Drummond, A. and Strimmer, K. (2001) PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics, 17, 662–663.

    Felsenstein, J. (1999) The Newick tree format. http://evolution.genetics.washington.edu/phylip/newicktree.html

    Grassly, N., et al. (1999) Population dynamics of HIV-1 inferred from gene sequences. Genetics, 151, 427–438.

    Hudson, R.R. (1983) Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol., 23, 183–201.

    Kingman, J.F.C. (1982) The coalescent. Stochastic Process. Appl., 13, 235–248.

    Martin, D.P., et al. (2005) RDP2: recombination detection and analysis from sequence alignments. Bioinformatics, 21, 260–262.

    Nakhleh, L., et al. (2003) Towards the development of computational tools for evaluating phylogenetic network reconstruction methods. Pac. Symp. Biocomput., 315–326.

    Nickle, D.C., et al. (2003) Evolutionary indicators of Human Immunodeficiency Virus type 1 reservoirs and compartments. J. Virol., 77, 5540–5546.

    Rambaut, A. and Grassly, N. (1997) Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci., 13, 235–238.

    Rambaut, A., et al. (2004) The causes and consequences of HIV evolution. Nat. Rev. Genet., 5, 52–61.

    Robinson, D.F. and Foulds, L.R. (1981) Comparison of phylogenetic trees. Math. Biosci., 53, 131–147.

    Rodrigo, A. and Felsenstein, J. (1999) Coalescent approaches to HIV-1 population genetics. In Crandall, K. (Ed.), Molecular Evolution of HIV. Johns Hopkins University Press, Baltimore, MD, pp. 233–272.

    Seo, T.-K., et al. (2002) Estimation of effective population size of HIV-1 within a host: a pseudomaximum-likelihood approach. Genetics, 1283–1293.

    Shankarappa, R., et al. (1999) Consistent viral evolutionary changes associated with the progression of HIV 1 infection. J. Virol., 73, 10489–10502.



