

Bioinformatics Advance Access originally published online on December 6, 2005
Bioinformatics 2006 22(6):716-722; doi:10.1093/bioinformatics/bti812

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

In silico sequence evolution with site-specific interactions along phylogenetic trees

Tanja Gesell 1 and Arndt von Haeseler 2,3,4,5,*

1 Heinrich-Heine University Duesseldorf, Universitaetsstrasse 1, 40225 Duesseldorf, Germany
2 Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria
3 University of Vienna, Austria
4 Medical University of Vienna, Austria
5 University of Veterinary Medicine Vienna, Austria

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: A biological sequence usually has many sites whose evolution depends on other positions of the sequence, but this is not accounted for by commonly used models of sequence evolution. Here we introduce a Markov model of nucleotide sequence evolution in which the instantaneous substitution rate at a site depends on the states of other sites. Based on the concept of neighbourhood systems, our model represents a universal description of arbitrarily complex dependencies among sites.

Results: We show how to define complex models for some illustrative examples and demonstrate that our method provides a versatile resource for simulations of sequence evolution with site-specific interactions along a tree. For example, we are able to simulate the evolution of RNA taking into account secondary structure as well as pseudoknots and other tertiary interactions. To this end, we have developed the program Simulating Site-Specific Interactions (SISSI), which simulates the evolution of a nucleotide sequence along a phylogenetic tree incorporating user-defined site-specific interactions. Furthermore, our method allows the simulation of more complex interactions among nucleotide and other character-based sequences.

Availability: We implemented our method in an ANSI C program SISSI which runs on UNIX/Linux, Windows and Mac OS systems, including Mac OS X. SISSI is available at http://www.bi.uni-duesseldorf.de/software/sissi/

Contact: sissi@cs.uni-duesseldorf.de


    INTRODUCTION
Evolutionary analysis of biological sequences typically assumes that sites evolve independently of each other (cf. Tavaré, 1986). However, this simplifying assumption does not generally hold. Thus, in recent years evolutionary models have been suggested to remedy this unsatisfactory situation. Markov models that take the base-pairings in stem regions of RNA molecules into account were among the first to model the process of evolution more realistically (Schöniger and von Haeseler, 1994; Tillier, 1994; Muse, 1995; Rzhetsky, 1995; Tillier and Collins, 1998; Savill et al., 2000; Smith et al., 2004). Models to detect protein sites with correlated patterns of evolution have also been proposed (e.g. Pollock et al., 1999). Furthermore, models including selection against CpG-dinucleotides were studied as an example of overlapping context dependencies (Jensen and Pedersen, 2000; Arndt et al., 2003; Siepel and Haussler, 2004). More specialized models, for overlapping reading frames (Pedersen and Jensen, 2001) and with a focus on protein structure (Robinson et al., 2003), were also suggested. Recently, irreversible complex models with overlapping neighbouring nucleotide pairs were developed (Lunter and Hein, 2004), and Pedersen et al. (2004) modeled the substitution process in protein-coding regions with embedded conserved RNA structures.

Modeling the evolution of a collection of homologous sequences is not simply an academic gimmick. A profound knowledge of how sequences evolve can help us to improve the reconstruction of phylogenetic trees based on sequences. However, with few exceptions, the true mode of sequence evolution between homologous sequences is unknown. Therefore, simulating sequence evolution is helpful to investigate the performance of tree-building methods (Huelsenbeck, 1995). So far, different programs have been designed to simulate nucleotide sequences and protein sequences along a tree (Schöniger and von Haeseler, 1995; Rambaut and Grassly, 1997; Grassly et al., 1997; Yang, 1997; Hudelot et al., 2003; Tufféry, 2002; Kosakovsky Pond et al., 2005). One of the most widely used programs, Seq-Gen (Rambaut and Grassly, 1997), implements a wide range of independent nucleotide substitution models (e.g. Jukes and Cantor, 1969; Kimura, 1980; Felsenstein, 1981; Hasegawa et al., 1985). The PHASE package (Hudelot et al., 2003) implements base-paired substitution models, but is specially designed for RNA sequences with secondary structure. To the best of our knowledge, a general sequence simulation program including site-specific interactions based on a well defined neighbourhood system does not exist.

While progress has been made in devising new models for inference of sequences with dependencies among sites, there is a lack of simulation models that would allow the assessment of this progress. Especially if one wants to assess the robustness of phylogenetic inference, the models used for simulation need to be more accurate and complex descriptions of nature than those used for inference. Furthermore, simulations taking into account site-specific interactions that evolved along a phylogeny are of value in themselves, because they contribute to our understanding of the intertwined relationship between structure and substitution process. The use of supervised sequence evolution (i.e. simulations) allows us to control and study the extent of structural and sequence conservation.

In the following section we introduce the representation of site-specific interactions using a neighbourhood system. It allows for a universal description of arbitrarily complex dependencies among sites. We give some simple examples of how to apply neighbourhood systems to define various structural elements in RNA sequences. Then, we define a site- and neighbourhood-dependent substitution process that permits a versatile description of the various evolutionary forces acting on single sites. We show that our method is useful, for example, to simulate the evolution of RNA sequence and structure simultaneously, since it can take into account base-pairing counterparts as well as further interactions between nucleotides in a sequence.


    THE NEIGHBOURHOOD SYSTEM OF A SEQUENCE
In the following, we describe for each site $k = 1, \dots, l$ in a (nucleotide) sequence $x = (x_1, \dots, x_l)$ the interaction of $k$ with other sites in $x$. To this end, we introduce the neighbourhood system $\mathcal{N} = (N_k)_{k=1,2,\dots,l}$ such that:

  1. $N_k \subset \{1, \dots, l\}$ and $k \notin N_k$ for each $k$;
  2. if $i \in N_k$ then $k \in N_i$ for each $i, k$.
$N_k$ contains all sites that interact with site $k$. With $n_k$ we denote the cardinality of $N_k$, i.e. the number of sites that interact with $k$. The sites $\{1, \dots, l\}$ of the sequence together with the neighbourhood system $\mathcal{N}$ can be visualized as a graph with vertex set $\{1, \dots, l\}$ and edge set $E = \{(k, i) \mid 1 \le k \le l,\ i \in N_k\}$. One visualization of this graph is the circle plot, in which the vertices are arranged on a circle and each edge is drawn inside the circle, connecting its two vertices. Using the notation of a neighbourhood system, it is easy to encode various secondary and tertiary structural elements in a unifying framework.
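The two defining conditions of a neighbourhood system can be checked mechanically. The following Python sketch is only an illustration: the function names, the dictionary representation and the toy hairpin are assumptions made here and are not taken from SISSI.

```python
def build_neighbourhoods(length, pairs):
    """Return {site: set of neighbours} for sites 1..length from interacting pairs."""
    nbrs = {k: set() for k in range(1, length + 1)}
    for i, k in pairs:
        if i == k:
            raise ValueError(f"site {k} cannot be its own neighbour")
        if not (1 <= i <= length and 1 <= k <= length):
            raise ValueError(f"pair ({i}, {k}) lies outside 1..{length}")
        nbrs[i].add(k)  # storing both directions enforces condition 2
        nbrs[k].add(i)
    return nbrs


def is_neighbourhood_system(nbrs):
    """Check conditions 1 and 2 for a mapping site -> set of sites."""
    sites = set(nbrs)
    for k, n_k in nbrs.items():
        if k in n_k or not n_k <= sites:
            return False  # condition 1 violated
        if any(k not in nbrs[i] for i in n_k):
            return False  # condition 2 (symmetry) violated
    return True


def edge_set(nbrs):
    """Edge set E = {(k, i) | i in N_k}; each interaction appears in both directions."""
    return {(k, i) for k, n_k in nbrs.items() for i in n_k}


# Hypothetical hairpin of length 8: sites 1-8, 2-7 and 3-6 pair, sites 4 and 5 form the loop.
hairpin = build_neighbourhoods(8, [(1, 8), (2, 7), (3, 6)])
```

Here every stem site has exactly one neighbour and every loop site has none, which corresponds to the simplest kind of neighbourhood system discussed in the text.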

Figure 1 illustrates some well known RNA structures together with the corresponding neighbourhoods in the circle plot. A stem region of an RNA molecule is encoded by a neighbourhood system with $n_k = 1$ for sites in a stem and $n_k = 0$ for sites in a loop, where $i \in N_k$ is the site that base-pairs with $k$ (Fig. 1a). Similarly, we can encode a pseudoknot; again $n_k$ is either zero or one. Here, the resulting circle plot shows intersecting edges (Fig. 1b). One can proceed to model even more elaborate interactions. Figure 1c displays the neighbourhood system that results if interactions owing to base-stacking are incorporated in our model. The corresponding circle graph displays many intersecting edges and takes into account overlapping dependencies. For example, site 11 is inter alia an element of the neighbourhoods $N_{10}$ and $N_{12}$, while site 10 is not in $N_{12}$ and vice versa. Figure 2 finally displays the interactions deduced from a ribozyme domain (Cate et al., 1996), where site 153 interacts with sites 150, 223 and 250. The described interactions are crucial for the integrity of the molecule and should be taken into account when modeling the process of evolution.
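The difference between a nested stem and a pseudoknot in a circle plot comes down to whether two chords intersect. A minimal Python sketch of that test follows; the site numbers are made up for illustration and are not those of Figure 1.

```python
from itertools import combinations


def chords_cross(p, q):
    """True if chords p and q of a circle plot intersect strictly inside the circle."""
    a, b = sorted(p)
    c, d = sorted(q)
    return a < c < b < d or c < a < d < b


def crossing_pairs(pairs):
    """All pairs of interactions whose chords intersect."""
    return [(p, q) for p, q in combinations(pairs, 2) if chords_cross(p, q)]


# Hypothetical examples: a nested stem and a pseudoknot built from two interleaved stems.
stem = [(1, 10), (2, 9), (3, 8)]
pseudoknot = [(1, 7), (2, 6), (4, 10), (5, 9)]
```

With strict inequalities, chords that merely share an endpoint (as for a site with several neighbours) are not counted as crossing; that is a choice made for this sketch.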


Fig. 1 Three examples show how the neighbourhood system may be used to encode various structural elements in an RNA sequence. Left: schematic representations; middle: neighbourhood system notation; right: circle plots, useful to display complex features of molecules. (Sites are written in the circumference of a circle and interacting sites are connected by chords.) (a) Typical examples for interacting sites are base pairs in RNA stems. (b) Pseudoknots show intersecting edges in circle plots. (c) To take base stacking in RNA stems into account a lot of overlapping dependencies must be considered.

 

Fig. 2 Example of a sequence $x$ with overlapping dependencies on site 153. Such dependencies occur e.g. in ribozyme domains (Cate et al., 1996). The substitution rate for the whole sequence, $q(x)$, is the sum of the rates of the individual sites, $q(k) = -Q_k(s_k, s_k)$. The instantaneous substitution rate of a single nucleotide depends on the states of the neighbourhood of its site at the instant of the substitution, as described by the instantaneous rate matrix $Q_k$. The dimension of $Q_k$ depends on the number of neighbours $n_k$ of site $k$.

 

    AN EVOLUTIONARY MODEL INCLUDING NEIGHBOURHOODS
In the previous section, we have introduced a tool to succinctly summarize interactions among sites in a (nucleotide) sequence. In the following, we need to superimpose an evolutionary dynamics which acts on the sites of a sequence and which takes into account these interactions.

Hence, we define a substitution process for every site $k$, where the substitution of a given nucleotide $x_k$ by another one depends on the states $x_{i_1}, \dots, x_{i_{n_k}}$ of the sites $i_1, \dots, i_{n_k} \in N_k$. To be more formal, we introduce at each site $k$ a site-specific rate matrix $Q_k$. Thus $Q = \{Q_k \mid k = 1, \dots, l\}$ constitutes a collection of possibly different substitution models acting on the sequence and an annotation of correlations among sites. Contrary to standard models, which assume independent evolution of the sites, $Q_k$ has dimensions $|\mathcal{A}|^{n_k+1} \times |\mathcal{A}|^{n_k+1}$, where $|\mathcal{A}|$ is the size of the alphabet. For the examples discussed here $\mathcal{A} = \{A, C, G, T\}$ or $\mathcal{A} = \{A, C, G, U\}$, hence $|\mathcal{A}| = 4$. Thus if $n_k = 0$, $Q_k$ can be defined as one of the usual rate matrices on $\mathcal{A}$, i.e. we may assume a Jukes–Cantor matrix, a Hasegawa–Kishino–Yano matrix or another independent model (Jukes and Cantor, 1969; Kimura, 1980; Felsenstein, 1981; Hasegawa et al., 1985). If $n_k > 0$, then $Q_k$ acts on subsequences of length $n_k + 1$. We impose the usual restriction that only one substitution per unit time is admissible (Schöniger and von Haeseler, 1994, 1995). Moreover, in the matrix $Q_k$ a substitution is only possible at site $k$. This restriction leads to sparse rate matrices. Let $s_k = (x_k, x_{i_1}, \dots, x_{i_{n_k}})$ represent the current subsequence of sequence $x$, where $i_1, \dots, i_{n_k} \in N_k$, and let $y = (y_0, y_1, \dots, y_{n_k})$ denote an arbitrary sequence of the same length. To avoid notational confusion, we assume $i_1 < i_2 < \dots < i_{n_k}$. With $(\pi_k(y))$ we denote the stationary distribution of rate matrix $Q_k$. Because only site $k$ is allowed to vary, the entries of $Q_k$ are given by

\[
Q_k(s_k, y) =
\begin{cases}
\pi_k(y) & \text{if } H(s_k, y) = 1 \text{ and } x_k \neq y_0, \\
-\sum_{z \neq s_k} Q_k(s_k, z) & \text{if } H(s_k, y) = 0, \\
0 & \text{otherwise,}
\end{cases}
\tag{1}
\]
where the Hamming distance $H(s_k, y)$ counts the number of differences between the sites of the subsequences $s_k$ and $y$. In other words, an off-diagonal element of $Q_k$ is greater than zero only if the last $n_k$ sites in $s_k$ and $y$ are pairwise identical. Thus, each of the $|\mathcal{A}|^{n_k+1}$ rows of $Q_k$ has at most $|\mathcal{A}|$ non-zero entries: the diagonal and the $|\mathcal{A}| - 1$ substitutions at site $k$.
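To make the sparsity pattern concrete, the following Python sketch builds an unscaled rate matrix from a table of subsequence frequencies according to the rule just described: a positive entry only where the two subsequences differ at site $k$ alone. The dictionary layout, the function names and the uniform doublet frequencies are assumptions chosen for illustration; position 0 of each string stands for site $k$ itself.

```python
from itertools import product

ALPHABET = "ACGU"


def hamming(s, y):
    """Number of positions at which the subsequences s and y differ."""
    return sum(a != b for a, b in zip(s, y))


def build_rate_matrix(pi, n_k, alphabet=ALPHABET):
    """Unscaled Q_k as a dict of dicts keyed by subsequence strings of length n_k + 1."""
    states = ["".join(t) for t in product(alphabet, repeat=n_k + 1)]
    Q = {s: {y: 0.0 for y in states} for s in states}
    for s in states:
        for y in states:
            # Only a change at position 0 (site k) is admissible.
            if hamming(s, y) == 1 and s[0] != y[0]:
                Q[s][y] = pi[y]
        # The diagonal makes every row sum to zero.
        Q[s][s] = -sum(Q[s][z] for z in states if z != s)
    return Q


# Hypothetical input: uniform doublet frequencies for a site with one neighbour.
uniform = {a + b: 1 / 16 for a in ALPHABET for b in ALPHABET}
Q = build_rate_matrix(uniform, 1)
```

For this 16 × 16 example only three off-diagonal entries per row are positive, e.g. `Q["AU"]["GU"]`, whereas `Q["AU"]["AC"]` is zero because it would change the neighbour rather than site $k$.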

We scale $Q_k$ such that the expected number of substitutions per unit time, $d_k$, equals 1:

\[
d_k = -\sum_{y \in \mathcal{A}^{n_k+1}} \pi_k(y)\, Q_k(y, y) = 1.
\tag{2}
\]
The rate matrix $Q_k$ defined by Equation (1) defines the ‘strength’ of interactions among sites in a neighbourhood by the frequencies of subsequences $\pi_k(y)$. Further generalizations are possible and we will discuss them later. The framework outlined here allows a rate matrix for each site in the sequence. To complete the discussion of the evolutionary process, we define the total instantaneous substitution rate for $x$ as
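As a small worked check of the scaling, assume the condition has the usual form $d_k = -\sum_y \pi_k(y)\, Q_k(y, y) = 1$ and consider an independent site ($n_k = 0$) with uniform frequencies $\pi_k(a) = 1/4$; this particular choice of frequencies is only an illustration. Before scaling,

\[
Q_k(a, b) = \tfrac{1}{4} \quad (a \neq b), \qquad
Q_k(a, a) = -\tfrac{3}{4}, \qquad
d_k = -\sum_{a \in \mathcal{A}} \tfrac{1}{4} \cdot \left(-\tfrac{3}{4}\right) = \tfrac{3}{4}.
\]

Multiplying every entry by $1/d_k = 4/3$ gives off-diagonal entries $1/3$ and diagonal entries $-1$, i.e. the Jukes–Cantor matrix with one expected substitution per unit time.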

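\[
q(x) = -\sum_{k=1}^{l} Q_k(s_k, s_k),
\tag{3}
\]
where $s_k$ is the subsequence of $x$ induced by $N_k$. Thus, if a nucleotide in $x$ is substituted, the instantaneous rate may change. The new rate can be computed easily, because a substitution at site $k$ only affects the terms of site $k$ and of the sites in $N_k$.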
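The following Python sketch illustrates how the total rate can be kept up to date after a substitution by recomputing only the affected terms. Everything in it is an assumption for illustration: sites are 0-based, the rate matrices are left unscaled, and the frequencies are a made-up table that favours Watson–Crick pairs.

```python
BASES = "ACGU"


def site_rate(k, x, nbrs, pi):
    """Rate of leaving the current state at site k, i.e. -Q_k(s_k, s_k) (unscaled)."""
    context = "".join(x[i] for i in sorted(nbrs[k]))  # states of the neighbours of k
    return sum(pi[k][b + context] for b in BASES if b != x[k])


def substitute(x, rates, total, k, new_base, nbrs, pi):
    """Apply a substitution at site k in place and return the updated total rate."""
    x[k] = new_base
    for j in {k} | nbrs[k]:  # only these sites see a changed subsequence
        new_rate = site_rate(j, x, nbrs, pi)
        total += new_rate - rates[j]
        rates[j] = new_rate
    return total


# Hypothetical toy model: sites 0 and 3 pair, sites 1 and 2 evolve independently.
WC = {"AU", "UA", "GC", "CG"}
doublet = {a + b: (0.22 if a + b in WC else 0.01) for a in BASES for b in BASES}
single = {b: 0.25 for b in BASES}
nbrs = {0: {3}, 1: set(), 2: set(), 3: {0}}
pi = [doublet, single, single, doublet]

x = list("GAAC")
rates = [site_rate(k, x, nbrs, pi) for k in range(len(x))]
total = sum(rates)                                           # 1.56 for this toy model
total_after = substitute(x, rates, total, 0, "A", nbrs, pi)  # 1.98 once the G-C pair is broken
```

In this toy model, breaking the G–C pair raises the total rate, because both former partners now sit in a low-frequency doublet and are more likely to change again.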

To illustrate the notation of $Q_k$, we continue with the examples from Figure 1. Figure 1a displays the neighbourhood system for a stem region that mimics the doublet model for base-paired sites $i$ and $k$, with $N_i = \{k\}$ and $N_k = \{i\}$. Table 1 displays a possible rate matrix acting on site $k$ while taking into account the state at site $i$. Note that this matrix is reversible and has 15 parameters. Accordingly, we may define a similar rate matrix for site $i$ (Table 2). If $\pi_{\alpha\beta} = \pi_{\beta\alpha}$ for $\alpha, \beta \in \{A, C, G, U\}$, then both matrices have identical entries with nine free parameters. If the matrices in Tables 1 and 2 are applied to all sites in a stem, this gives the evolutionary process defined by Schöniger and von Haeseler (1994). The matrices can be summarized in a more condensed form. With $v = (v_1, \dots, v_{n_k})$ we denote a sequence of length $n_k$, and $(x_{i_1}, \dots, x_{i_{n_k}})$ represents the current subsequence in $x$ as induced by $N_k$. The admissible substitutions for one site are written in the following submatrix, in which $\pi_k(a, v)$ denotes the frequency of the subsequence $(a, v_1, \dots, v_{n_k})$ and each diagonal entry ($\ast$) is the negative sum of the other entries in its row:

\[
\begin{array}{c|cccc}
 & (A, v) & (C, v) & (G, v) & (U, v) \\ \hline
(A, v) & \ast & \pi_k(C, v) & \pi_k(G, v) & \pi_k(U, v) \\
(C, v) & \pi_k(A, v) & \ast & \pi_k(G, v) & \pi_k(U, v) \\
(G, v) & \pi_k(A, v) & \pi_k(C, v) & \ast & \pi_k(U, v) \\
(U, v) & \pi_k(A, v) & \pi_k(C, v) & \pi_k(G, v) & \ast
\end{array}
\tag{4}
\]
For example, the 16 × 16 doublet model (Tables 1 and 2) is defined by four matrices of this type (Table 3). Generally, after normalisation we can divide the $|\mathcal{A}|^{n_k+1} \times |\mathcal{A}|^{n_k+1}$ instantaneous rate matrix into $|\mathcal{A}|^{n_k}$ submatrices of the type illustrated in (4).


Table 1 One example for an instantaneous rate matrix Qk of a site k with nk = 1

 

Table 2 Corresponding rate matrix Qk for site i. Here only substitutions on the current site i (bold) are allowed

 

Table 3 The condensed matrix form (nk = 1) is defined by four submatrices (A,C,G,U) of the type (4) as introduced in the text

 
The reader may notice that our definition of a rate matrix is not limited to F81 types of substitution matrices (Felsenstein, 1981). The submatrix (4) can be extended to any type of rate matrix e.g. by introducing specific substitution rates. However, for the time being, we think that the frequencies of subsequences provide a reasonably good description of interactions among sites.


    SIMULATIONS
In the following, we assume that a neighbourhood system $\mathcal{N}$ and a collection of site-specific rate matrices $Q = \{Q_k \mid k = 1, \dots, l\}$ are given. We start at mutational time 0 with sequence $x(0)$, which evolves according to $(\mathcal{N}, Q)$, and we want to generate a sequence $x(d)$ after $d$ expected substitutions. To simulate this procedure we adopt algorithm 1 of von Haeseler and Schöniger (1998). At the start, the instantaneous substitution rate equals $q(x)$ [Equation (3)]. We draw a random time $d_r$ from an exponential distribution with parameter $q(x)$. If $d_r < d$ then a substitution takes place in $x$. We pick a site $k$ with probability

\[
P(k) = \frac{-Q_k(s_k, s_k)}{q(x)},
\tag{5}
\]
the relative mutability at that site. For a chosen site $k$, the nucleotide $x_k$ will be replaced by a new nucleotide $y_0$ with probability

\[
P(y_0) = \frac{Q_k(s_k, y)}{-Q_k(s_k, s_k)}, \qquad y = (y_0, x_{i_1}, \dots, x_{i_{n_k}}),\ y_0 \neq x_k.
\tag{6}
\]
Subsequently, the remaining time is updated to $d \leftarrow d - d_r$, $q(x)$ is recomputed based on the new sequence, and the simulation continues until a drawn waiting time $d_r$ exceeds the remaining time $d$. This procedure is summarized in Algorithm 1.
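The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SISSI implementation: sites are 0-based, the rate matrices are of the frequency-based type and are never stored explicitly, each site's matrix is scaled to one expected substitution per unit time, and the toy model and all function names are hypothetical.

```python
import random

BASES = "ACGU"


def leave_rate(pi_k, s):
    """Unscaled -Q_k(s, s): total rate of substitutions at the first position of s."""
    return sum(pi_k[b + s[1:]] for b in BASES if b != s[0])


def scaling_factor(pi_k):
    """1 / d_k, so that the scaled matrix yields one expected substitution per unit time."""
    return 1.0 / sum(p * leave_rate(pi_k, y) for y, p in pi_k.items())


def subsequence(k, x, nbrs):
    """s_k: state of site k followed by the states of its neighbours in site order."""
    return x[k] + "".join(x[i] for i in sorted(nbrs[k]))


def evolve(x0, d, nbrs, pi, rng=random):
    """Evolve x0 for d time units; return (new sequence, number of substitutions)."""
    x = list(x0)
    scale = [scaling_factor(p) for p in pi]
    rates = [scale[k] * leave_rate(pi[k], subsequence(k, x, nbrs)) for k in range(len(x))]
    n_subst = 0
    while sum(rates) > 0:
        d_r = rng.expovariate(sum(rates))      # waiting time to the next substitution
        if d_r >= d:
            break                              # the branch ends before the next event
        d -= d_r
        k = rng.choices(range(len(x)), weights=rates)[0]      # site by relative mutability
        context = subsequence(k, x, nbrs)[1:]
        candidates = [b for b in BASES if b != x[k]]
        weights = [pi[k][b + context] for b in candidates]
        x[k] = rng.choices(candidates, weights=weights)[0]    # new state at site k
        for j in {k} | nbrs[k]:                # refresh only the affected rates
            rates[j] = scale[j] * leave_rate(pi[j], subsequence(j, x, nbrs))
        n_subst += 1
    return "".join(x), n_subst


# Hypothetical toy model: sites 0 and 3 pair, sites 1 and 2 evolve independently.
WC = {"AU", "UA", "GC", "CG"}
doublet = {a + b: (0.22 if a + b in WC else 0.01) for a in BASES for b in BASES}
single = {b: 0.25 for b in BASES}
nbrs = {0: {3}, 1: set(), 2: set(), 3: {0}}
pi = [doublet, single, single, doublet]

descendant, n_events = evolve("GAAC", 1.0, nbrs, pi, random.Random(2005))
```

Passing a seeded `random.Random` instance makes a run reproducible; the number of observed differences between ancestor and descendant can never exceed the number of substitution events returned.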

Finally, this procedure is applied recursively through a given rooted or unrooted tree topology, where the branch lengths are specified by the expected number of substitutions. This method is implemented in the program Simulating Site-Specific Interactions (SISSI).
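The recursion over a tree can be sketched independently of the substitution model. In the Python sketch below, the nested-tuple tree format, the function names and the stand-in branch simulator are assumptions for illustration; any function that maps a parent sequence and a branch length to a descendant sequence could be plugged in. For an unrooted tree, one node would have to be chosen as the starting point.

```python
def simulate_tree(node, parent_seq, evolve_branch):
    """Evolve parent_seq down the tree; return {leaf name: simulated sequence}.

    node is a tuple (name, branch_length, children), where branch_length is the
    expected number of substitutions on the branch leading to the node.
    """
    name, length, children = node
    seq = evolve_branch(parent_seq, length)
    if not children:
        return {name: seq}
    leaves = {}
    for child in children:
        leaves.update(simulate_tree(child, seq, evolve_branch))
    return leaves


# Hypothetical rooted tree ((A:0.1, B:0.2):0.05, C:0.3) with a zero-length root branch.
tree = ("root", 0.0, [
    ("inner", 0.05, [("A", 0.1, []), ("B", 0.2, [])]),
    ("C", 0.3, []),
])

visited = []


def record_branch(seq, length):
    """Stand-in for a real branch simulator: logs the branch and returns seq unchanged."""
    visited.append(length)
    return seq


alignment = simulate_tree(tree, "GAAC", record_branch)
```

Each branch is processed exactly once, and every internal sequence is reused as the ancestor of all its children, so sequences at the leaves share the substitutions accumulated on their common branches.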


    RESULTS
Simulations employing a neighbourhood system, incorporating artificial or known structural features, were run on an ordinary PC. Although the models are more complex than the well known independent models, the computing time for the simulations is satisfactory. If $n_k = l - 1$ for all $k = 1, \dots, l$, then the run time increases quadratically with sequence length $l$. Run time increases linearly as a function of the number of taxa or the total branch length of the tree.

As an illustrative example, we used the neighbourhood system of RNase P from Bacillus subtilis with 401 sites (Fig. 3) taken from the RNase P database (Brown, 1999). To specify the rate matrices we used the frequencies of the nucleotides {A, C, G, U} for sites evolving independently ($n_k = 0$) and the doublet frequencies for sites with $n_k = 1$ from one corresponding sequence in the database (Table 4). We used only these two matrices in the simulations. In the sequence, 41.15% of sites evolve independently and 58.85% evolve under dependencies. Having specified $(\mathcal{N}, Q)$, it took on average 1 s to simulate a dataset of 100 sequences along a tree with mean branch length 0.3.


Fig. 3 This circle plot illustrates known structure features of the RNase P of B.subtilis with 401 sites according to the RNase P database (Brown, 1999).

 

Table 4 Counted doublet frequencies $\pi_k(y)$ (for $n_k = 1$) and single frequencies $\pi_k(y)$ (for $n_k = 0$) from an RNase P sequence of B. subtilis (accession number: M13175) taken from the RNase P database (Brown, 1999)

 
Figure 4 shows the accumulation of observed sequence differences per site as the number of substitutions per site increases. The curve shows the expected saturation behaviour as d goes to infinity. The simulated curve lies between the theoretical curves we obtain for the F81 model and the doublet model.


Fig. 4 Relationship between number of substitutions per site d and number of observed differences per site h. Lines: Analytically calculated with the frequencies in Table 4. Upper line: Only sites with nk = 0 (F81); lower line: Only sites with nk = 1 (doublet model of Schöniger and von Haeseler, 1994); middle line: 41.15% sites with nk = 0 and 58.85% sites with nk = 1. Circles: Mean and standard deviation for number of substitutions and the corresponding differences under 1000 simulations with SISSI, the neighbourhood system of RNase P of B.subtilis (Fig. 3), the frequencies in Table 4 and the expected number of substitutions 0.01, 0.5, 1, 2, 3, 4 and 5 (x-axis).

 
In real data we do not know the expected number of substitutions. The different speeds of accumulation of observed differences have a great impact on estimation of the number of substitutions. With our method we can investigate the relationship between the numbers of substitutions per site and the number of observed differences simultaneously for different neighbourhood systems with varying complexity.

For non-overlapping sites in the neighbourhood system and a small number of neighbours $n_k$, it is possible to calculate the number of substitutions and the number of observed differences analytically (von Haeseler and Schöniger, 1998). Our simulation results agree with the expected numbers derived from appropriately weighting the expected numbers of observed differences for independent and dependent sites of the neighbourhood system of the RNase P of B. subtilis (Fig. 4).
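To indicate what such an analytical curve looks like, consider the independent sites. For an F81 matrix scaled to one expected substitution per unit time and a sequence started in the stationary distribution, the standard result for the expected proportion of observed differences after $d$ substitutions per site is

\[
h_0(d) = B \left(1 - e^{-d/B}\right), \qquad B = 1 - \sum_{a \in \mathcal{A}} \pi_a^2 ,
\]

which saturates at $B$ as $d \to \infty$ (for uniform frequencies, $B = 3/4$). Assuming that the weights are simply the proportions of the two kinds of sites, the mixed curve would then read

\[
h(d) = 0.4115\, h_0(d) + 0.5885\, h_1(d),
\]

where $h_1(d)$ stands for the corresponding per-site curve of the doublet model (von Haeseler and Schöniger, 1998), which is not reproduced here. The symbols $h_0$, $h_1$ and $B$ are introduced only for this illustration.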

However, as indicated in Figures 1c and 2 more complex neighbourhoods can be simulated, where we allow for overlapping neighbourhoods. The example given here simply illustrates the basic principle. To introduce more realistic applications is beyond the scope of the paper.


    DISCUSSION
In this paper, we have introduced a general framework to take site-specific interactions into account. We can mimic sequence evolution with various complex dependencies among sites. The basic idea is the application of different substitution matrices for each site defined by the interactions with other sites in the sequence. Our implementation, SISSI, allows the evolution of nucleotide sequences along a tree for user defined systems of neighbourhoods and instantaneous rate matrices.

Simulations have shown that SISSI produces sequences under constrained evolution in reasonable time. While for simulations with independent sites Seq-Gen (Rambaut and Grassly, 1997) should be used, because it is more time efficient, we have shown that the runtime is not really an issue for our general approach. Thus it should be possible to generate large simulated datasets that may be used to analyse the reliability of tree reconstruction methods under deviations from the independent site assumptions.

Although we discussed Felsenstein F81 types of rate matrices as an example, SISSI is in principle not limited to this type of substitution process. It is, for example, easily possible to include a transition-transversion parameter. SISSI also allows the inclusion of rate heterogeneity or codon position-specific heterogeneity (Yang, 1993).

Models with site-specific rate matrices have also been studied, as well as various mixture models (e.g. Koshi and Goldstein, 1995; Bruno, 1996; Thorne et al., 1996; Koshi and Goldstein, 1997; Halpern and Bruno, 1998; Goldman et al., 1998; Lartillot and Philippe, 2004; Pagel and Meade, 2004). Recently, a number of models of protein evolution have been developed to account for protein structure by accepting randomly generated mutations if they do not affect the structure too much (e.g. Parisi and Echave, 2001, 2005). With our method, allowing the specification of site-specific rate matrices or different rate matrices for different regions of the simulated sequence is straightforward. Moreover, our framework allows the introduction of mechanistic parameters, thus making all model assumptions explicit.

However, introducing more and more realistic features to model the evolutionary process requires the specification of a large number of parameters. This does not pose a problem for the simulations; the user simply has to define everything. Fortunately, our definition is flexible enough to introduce simpler models that capture the major features of interaction between sites and need fewer parameters. Thus, it is possibly more insightful to confine the simulation to the relevant parameters.

Besides applications in phylogenetic inference, simulated datasets with dependencies can be used to test structure analysis methods, e.g. RNA structure prediction (Chiu and Kolodziejczak, 1991; Gutell et al., 1992; Gorodkin et al., 1997; Tabaska et al., 1998; Lueck et al., 1999; Akmaev et al., 2000; Knudsen and Hein, 2003; Hofacker et al., 2002, and references therein). Simultaneous structural RNA sequence alignment, structure prediction and phylogenetic reconstruction is still a problem. Another application of SISSI is the systematic study of the influence of phylogenetic relationships among the sequences that are subject to structure prediction. Thus, SISSI illustrates the evolutionary path with compensatory mutation along the tree, e.g. with programs that detect nucleotide interactions. As a consequence, this may result in intermediate structures that may show a large deviation from the structure defined by the neighbourhood system. If a large fraction of closely related sequences happen to deviate by chance from the underlying structure, this will mislead structure prediction programs that do not properly account for the phylogenetic relationship. SISSI may help to address this particular problem, i.e. to distinguish structural (functional) from phylogenetic (ancestral) correlations.

A challenging extension of our model is the inclusion of energy values in the process of RNA evolution, e.g. to model base-pair stacking effects due to π-orbital overlap (Fig. 1c). This would add another realistic feature and therefore the evolutionary path through sequence space guided by the tree is more easily comparable with results produced by RNAinverse (Hofacker et al., 1994). RNAinverse searches for all sequences folding into a predefined structure but takes no phylogenetic relationship into account.

Finally, we would like to point out that this paper has focussed on applications of SISSI within the context of RNA. However, our method is applicable for other inter- and intra-site-specific interactions among nucleotides and other character based sequences, like amino acids, codons or discrete character states. Furthermore, it is not necessary to restrict the simulation to one neighbourhood system. It is very well possible to define different neighbourhood systems for different regions of the tree. Such simulations may be the basis for studies about structure evolution.


Algorithm 1 Computing a sequence x(d), d substitutions away from x(0).

 


    Acknowledgments
 
We wish to thank the Goldman Group at the EBI, the Biocomputing Group of the Biophysics Institute and the Bioinformatics Institute at Duesseldorf University. We would like to thank Andrew Rambaut for allowing us to use some code from Seq-Gen, and Heiko Schmidt, Andreas Wilm, Jutta Buschbom and Thomas Schlegel for various help. Finally, thanks to Carolin Kosiol and Roland Fleißner for helpful comments on the manuscript. This work was supported by DFG grant SFB-TR1 (Deutsche Forschungsgemeinschaft). Financial support from the Marie Curie Foundation for a fellowship at the EBI is gratefully acknowledged. Funding to pay the Open Access publication charges for this article was provided by DFG grant SFB-TR1.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Joaquin Dopazo

Received on July 22, 2005; revised on December 1, 2005; accepted on December 1, 2005

    REFERENCES

    Akmaev, V.R., et al. (2000) Phylogenetically enhanced statistical tools for RNA structure prediction. Bioinformatics, 16, 501–512[Abstract/Free Full Text].

    Arndt, P.F., et al. (2003) DNA sequence evolution with neighbor-dependent mutation. J. Comput. Biol, . 10, 313–322[CrossRef][ISI][Medline].

    Brown, J.W. (1999) The Ribonuclease P Database. Nucleic Acids Res, . 27, 314[Abstract/Free Full Text].

    Bruno, W.J. (1996) Modeling residue usage in aligned protein sequences via maximum likelihood. Mol. Biol. Evol, . 13, 1368–1374[Abstract].

    Cate, J.H., et al. (1996) Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678–1685[Abstract/Free Full Text].

    Chiu, D.K. and Kolodziejczak, T. (1991) Inferring consensus structure from nucleic acid sequences. Comput. Appl. Biosci, . 7, 347–352[Abstract/Free Full Text].

    Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol, . 17, 368–376[CrossRef][ISI][Medline].

    Goldman, N., et al. (1998) Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics, 149, 445–458[Abstract/Free Full Text].

    Gorodkin, J., et al. (1997) Displaying the information contents of structural RNA alignments: the structure logos. CABIOS, 13, 583–586.

    Grassly, N.C., et al. (1997) PSeq-Gen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees. Comput. Appl. Biosci, . 13, 559–560[Free Full Text].

    Gutell, R.R., et al. (1992) Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acid Res, . 20, 5785–5795[Abstract/Free Full Text].

    von Haeseler, A. and Schöniger, M. (1998) Evolution of DNA or amino acid sequences with dependent sites. J. Comput. Biol, . 5, 149–163[ISI][Medline].

    Halpern, A.L. and Bruno, W.J. (1998) Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol. Biol. Evol, . 15, 910–917[Abstract].

    Hasegawa, M., et al. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol, . 22, 160–174[CrossRef][ISI][Medline].

    Hofacker, I.L., et al. (2002) Secondary structure prediction for aligned RNA sequences. J. Mol. Biol, . 319, 1059–1066[CrossRef][ISI][Medline].

    Hofacker, I.L., et al. (1994) Fast folding and comparison of RNA secondary structures. Monatsh Chem, . 125, 167–188[CrossRef].

    Hudelot, C., et al. (2003) RNA-based phylogenetic methods: application to mammalian mitochondrial RNA sequences. Mol. Phylogenet. Evol, . 28, 241–252[CrossRef][ISI][Medline].

    Huelsenbeck, C. (1995) The performance of phylogenetic methods in simulation. Syst. Biol, . 44, 17–48.

    Jensen, J. and Pedersen, A-M.K. (2000) Probabilistic models of DNA sequence evolution with context dependent rates of substitution. Adv. Appl. Prob, . 32, 499–517[CrossRef].

    Jukes, T.H. and Cantor, C.R. (1969) Evolution of protein molecules. In Munro, H.N. (ed.), Mammalian Protein Metabolism, Vol. 3. Academic Press, NY, pp. 21–132.

    Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111–120.

    Knudsen, B. and Hein, J. (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res., 31, 3423–3428.

    Kosakovsky Pond, S.L., et al. (2005) HyPhy: hypothesis testing using phylogenies. Bioinformatics, 21, 676–679.

    Koshi, J.M. and Goldstein, R.A. (1995) Context dependent optimal substitution matrices. Protein Eng., 8, 641–645.

    Koshi, J.M. and Goldstein, R.A. (1997) Mutation matrices and physical-chemical properties: correlations and implications. Proteins, 27, 336–344.

    Lartillot, N. and Philippe, H. (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol., 21, 1095–1109.

    Lueck, R., et al. (1999) ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res., 27, 4208–4217.

    Lunter, G. and Hein, J. (2004) A nucleotide substitution model with nearest-neighbour interactions. Bioinformatics, 20, I216–I223.

    Muse, S.V. (1995) Evolutionary analyses of DNA sequences subject to constraints on secondary structure. Genetics, 139, 1429–1439.

    Pagel, M. and Meade, A. (2004) A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol., 53, 571–581.

    Parisi, G. and Echave, J. (2001) Structural constraints and emergence of sequence patterns in protein evolution. Mol. Biol. Evol., 18, 750–756.

    Parisi, G. and Echave, J. (2005) Generality of the structurally constrained protein evolution model: assessment on representatives of the four main fold classes. Gene, 345, 45–53.

    Pedersen, A-M. and Jensen, J.L. (2001) A dependent rates model and MCMC-based methodology for the maximum-likelihood analysis of sequences with overlapping reading frames. Mol. Biol. Evol., 18, 763–776.

    Pedersen, J.S., et al. (2004) An evolutionary model for protein-coding regions with conserved RNA structure. Mol. Biol. Evol., 21, 1913–1922.

    Pollock, D.D., et al. (1999) Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol., 287, 187–198.

    Rambaut, A. and Grassly, N.C. (1997) Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci., 13, 235–238.

    Robinson, D.M., et al. (2003) Protein evolution with dependence among codons due to tertiary structure. Mol. Biol. Evol., 20, 1692–1704.

    Rzhetsky, A. (1995) Estimating substitution rates in ribosomal RNA genes. Genetics, 141, 771–783.

    Savill, N.J., et al. (2000) RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum-likelihood methods. Genetics, 157, 399–411.

    Schöniger, M. and von Haeseler, A. (1994) A stochastic model for the evolution of autocorrelated DNA sequences. Mol. Phylogenet. Evol., 3, 240–247.

    Schöniger, M. and von Haeseler, A. (1995) Simulating efficiently the evolution of DNA sequences. Comput. Appl. Biosci., 11, 111–115.

    Siepel, A. and Haussler, D. (2004) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol. Biol. Evol., 21, 468–488.

    Smith, A.D., et al. (2004) Empirical models for substitution in ribosomal RNA. Mol. Biol. Evol., 21, 419–427.

    Stoye, J., et al. (1998) Rose: generating sequence families. Bioinformatics, 14, 157–163.

    Tabaska, J.E., et al. (1998) An RNA folding method capable of identifying pseudoknots and base triples. Bioinformatics, 14, 691–699.

    Tavaré, S. (1986) Some probabilistic and statistical problems on the analysis of DNA sequences. Lec. Math. Life Sci., 17, 57–86.

    Thorne, J.L., et al. (1996) Combining protein evolution and secondary structure. Mol. Biol. Evol., 13, 666–673.

    Tillier, E.R. (1994) Maximum likelihood with multiparameter models of substitution. J. Mol. Evol., 39, 409–417.

    Tillier, E.R. and Collins, R.A. (1998) High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics, 148, 1993–2002.

    Tufféry, P. (2002) CS-PSeq-Gen: simulating the evolution of protein sequence under constraints. Bioinformatics, 18, 1015–1016.

    Yang, Z. (1993) Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol., 10, 1396–1401.

    Yang, Z. (2004) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci., 13, 555–556.





