Bioinformatics Advance Access originally published online on December 6, 2005
Bioinformatics 2006 22(6):716-722; doi:10.1093/bioinformatics/bti812
In silico sequence evolution with site-specific interactions along phylogenetic trees
1 Heinrich-Heine University Duesseldorf, Universitaetsstrasse 1, 40225 Duesseldorf, Germany
2 Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria
3 University of Vienna, Austria
4 Medical University of Vienna, Austria
5 University of Veterinary Medicine Vienna, Austria
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: A biological sequence usually has many sites whose evolution depends on other positions of the sequence, but this is not accounted for by commonly used models of sequence evolution. Here we introduce a Markov model of nucleotide sequence evolution in which the instantaneous substitution rate at a site depends on the states of other sites. Based on the concept of neighbourhood systems, our model represents a universal description of arbitrarily complex dependencies among sites.
Results: We show how to define complex models for some illustrative examples and demonstrate that our method provides a versatile resource for simulating sequence evolution with site-specific interactions along a tree. For example, we are able to simulate the evolution of RNA taking into account secondary structure as well as pseudoknots and other tertiary interactions. To this end, we have developed a program, Simulating Site-Specific Interactions (SISSI), that simulates the evolution of a nucleotide sequence along a phylogenetic tree incorporating user-defined site-specific interactions. Furthermore, our method allows one to simulate more complex interactions in nucleotide and other character-based sequences.
Availability: We implemented our method in an ANSI C program, SISSI, which runs on UNIX/Linux, Windows and Mac OS systems, including Mac OS X. SISSI is available at http://www.bi.uni-duesseldorf.de/software/sissi/
Contact: sissi{at}cs.uni-duesseldorf.de
| INTRODUCTION |
|---|
Evolutionary analysis of biological sequences typically assumes that sites evolve independently of each other (cf. Tavaré, 1986). However, this simplifying assumption does not hold in general. Thus, in recent years, evolutionary models have been suggested to remedy this unsatisfactory situation. Markov models that take the base pairings in stem regions of RNA molecules into account were among the first to model the process of evolution more realistically (Schöniger and von Haeseler, 1994; Tillier, 1994; Muse, 1995; Rzhetsky, 1995; Tillier and Collins, 1998; Savill et al., 2000; Smith et al., 2004). Models to detect protein sites with correlated patterns of evolution have also been proposed (e.g. Pollock et al., 1999). Furthermore, models including selection against CpG dinucleotides were studied as an example of overlapping context dependencies (Jensen and Pedersen, 2000; Arndt et al., 2003; Siepel and Haussler, 2004). More specialized models, with overlapping reading frames (Pedersen and Jensen, 2001) or with a focus on protein structure (Robinson et al., 2003), were also suggested. Recently, irreversible complex models with overlapping neighbouring nucleotide pairs were developed (Lunter and Hein, 2004), and Pedersen et al. (2004) modeled the substitution process in protein-coding regions with embedded conserved RNA structures.
Modeling the evolution of a collection of homologous sequences is not merely an academic exercise. A profound knowledge of how sequences evolve can help us to improve the reconstruction of phylogenetic trees from sequences. However, with few exceptions, the true mode of evolution of homologous sequences is unknown. Therefore, simulating sequence evolution is helpful to investigate the performance of tree-building methods (Huelsenbeck, 1995). So far, different programs have been designed to simulate nucleotide and protein sequences along a tree (Schöniger and von Haeseler, 1995; Rambaut and Grassly, 1997; Grassly et al., 1997; Yang, 1997; Hudelot et al., 2003; Tufféry, 2002; Kosakovsky Pond et al., 2005). One of the most widely used programs, Seq-Gen (Rambaut and Grassly, 1997), implements a wide range of independent nucleotide substitution models (e.g. Jukes and Cantor, 1969; Kimura, 1980; Felsenstein, 1981; Hasegawa et al., 1985). The PHASE package (Hudelot et al., 2003) implements base-paired substitution models but is designed specifically for RNA sequences with secondary structure. To the best of our knowledge, a general sequence simulation program including site-specific interactions based on a well-defined neighbourhood system does not exist.
While progress has been made in devising new models for inference of sequences with dependencies among sites, there is a lack of simulation models that would allow the assessment of this progress. Especially if one wants to assess the robustness of phylogenetic inference, the models used for simulation need to be more accurate and complex descriptions of nature than those used for inference. Furthermore, simulations taking into account site-specific interactions that evolved along a phylogeny are of value in themselves, because they contribute to our understanding of the intertwined relationship between structure and substitution process. The use of supervised sequence evolution (i.e. simulations) allows us to control and study the extent of structural and sequence conservation.
In the following section we introduce the representation of site-specific interactions using a neighbourhood system, which allows a universal description of arbitrarily complex dependencies among sites. We give some simple examples of how to apply neighbourhood systems to define various structural elements in RNA sequences. Then, we define a site- and neighbourhood-dependent substitution process that permits a versatile description of the various evolutionary forces acting on single sites. We show that our method is useful for simulating, e.g., the evolution of RNA sequences and structure simultaneously, since it can take into account both base-pairing partners and further interactions between nucleotides in sequences.
| THE NEIGHBOURHOOD SYSTEM OF A SEQUENCE |
|---|
In the following, we describe for each site $k = 1, \ldots, l$ in a (nucleotide) sequence $x = (x_1, \ldots, x_l)$ the interaction of $k$ with other sites in $x$. To this end, we introduce the neighbourhood system $\mathcal{N} = (N_k)_{k=1,2,\ldots,l}$ such that:
- $N_k \subseteq \{1, \ldots, l\}$ and $k \notin N_k$ for each $k$;
- if $i \in N_k$ then $k \in N_i$ for each $i, k$.
We write $n_k = |N_k|$ for the number of neighbours of site $k$. $\mathcal{N}$ can be visualized as a graph with vertices $V = \{1, \ldots, l\}$ and edge set $E = \{(k, i) \mid 1 \le k \le l,\ i \in N_k\}$. One visualization of this graph is the circle plot, in which the vertices are arranged on a circle and the edges connect pairs of vertices inside the circle. Using the notation of a neighbourhood system, various secondary and tertiary structural elements can be encoded in a unifying framework.
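The two defining conditions can be checked mechanically. The following Python sketch is illustrative only and is not part of SISSI; the dictionary representation, the function names `check_neighbourhood_system` and `edge_set`, and the choice to store each undirected edge once rather than as both $(k, i)$ and $(i, k)$ are assumptions of this sketch.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: site k (1-based, as in the text) -> neighbourhood N_k.
Neighbourhoods = Dict[int, Set[int]]


def check_neighbourhood_system(nbh: Neighbourhoods, length: int) -> None:
    """Raise ValueError unless nbh is a neighbourhood system on sites 1..length."""
    sites = set(range(1, length + 1))
    if set(nbh) != sites:
        raise ValueError("every site needs a (possibly empty) neighbourhood")
    for k, n_k in nbh.items():
        if not n_k <= sites:
            raise ValueError(f"N_{k} contains sites outside 1..{length}")
        if k in n_k:  # first condition: k is not its own neighbour
            raise ValueError(f"site {k} lies in its own neighbourhood")
        for i in n_k:  # second condition: i in N_k implies k in N_i
            if k not in nbh[i]:
                raise ValueError(f"{i} is in N_{k} but {k} is not in N_{i}")


def edge_set(nbh: Neighbourhoods) -> Set[Tuple[int, int]]:
    """Edges of the neighbourhood graph, each undirected edge stored once as (smaller, larger)."""
    return {(min(k, i), max(k, i)) for k, n_k in nbh.items() for i in n_k}


# Sites 1 and 5 form a base pair; sites 2-4 are unpaired (n_k = 0).
nbh = {1: {5}, 2: set(), 3: set(), 4: set(), 5: {1}}
check_neighbourhood_system(nbh, 5)
print(edge_set(nbh))  # {(1, 5)}
```

Storing each edge once is only a convenience for drawing circle plots; the symmetry condition guarantees that no information is lost.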
Figure 1 illustrates some well-known RNA structures together with the corresponding neighbourhoods in the circle plot. A stem region of an RNA molecule is encoded by a neighbourhood system with $n_k = 1$ for sites in a stem and $n_k = 0$ for sites in a loop, where $i \in N_k$ is the site that base-pairs with $k$ (Fig. 1a). Similarly, we can encode a pseudoknot; again $n_k$ is either zero or one. Here, the resulting circle plot shows intersecting edges (Fig. 1b). One can proceed to model even more elaborate interactions. Figure 1c displays the neighbourhood system that results if interactions owing to base stacking are incorporated in our model. The corresponding circle plot displays many intersecting edges and takes overlapping dependencies into account. For example, site 11 is, inter alia, an element of the neighbourhoods $N_{10}$ and $N_{12}$, while site 10 is not in $N_{12}$ and vice versa. Finally, Figure 2 displays the interactions deduced from a ribozyme domain (Cate et al., 1996), where site 153 interacts with sites 150, 223 and 250. These interactions are crucial for the integrity of the molecule and should be taken into account when modeling the process of evolution.
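As a hedged illustration of the stem and pseudoknot examples, the sketch below (hypothetical helpers, not SISSI code) builds a neighbourhood system from a list of base pairs and reports which edges intersect in the circle plot, so that a pseudoknot shows up as crossing chords. The base-pair coordinates are invented for illustration, and base-stacking neighbourhoods such as those of Fig. 1c are not covered.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]


def from_base_pairs(pairs: Iterable[Pair], length: int) -> Dict[int, Set[int]]:
    """Neighbourhood system in which each paired site has its partner as sole neighbour.

    Loop sites keep an empty neighbourhood, so n_k = 0 there and n_k = 1 in stems
    (assuming every site occurs in at most one pair).
    """
    nbh: Dict[int, Set[int]] = {k: set() for k in range(1, length + 1)}
    for i, j in pairs:
        nbh[i].add(j)
        nbh[j].add(i)
    return nbh


def crossing_edges(pairs: Iterable[Pair]) -> List[Tuple[Pair, Pair]]:
    """Pairs of edges that intersect inside the circle plot.

    With the sites placed on a circle in sequence order, the chords (a, b) and
    (c, d), with a < b and c < d, cross exactly when a < c < b < d or c < a < d < b.
    """
    chords = [(min(p), max(p)) for p in pairs]
    return [
        (p, q)
        for p, q in combinations(chords, 2)
        if p[0] < q[0] < p[1] < q[1] or q[0] < p[0] < q[1] < p[1]
    ]


stem = [(1, 10), (2, 9), (3, 8)]                 # nested pairs: no crossings
pseudoknot = [(1, 8), (2, 7), (5, 12), (6, 11)]  # two interleaved helices
print(crossing_edges(stem))             # []
print(len(crossing_edges(pseudoknot)))  # 4
```

Nested (stem-like) pairs never cross, whereas every chord of one helix of the pseudoknot crosses every chord of the other.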
| AN EVOLUTIONARY MODEL INCLUDING NEIGHBOURHOODS |
|---|
In the previous section, we introduced a tool to succinctly summarize interactions among sites in a (nucleotide) sequence. Next, we superimpose evolutionary dynamics that act on the sites of a sequence and take these interactions into account.
Hence, we define a substitution process for every site $k$, where the substitution of a given nucleotide $x_k$ by another one depends on the states $x_{N_k} = (x_i)_{i \in N_k}$ of the sites $i \in N_k$. More formally, we introduce at each site $k$ a site-specific rate matrix $Q_k$. Thus $Q = \{Q_k \mid k = 1, \ldots, l\}$ constitutes a collection of possibly different substitution models acting on the sequence and an annotation of correlations among sites. Contrary to standard models, which assume independent evolution of the sites, $Q_k$ has dimensions $|\mathcal{A}|^{n_k+1} \times |\mathcal{A}|^{n_k+1}$, where $|\mathcal{A}|$ is the size of the alphabet $\mathcal{A}$. For the examples discussed here, $\mathcal{A} = \{A, C, G, T\}$ or $\mathcal{A} = \{A, C, G, U\}$, hence $|\mathcal{A}| = 4$. Thus, if $n_k = 0$, $Q_k$ can be defined as one of the usual rate matrices on $\mathcal{A}$, i.e. we may assume a Jukes-Cantor matrix, a Hasegawa-Kishino-Yano matrix or another independent model (Jukes and Cantor, 1969; Kimura, 1980; Felsenstein, 1981; Hasegawa et al., 1985). If $n_k > 0$, then $Q_k$ acts on subsequences of length $n_k + 1$. We impose the usual restriction that only one substitution can occur in an infinitesimally small time interval (Schöniger and von Haeseler, 1994, 1995). Moreover, in the matrix $Q_k$ a substitution is only possible at site $k$. This restriction leads to sparse rate matrices. Let $x^{(k)} = (x_k, x_{N_k})$ represent the actual subsequence of sequence $x$, where $x_{N_k} = (x_i)_{i \in N_k}$, and let $y^{(k)} = (y_k, y_{N_k})$ denote an arbitrary sequence of the same length. To avoid notational confusion, we assume $x^{(k)} \neq y^{(k)}$. With $(\pi_k(y^{(k)}))$ we denote the stationary distribution of rate matrix $Q_k$. Because only site $k$ is allowed to vary, the entries of $Q_k$ are given by

$$Q_k\big(x^{(k)}, y^{(k)}\big) = \begin{cases} \pi_k\big(y^{(k)}\big) & \text{if } y_{N_k} = x_{N_k} \text{ and } y_k \neq x_k, \\ 0 & \text{otherwise,} \end{cases} \qquad (1)$$

and the diagonal entries are chosen such that each row sums to zero. Hence each row of $Q_k$ has only $|\mathcal{A}| - 1$ non-zero off-diagonal entries.
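A minimal sketch of Equation (1), assuming the F81-type form in which the rate into an admissible target is its stationary frequency $\pi_k(y^{(k)})$. The function name, the encoding of subsequences as strings whose first character is site $k$ (followed by the neighbours in a fixed order), and the random frequencies in the example are illustrative assumptions; the checks confirm the sparsity pattern and that $\pi_k$ satisfies detailed balance.

```python
import itertools

import numpy as np

ALPHABET = "ACGU"


def site_rate_matrix(pi, n_k):
    """Unscaled rate matrix Q_k following Equation (1) for a site with n_k neighbours.

    States are subsequences x^(k) = (x_k, x_{N_k}) written as strings whose first
    character is the focal site k.  pi maps every state string to pi_k(state).
    """
    states = ["".join(s) for s in itertools.product(ALPHABET, repeat=n_k + 1)]
    index = {s: n for n, s in enumerate(states)}
    q = np.zeros((len(states), len(states)))
    for x in states:
        for a in ALPHABET:
            if a != x[0]:          # only site k may change ...
                y = a + x[1:]      # ... while the context x_{N_k} stays fixed
                q[index[x], index[y]] = pi[y]
    np.fill_diagonal(q, -q.sum(axis=1))  # rows of a rate matrix sum to zero
    return states, q


# Doublet example (n_k = 1) with arbitrary positive frequencies.
rng = np.random.default_rng(0)
freqs = rng.dirichlet(np.ones(16))
pi = {"".join(s): f for s, f in zip(itertools.product(ALPHABET, repeat=2), freqs)}
states, q = site_rate_matrix(pi, n_k=1)
print(q.shape)  # (16, 16)
```

With $n_k = 0$ and uniform frequencies the same function returns an (unscaled) Jukes-Cantor matrix.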
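We scale $Q_k$ such that the expected number of substitutions $d_k$ at site $k$ per unit time equals 1:

$$d_k = -\sum_{x^{(k)}} \pi_k\big(x^{(k)}\big)\, Q_k\big(x^{(k)}, x^{(k)}\big) = 1. \qquad (2)$$

Further generalizations are possible and we will discuss them later. The framework outlined here allows a rate matrix for each site in the sequence. To complete the discussion of the evolutionary process, we define the total instantaneous substitution rate for $x$ as

$$q(x) = \sum_{k=1}^{l} q_k(x), \qquad q_k(x) = -Q_k\big(x^{(k)}, x^{(k)}\big) = \sum_{y_k \neq x_k} Q_k\big(x^{(k)}, (y_k, x_{N_k})\big). \qquad (3)$$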
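The scaling in Equation (2) and the total rate in Equation (3) can be sketched as follows. This is not SISSI's implementation: the function names, the `models` dictionary and the convention of listing neighbours in increasing site order inside a state string are assumptions. The Jukes-Cantor example is chosen because its scaled diagonal is exactly $-1$, so $q(x)$ equals the number of sites.

```python
import numpy as np


def scale_to_unit_rate(states, q, pi):
    """Return Q_k / d_k so that Equation (2) holds, i.e. d_k = 1 under pi_k."""
    p = np.array([pi[s] for s in states])
    d_k = -float(p @ np.diag(q))  # expected substitutions per unit time at site k
    return q / d_k


def total_rate(x, neighbourhoods, models):
    """Total instantaneous rate q(x) of Equation (3).

    x is the full sequence (a string), neighbourhoods maps a 1-based site k to N_k,
    and models maps k to (states, scaled Q_k), where each state string is x_k
    followed by the neighbours in increasing site order (an assumption).
    """
    total = 0.0
    for k, n_k in neighbourhoods.items():
        states, q_k = models[k]
        sub = x[k - 1] + "".join(x[i - 1] for i in sorted(n_k))
        j = states.index(sub)
        total += -q_k[j, j]  # q_k(x): rate of leaving the current subsequence
    return total


# Example: four independent sites under a Jukes-Cantor matrix.
bases = list("ACGU")
pi_jc = {b: 0.25 for b in bases}
q_jc = np.full((4, 4), 0.25)
np.fill_diagonal(q_jc, -0.75)
q_jc = scale_to_unit_rate(bases, q_jc, pi_jc)
models = {k: (bases, q_jc) for k in range(1, 5)}
print(total_rate("ACGU", {k: set() for k in range(1, 5)}, models))  # 4.0
```

For dependent sites, $q_k(x)$ changes with the neighbour context, so $q(x)$ generally differs from $l$.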
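To illustrate the notation of $Q_k$, we continue with the examples from Figure 1. Figure 1a displays the neighbourhood system for a stem region that mimics the doublet model for base-paired sites $i$ and $k$, with $N_i = \{k\}$ and $N_k = \{i\}$. Table 1 displays a possible rate matrix acting on site $k$ while taking into account the state at site $i$. Note that this matrix is reversible and has 15 parameters. Accordingly, we may define a similar rate matrix for site $i$ (Table 2). If $\pi_{\alpha\beta} = \pi_{\beta\alpha}$ for $\alpha, \beta \in \{A, C, G, U\}$, then both matrices have identical entries with nine free parameters. If the matrices in Tables 1 and 2 are applied to all sites in a stem, this gives the evolutionary process defined by Schöniger and von Haeseler (1994). The matrices can be summarized in a more condensed form. With $z$ we denote a sequence of length $n_k$, and $(x_k, z)$ represents the current subsequence in $x$ as induced by $N_k$. The admissible substitutions for one site are written in the following submatrix:

$$Q_k^{(z)} = \begin{pmatrix} \ast & \pi_k(C, z) & \pi_k(G, z) & \pi_k(U, z) \\ \pi_k(A, z) & \ast & \pi_k(G, z) & \pi_k(U, z) \\ \pi_k(A, z) & \pi_k(C, z) & \ast & \pi_k(U, z) \\ \pi_k(A, z) & \pi_k(C, z) & \pi_k(G, z) & \ast \end{pmatrix}, \qquad (4)$$

where rows and columns correspond to $x_k \in \{A, C, G, U\}$ and the diagonal entries $\ast$ are such that each row sums to zero. Thus, after ordering the states by their context $z$, $Q_k$ is a $4^{n_k+1} \times 4^{n_k+1}$ instantaneous rate matrix in block-diagonal form, composed of $4^{n_k}$ submatrices of the type illustrated in (4).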
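A small numerical check of the parameter counts above, under the assumption (as in the F81-type form) that the rate into a doublet equals its frequency. The helper names are hypothetical. With symmetric doublet frequencies, the matrices for sites $k$ and $i$, each indexed by (focal base, partner base), coincide entry by entry.

```python
import itertools

import numpy as np

BASES = "ACGU"
DOUBLETS = ["".join(p) for p in itertools.product(BASES, repeat=2)]


def stem_rate_matrices(pi):
    """Unscaled F81-type matrices for the partners k and i of one base pair.

    pi maps a doublet 'ab' (base a at site k, base b at site i) to its frequency.
    Each matrix is indexed by (focal base, partner base); only the focal base changes.
    """
    idx = {d: n for n, d in enumerate(DOUBLETS)}
    q_k = np.zeros((16, 16))
    q_i = np.zeros((16, 16))
    for a, b in itertools.product(BASES, repeat=2):
        for c in BASES:
            if c != a:
                q_k[idx[a + b], idx[c + b]] = pi[c + b]  # site k: x_k = a -> c, x_i = b
                q_i[idx[a + b], idx[c + b]] = pi[b + c]  # site i: x_i = a -> c, x_k = b
    for q in (q_k, q_i):
        np.fill_diagonal(q, -q.sum(axis=1))
    return q_k, q_i


def free_parameters(symmetric):
    """Number of free doublet frequencies (they must sum to one)."""
    pairs = itertools.product(BASES, repeat=2)
    classes = {frozenset(p) if symmetric else p for p in pairs}
    return len(classes) - 1


print(free_parameters(False), free_parameters(True))  # 15 9
```

The symmetric case has four homodoublets plus six unordered heterodoublets, i.e. ten frequencies and nine free parameters.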
The reader may notice that our definition of a rate matrix is not limited to F81 types of substitution matrices (Felsenstein, 1981). The submatrix (4) can be extended to any type of rate matrix e.g. by introducing specific substitution rates. However, for the time being, we think that the frequencies of subsequences provide a reasonably good description of interactions among sites.
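One way to introduce specific substitution rates is sketched here under the assumption of an HKY85-style transition/transversion ratio $\kappa$. This is a hypothetical extension, not a feature claimed for the model above: the transition entries of submatrix (4) are multiplied by $\kappa$. Because the factor is symmetric in the two bases, the block remains reversible with respect to $\pi_k(\cdot, z)$. The function name and the context frequencies are made up.

```python
import numpy as np

BASES = "ACGU"
TRANSITIONS = {frozenset("AG"), frozenset("CU")}  # purine-purine, pyrimidine-pyrimidine


def context_submatrix(pi_z, kappa=1.0):
    """4x4 block of Q_k for one fixed neighbour context z, as in submatrix (4).

    pi_z maps each base c to pi_k(c, z).  kappa is a hypothetical
    transition/transversion ratio; kappa = 1 recovers the F81-type block.
    """
    q = np.zeros((4, 4))
    for r, a in enumerate(BASES):
        for s, c in enumerate(BASES):
            if a != c:
                factor = kappa if frozenset((a, c)) in TRANSITIONS else 1.0
                q[r, s] = factor * pi_z[c]
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


pi_z = {"A": 0.10, "C": 0.30, "G": 0.40, "U": 0.20}  # made-up context frequencies
q = context_submatrix(pi_z, kappa=2.0)
p = np.array([pi_z[b] for b in BASES])
print(np.allclose(p[:, None] * q, (p[:, None] * q).T))  # True: still reversible
```

After such a change the whole matrix $Q_k$ would have to be rescaled so that Equation (2) still holds.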
| SIMULATIONS |
|---|
In the following, a neighbourhood system $\mathcal{N}$ and a collection of site-specific rate matrices $Q = \{Q_k \mid k = 1, \ldots, l\}$ are given. We start at mutational time 0 with a sequence $x(0)$ that evolves according to $(\mathcal{N}, Q)$, and we want to generate a sequence $x(d)$ after $d$ expected substitutions per site. To simulate this process we adopt algorithm 1 of von Haeseler and Schöniger (1998). At time 0 the instantaneous substitution rate equals $q(x)$ [Equation (3)]. We draw a random time $d_r$ from an exponential distribution with parameter $q(x)$. If $d_r < d$, then a substitution takes place in $x$. We pick a site $k$ with probability

$$P(k) = \frac{q_k(x)}{q(x)} \qquad (5)$$

and replace $x_k$ by $y_k \neq x_k$ with probability

$$P(y_k \mid x) = \frac{Q_k\big(x^{(k)}, (y_k, x_{N_k})\big)}{q_k(x)}. \qquad (6)$$

Then $d$ is replaced by $d - d_r$, $q(x)$ is recomputed based on the new sequence, and the simulation continues; once $d_r \ge d$, the simulation stops and $x(d)$ is the current sequence. Finally, this procedure is applied recursively through a given rooted or unrooted tree topology, where the branch lengths are specified by the expected number of substitutions. This method is implemented in the program Simulating Site-Specific Interactions (SISSI).
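The simulation steps above can be sketched in Python as follows. This is a simplified illustration, not SISSI's ANSI C code: the `rate` callback interface, the tuple-based tree format and all function names are assumptions, and branch lengths are taken in expected substitutions per site, matching the scaling of Equation (2).

```python
import random
from typing import Callable, Dict, List, Tuple

BASES = "ACGU"
# Hypothetical interface: rate(k, x, c) returns Q_k(x^(k), y^(k)) for replacing x[k]
# by c (0-based k), reading the neighbour context from the current sequence x.
RateFn = Callable[[int, List[str], str], float]
# Hypothetical tree format: (name, [(branch_length, subtree), ...]); leaves have no children.
Tree = Tuple[str, list]


def site_rate(rate: RateFn, k: int, x: List[str]) -> float:
    """q_k(x): total rate of leaving the current state at site k."""
    return sum(rate(k, x, c) for c in BASES if c != x[k])


def evolve_branch(x: List[str], d: float, rate: RateFn, rng: random.Random) -> List[str]:
    """Evolve x along a branch of length d (expected substitutions per site)."""
    x = list(x)
    while True:
        # Recomputing every q_k keeps the sketch short; after a substitution at k
        # only the rates of k and its neighbours actually change.
        q_sites = [site_rate(rate, k, x) for k in range(len(x))]
        q_total = sum(q_sites)
        if q_total == 0.0:
            return x
        d_r = rng.expovariate(q_total)  # waiting time with parameter q(x)
        if d_r >= d:
            return x
        d -= d_r
        k = rng.choices(range(len(x)), weights=q_sites)[0]  # Equation (5)
        targets = [c for c in BASES if c != x[k]]
        x[k] = rng.choices(targets, weights=[rate(k, x, c) for c in targets])[0]  # Eq. (6)


def evolve_tree(tree: Tree, x: List[str], rate: RateFn, rng: random.Random,
                out: Dict[str, str]) -> None:
    """Apply evolve_branch recursively from the root; leaf sequences are stored in out."""
    name, children = tree
    if not children:
        out[name] = "".join(x)
    for length, child in children:
        evolve_tree(child, evolve_branch(x, length, rate, rng), rate, rng, out)


def jc_rate(k: int, x: List[str], c: str) -> float:
    """Independent sites: Jukes-Cantor scaled to one substitution per unit time."""
    return 1.0 / 3.0


rng = random.Random(1)
leaves: Dict[str, str] = {}
evolve_tree(("root", [(0.1, ("A", [])), (0.2, ("B", []))]), list("ACGU" * 25), jc_rate, rng, leaves)
print(sorted(leaves))  # ['A', 'B']
```

A dependent model would supply a `rate` that looks up the partner base, for instance returning the scaled frequency of the doublet $(c, x_i)$ for a paired site $k$ with partner $i$. Updating only the affected rates after each substitution would avoid the full rescan of all $l$ sites done here.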
| RESULTS |
|---|
Simulations employing a neighbourhood system that incorporates artificial or known structural features were run on an ordinary PC. Although the models are more complex than the well-known independent models, the computing time for the simulations is satisfactory. If $n_k = l - 1$ for all $k = 1, \ldots, l$, then the run time increases quadratically with sequence length $l$. Run time increases linearly as a function of the number of taxa or the total branch length of the tree.
As an illustrative example, we used the neighbourhood system of RNase P from Bacillus subtilis with 401 sites (Fig. 3), taken from the RNase P database (Brown, 1999). To specify the rate matrices, we used the frequencies of the nucleotides {A, C, G, U} for sites evolving independently ($n_k = 0$) and the doublet frequencies for sites with $n_k = 1$, both taken from one corresponding sequence in the database (Table 4). We used only these two matrices in the simulations. In the sequence, 41.15% of sites evolve independently and 58.85% evolve under dependencies. Having specified $(\mathcal{N}, Q)$, it took on average 1 s to simulate a dataset of 100 sequences along a tree with mean branch length 0.3.
Figure 4 shows the accumulation of observed sequence differences per site as the number of substitutions per site increases. The curve shows the expected saturation behaviour as d goes to infinity. The simulated curve lies between the theoretical curves we obtain for the F81 model and the doublet model.
In real data we do not know the expected number of substitutions. The different speeds at which observed differences accumulate have a great impact on the estimation of the number of substitutions. With our method we can investigate the relationship between the number of substitutions per site and the number of observed differences simultaneously for different neighbourhood systems of varying complexity.
For non-overlapping neighbourhoods and a small number of neighbours $n_k$, it is possible to calculate the number of substitutions and the number of observed differences analytically (von Haeseler and Schöniger, 1998). Our simulation results agree with the expected numbers derived by appropriately weighting the expected numbers of observed differences for the independent and dependent sites of the neighbourhood system of the RNase P of B. subtilis (Fig. 4).
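For non-overlapping doublets the analytical curves can be computed directly. The sketch below assumes the F81-type matrices described earlier. For independent sites the expected proportion of differences is $B(1 - e^{-d/B})$ with $B = 1 - \sum_a \pi_a^2$. For a base pair, the joint 16-state chain of both partners is exponentiated numerically. Function names and the uniform frequencies in the example are illustrative; they are not the Table 4 values.

```python
import itertools

import numpy as np

BASES = "ACGU"


def f81_differences(d, pi):
    """Expected proportion of differing sites after d substitutions per site (F81)."""
    b = 1.0 - sum(p * p for p in pi.values())
    return b * (1.0 - np.exp(-d / b))


def doublet_differences(d, pi):
    """Expected proportion of differing sites in base pairs after d substitutions per site.

    Both partners use the F81-type doublet matrix (one base changes at a time), each
    scaled to one expected substitution per unit time.  The joint 16-state chain is
    reversible, so exp(G d) follows from a symmetric eigendecomposition.
    All doublet frequencies in pi must be positive.
    """
    states = ["".join(s) for s in itertools.product(BASES, repeat=2)]
    idx = {s: n for n, s in enumerate(states)}
    p = np.array([pi[s] for s in states])
    q_first = np.zeros((16, 16))
    q_second = np.zeros((16, 16))
    for a, b in itertools.product(BASES, repeat=2):
        for c in BASES:
            if c != a:
                q_first[idx[a + b], idx[c + b]] = pi[c + b]
            if c != b:
                q_second[idx[a + b], idx[a + c]] = pi[a + c]
    for q in (q_first, q_second):
        np.fill_diagonal(q, -q.sum(axis=1))
        q /= -(p @ np.diag(q))  # Equation (2): d_k = 1 for each partner
    g = q_first + q_second
    root = np.sqrt(p)
    sym = root[:, None] * g / root[None, :]  # D^{1/2} G D^{-1/2} is symmetric
    lam, vec = np.linalg.eigh(sym)
    prob = (vec * np.exp(lam * d)) @ vec.T       # exp(sym * d)
    prob = prob / root[:, None] * root[None, :]  # transition matrix P(d) = exp(G d)
    first = np.array([[x[0] != y[0] for y in states] for x in states])
    second = np.array([[x[1] != y[1] for y in states] for x in states])
    return 0.5 * (p @ (prob * first).sum(axis=1) + p @ (prob * second).sum(axis=1))


# Weighted expectation for 41.15% independent and 58.85% paired sites, using
# uniform frequencies purely for illustration.
uniform1 = {b: 0.25 for b in BASES}
uniform2 = {"".join(s): 1 / 16 for s in itertools.product(BASES, repeat=2)}
d = 0.3
print(0.4115 * f81_differences(d, uniform1) + 0.5885 * doublet_differences(d, uniform2))
```

With uniform doublet frequencies each partner reduces to a Jukes-Cantor site, so both curves coincide. Non-uniform doublet frequencies make the dependent sites saturate at a different rate.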
However, as indicated in Figures 1c and 2, more complex neighbourhoods, including overlapping ones, can be simulated. The example given here simply illustrates the basic principle; more realistic applications are beyond the scope of this paper.
| DISCUSSION |
|---|
In this paper, we have introduced a general framework to take site-specific interactions into account. We can mimic sequence evolution with various complex dependencies among sites. The basic idea is the application of different substitution matrices for each site defined by the interactions with other sites in the sequence. Our implementation, SISSI, allows the evolution of nucleotide sequences along a tree for user defined systems of neighbourhoods and instantaneous rate matrices.
Simulations have shown that SISSI produces sequences under constrained evolution in reasonable time. Although Seq-Gen (Rambaut and Grassly, 1997) should be used for simulations with independent sites because it is more time efficient, we have shown that run time is not a serious issue for our general approach. Thus, it should be possible to generate large simulated datasets that may be used to analyse the reliability of tree reconstruction methods under deviations from the independent-sites assumption.
Although we discussed Felsenstein (F81) types of rate matrices as an example, SISSI is in principle not limited to this type of substitution process. For example, a transition-transversion parameter can easily be included. SISSI also allows the inclusion of rate heterogeneity or codon position-specific heterogeneity (Yang, 1993).
Models with site-specific rate matrices have also been studied, as well as various mixture models (e.g. Koshi and Goldstein, 1995; Bruno, 1996; Thorne et al., 1996; Koshi and Goldstein, 1997; Halpern and Bruno, 1998; Goldman et al., 1998; Lartillot and Philippe, 2004; Pagel and Meade, 2004). Recently, a number of models of protein evolution have been developed to account for protein structure by accepting randomly generated mutations if they do not affect the structure too much (e.g. Parisi and Echave, 2001, 2005). With our method, allowing the specification of site-specific rate matrices or different rate matrices for different regions of the simulated sequence is straightforward. Moreover, our framework allows the introduction of mechanistic parameters, thus making all model assumptions explicit.
However, introducing more and more realistic features into the model of the evolutionary process requires the specification of a large number of parameters. This does not pose a problem for the simulations; the user simply has to define everything. Fortunately, our definition is flexible enough to introduce simpler models that capture the major features of interaction between sites and need fewer parameters. Thus, it may be more insightful to confine the simulation to the relevant parameters.
Besides applications in phylogenetic inference, simulated datasets with dependencies can be used to test structure analysis methods, e.g. RNA structure prediction (Chiu and Kolodziejczak, 1991; Gutell et al., 1992; Gorodkin et al., 1997; Tabaska et al., 1998; Lueck et al., 1999; Akmaev et al., 2000; Knudsen and Hein, 2003; Hofacker et al., 2002, and references therein). Simultaneous structural RNA sequence alignment, structure prediction and phylogenetic reconstruction is still an open problem. Another application of SISSI is the systematic study of the influence of phylogenetic relationships among the sequences that are subject to structure prediction. SISSI traces the evolutionary path, including compensatory mutations, along the tree, and the resulting data can be analysed, e.g. with programs that detect nucleotide interactions. Along such paths, intermediate structures may arise that deviate considerably from the structure defined by the neighbourhood system. If a large fraction of closely related sequences happen to deviate by chance from the underlying structure, this will mislead structure prediction programs that do not properly account for the phylogenetic relationship. SISSI may help to address this particular problem, i.e. to distinguish structural (functional) from phylogenetic (ancestral) correlations.
A challenging extension of our model is the inclusion of energy values in the process of RNA evolution, e.g. to model base-pair stacking effects due to π-orbital overlap (Fig. 1c). This would add another realistic feature, and the evolutionary path through sequence space guided by the tree would then be more easily comparable with results produced by RNAinverse (Hofacker et al., 1994). RNAinverse searches for sequences that fold into a predefined structure but takes no phylogenetic relationship into account.
Finally, we would like to point out that this paper has focussed on applications of SISSI within the context of RNA. However, our method is applicable to other inter- and intra-site-specific interactions among nucleotides and in other character-based sequences, such as amino acids, codons or discrete character states. Furthermore, it is not necessary to restrict the simulation to one neighbourhood system. It is equally possible to define different neighbourhood systems for different regions of the tree. Such simulations may be the basis for studies of structure evolution.
| Acknowledgments |
|---|
We wish to thank the Goldman Group at the EBI, the Biocomputing Group of the Biophysics Institute and the Bioinformatics Institute at Duesseldorf University. We would like to thank Andrew Rambaut for allowing us to use some code from Seq-Gen, and Heiko Schmidt, Andreas Wilm, Jutta Buschbom and Thomas Schlegel for various help. Finally, thanks to Carolin Kosiol and Roland Fleißner for helpful comments on the manuscript. This work was supported by DFG grant SFB-TR1 (Deutsche Forschungsgemeinschaft). Financial support from the Marie Curie Foundation for a fellowship at the EBI is gratefully acknowledged. Funding to pay the Open Access publication charges for this article was provided by DFG grant SFB-TRI.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Joaquin Dopazo
Received on July 22, 2005; revised on December 1, 2005; accepted on December 1, 2005
| REFERENCES |
|---|
Akmaev, V.R., et al. (2000) Phylogenetically enhanced statistical tools for RNA structure prediction. Bioinformatics, 16, 501-512.
Arndt, P.F., et al. (2003) DNA sequence evolution with neighbor-dependent mutation. J. Comput. Biol., 10, 313-322.
Brown, J.W. (1999) The Ribonuclease P Database. Nucleic Acids Res., 27, 314.
Bruno, W.J. (1996) Modeling residue usage in aligned protein sequences via maximum likelihood. Mol. Biol. Evol., 13, 1368-1374.
Cate, J.H., et al. (1996) Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678-1685.
Chiu, D.K. and Kolodziejczak, T. (1991) Inferring consensus structure from nucleic acid sequences. Comput. Appl. Biosci., 7, 347-352.
Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17, 368-376.
Goldman, N., et al. (1998) Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics, 149, 445-458.
Gorodkin, J., et al. (1997) Displaying the information contents of structural RNA alignments: the structure logos. Comput. Appl. Biosci., 13, 583-586.
Grassly, N.C., et al. (1997) PSeq-Gen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees. Comput. Appl. Biosci., 13, 559-560.
Gutell, R.R., et al. (1992) Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Res., 20, 5785-5795.
von Haeseler, A. and Schöniger, M. (1998) Evolution of DNA or amino acid sequences with dependent sites. J. Comput. Biol., 5, 149-163.
Halpern, A.L. and Bruno, W.J. (1998) Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol. Biol. Evol., 15, 910-917.
Hasegawa, M., et al. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol., 22, 160-174.
Hofacker, I.L., et al. (2002) Secondary structure prediction for aligned RNA sequences. J. Mol. Biol., 319, 1059-1066.
Hofacker, I.L., et al. (1994) Fast folding and comparison of RNA secondary structures. Monatsh. Chem., 125, 167-188.
Hudelot, C., et al. (2003) RNA-based phylogenetic methods: application to mammalian mitochondrial RNA sequences. Mol. Phylogenet. Evol., 28, 241-252.
Huelsenbeck, J.P. (1995) The performance of phylogenetic methods in simulation. Syst. Biol., 44, 17-48.
Jensen, J. and Pedersen, A-M.K. (2000) Probabilistic models of DNA sequence evolution with context dependent rates of substitution. Adv. Appl. Prob., 32, 499-517.
Jukes, T.H. and Cantor, C.R. (1969) Evolution of protein molecules. In Munro, H.N. (ed.), Mammalian Protein Metabolism, Vol. 3. Academic Press, New York, pp. 21-132.
Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111-120.
Knudsen, B. and Hein, J. (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res., 31, 3423-3428.
Kosakovsky Pond, S.L., et al. (2005) HyPhy: hypothesis testing using phylogenies. Bioinformatics, 21, 676-679.
Koshi, J.M. and Goldstein, R.A. (1995) Context dependent optimal substitution matrices. Protein Eng., 8, 641-645.
Koshi, J.M. and Goldstein, R.A. (1997) Mutation matrices and physical-chemical properties: correlations and implications. Proteins, 27, 336-344.
Lartillot, N. and Philippe, H. (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol., 21, 1095-1109.
Lueck, R., et al. (1999) ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res., 27, 4208-4217.
Lunter, G. and Hein, J. (2004) A nucleotide substitution model with nearest-neighbour interactions. Bioinformatics, 20, i216-i223.
Muse, S.V. (1995) Evolutionary analyses of DNA sequences subject to constraints on secondary structure. Genetics, 139, 1429-1439.
Pagel, M. and Meade, A. (2004) A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol., 53, 571-581.
Parisi, G. and Echave, J. (2001) Structural constraints and emergence of sequence patterns in protein evolution. Mol. Biol. Evol., 18, 750-756.
Parisi, G. and Echave, J. (2005) Generality of the structurally constrained protein evolution model: assessment on representatives of the four main fold classes. Gene, 345, 45-53.
Pedersen, A-M. and Jensen, J.L. (2001) A dependent rates model and MCMC-based methodology for the maximum-likelihood analysis of sequences with overlapping reading frames. Mol. Biol. Evol., 18, 763-776.
Pedersen, J.S., et al. (2004) An evolutionary model for protein-coding regions with conserved RNA structure. Mol. Biol. Evol., 21, 1913-1922.
Pollock, D.D., et al. (1999) Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol., 287, 187-198.
Rambaut, A. and Grassly, N.C. (1997) Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci., 13, 235-238.
Robinson, D.M., et al. (2003) Protein evolution with dependence among codons due to tertiary structure. Mol. Biol. Evol., 20, 1692-1704.
Rzhetsky, A. (1995) Estimating substitution rates in ribosomal RNA genes. Genetics, 141, 771-783.
Savill, N.J., et al. (2000) RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum-likelihood methods. Genetics, 157, 399-411.
Schöniger, M. and von Haeseler, A. (1994) A stochastic model for the evolution of autocorrelated DNA sequences. Mol. Phylogenet. Evol., 3, 240-247.
Schöniger, M. and von Haeseler, A. (1995) Simulating efficiently the evolution of DNA sequences. Comput. Appl. Biosci., 11, 111-115.
Siepel, A. and Haussler, D. (2004) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol. Biol. Evol., 21, 468-488.
Smith, A.D., et al. (2004) Empirical models for substitution in ribosomal RNA. Mol. Biol. Evol., 21, 419-427.
Stoye, J., et al. (1998) Rose: generating sequence families. Bioinformatics, 14, 157-163.
Tabaska, J.E., et al. (1998) An RNA folding method capable of identifying pseudoknots and base triples. Bioinformatics, 14, 691-699.
Tavaré, S. (1986) Some probabilistic and statistical problems on the analysis of DNA sequences. Lect. Math. Life Sci., 17, 57-86.
Thorne, J.L., et al. (1996) Combining protein evolution and secondary structure. Mol. Biol. Evol., 13, 666-673.
Tillier, E.R. (1994) Maximum likelihood with multiparameter models of substitution. J. Mol. Evol., 39, 409-417.
Tillier, E.R. and Collins, R.A. (1998) High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics, 148, 1993-2002.
Tufféry, P. (2002) CS-PSeq-Gen: simulating the evolution of protein sequence under constraints. Bioinformatics, 18, 1015-1016.
Yang, Z. (1993) Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol., 10, 1396-1401.
Yang, Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci., 13, 555-556.
Table 4. Doublet frequencies and single frequencies from an RNase P sequence of B. subtilis (accession number: M13175) taken from the RNase P database (Brown, 1999).