Bioinformatics Advance Access originally published online on April 13, 2006
Bioinformatics 2006 22(13):1593-1599; doi:10.1093/bioinformatics/btl142
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time
Heinrich-Heine-Universität Düsseldorf, Institut für Physikalische Biologie D-40225 Düsseldorf, Germany
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Alignment of RNA has a wide range of applications, for example in phylogeny inference, consensus structure prediction and homology searches. Yet aligning structural or non-coding RNAs (ncRNAs) correctly is notoriously difficult as these RNA sequences may evolve by compensatory mutations, which maintain base pairing but destroy sequence homology. Ideally, alignment programs would take RNA structure into account. The Sankoff algorithm for the simultaneous solution of RNA structure prediction and RNA sequence alignment was proposed 20 years ago but suffers from its exponential complexity. A number of programs implement lightweight versions of the Sankoff algorithm by restricting its application to a limited type of structure and/or only pairwise alignment. Thus, despite recent advances, the proper alignment of multiple structural RNA sequences remains a problem.
Results: Here we present STRAL, a heuristic method for alignment of ncRNA that reduces sequencestructure alignment to a two-dimensional problem similar to standard multiple sequence alignment. The scoring function takes into account sequence similarity as well as up- and downstream pairing probability. To test the robustness of the algorithm and the performance of the program, we scored alignments produced by STRAL against a large set of published reference alignments. The quality of alignments predicted by STRAL is far better than that obtained by standard sequence alignment programs, especially when sequence homologies drop below
65%; nevertheless STRALs runtime is comparable to that of CLUSTALW.
Availability: STRAL is implemented in C. Source code (under GNU public license) as well as a precompiled Debian package can be downloaded at http://www.biophys.uni-duesseldorf.de/stral/
Contact: stral{at}biophys.uni-duesseldorf.de
Supplementary information: Supplementary data available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
Non-coding RNAs (ncRNAs) are RNA molecules or elements that do not code for proteins but nevertheless are functional in biological processes, including localization, replication, translation, degradation and stabilization of biological macromolecules (for review and further references see Eddy, 2001; Storz, 2002; Winkler and Breaker, 2005; Fedor and Williamson, 2005; Gottesmann, 2005). Prominent examples are small nuclear RNAs, which are involved in mRNA splicing, and riboswitches, which are located in non-translated regions of mRNAs, where they bind metabolites and control gene expression.
For analysis of ncRNA function, knowledge about their secondary and tertiary structure is crucial. Structure prediction for single sequences is performed by dynamic programming, which allows the thermodynamically optimal structure or structure ensemble to be found (Zuker, 2000, 2003; Hofacker, 2003; Rivas and Eddy, 1999). These algorithms rely on correctness of thermodynamic parameters, neglect the influence of kinetics on structure formation and are not able to take into account interactions with other macromolecules. The alternative methodcalled comparative sequence analysisneeds a set of homologous RNAs and predicts basebase interactions on the basis of compensatory mutations (Chiu and Kolodziejczak, 1991; Gautheret et al., 1995; Lescoute et al., 2005). Its reliability increases with the number and divergence of sequences, but it needs an alignment (nearly) perfect with respect to sequence and structure. Other approaches for consensus structure prediction based on RNA alignments include PFOLD (Knudsen and Hein, 2003) and RNAALIFOLD (Hofacker et al., 2002). Furthermore, RNA alignments are an essential basis for phylogeny inference (e.g. Hudelot et al., 2003), homology searches (e.g. Gräf et al., 2006; Eddy, 2002) and approaches to searching for new ncRNAs (e.g. Washietl et al., 2005).
The structurally correct alignment of RNA sequences is, however, a difficult problem. An algorithm for simultaneously optimizing the sequence and structure of an RNA set was published by Sankoff in 1985; however, the algorithm is not employable due to its computational complexity
and memory usage
for m sequences of length n. Thus, several variants of this algorithm have been developed which are restricted to pairwise alignment only and implement other simplifications to make the calculation tractable (Mathews and Turner, 2002; Hofacker et al., 2004; Havgaard et al., 2005a; Holmes, 2005).
This situation resembles that of (pure) sequence alignment: the algorithm for aligning two sequences is relatively cheap (Smith and Waterman, 1981), whereas the same approach cannot be applied to multiple sequence alignment due to its complexity of
(Fuellen, G., 1997). This led to the development of very successful heuristic alignment methods, for example CLUSTAL (Thompson et al., 1994a). Here we follow the same heuristic multiple sequence alignment approach but enhance it using a scoring function that emphasizes structural features. For this purpose, we project the structure features calculated by a thermodynamic approach (RNAFOLD; Hofacker, 2003) on top of the sequence. Similar approaches have already been proposed in the literature (Bonhoeffer et al., 1993; Yang and Blanchette, 2004) but have not been implemented. We call our program STRAL.
To test the performance of STRAL, we compare it with two sequence alignment programsCLUSTAL (Thompson et al., 1994a) and PROALIGN (Löytynoja and Milinkovitch, 2003)and three different structure alignment programs:
- PMSTRING (PMMULTI in string-like alignment mode; see Hofacker et al., 2004) follows a concept related to that of our approach: it uses strings of pairing probabilities obtained from Sankoff-like pairwise alignments to produce a guide tree for a multiple alignment. This procedure, however, often produces misaligned [sequence] pairs (Hofacker et al., 2004).
- MARNA (Siebert and Backofen, 2005) uses a distantly related approach: it performs pairwise alignments of RNAs using sequence and structure information (based on RNAFOLD predictions) and combines the pairwise, weighted information into a multiple alignment with T-COFFEE (Notredame et al., 2000).
- STEMLOC (Holmes, 2005) implements the pairwise Sankoff algorithm using a pair stochastic context-free grammar and a heuristic for multiple alignment. To reduce computing cost and memory usage, constraints are imposed by fold and alignment envelopes, which take into account, for example, the 1000 best single sequence structure predictions and the 100 best alignments.
- MARNA (Siebert and Backofen, 2005) uses a distantly related approach: it performs pairwise alignments of RNAs using sequence and structure information (based on RNAFOLD predictions) and combines the pairwise, weighted information into a multiple alignment with T-COFFEE (Notredame et al., 2000).
| 2 SYSTEMs AND METHODS |
|---|
|
|
|---|
STRAL is implemented in C and should compile under any Unix system. We used the GNU C compiler (GCC) versions 3 and 4. The program was thoroughly tested on several Linux distributions, including Debian Version 3.1 and Red Hat Fedora Core 3. To facilitate installation, support for GNU autotools is built in. We also provide a precompiled Debian package. STRAL requires RNAfold's RNAlib (Hofacker et al., 1994; Hofacker, 2003) version 1.5 and the squid library (Eddy, 2005) version 1.9g. Static versions of these libraries compiled for i386 are included in the package, which can be downloaded at http://www.biophys.uni-duesseldorf.de/stral/. All alignments have been computed using STRAL version 0.5.2. Runtime comparison was performed on a 1.8 GHz Dual-Opteron machine with 4 GB memory; STEMLOC computations had to be performed on 2.4 GHz Opterons with 16 GB memory. Programs have been compiled with optimization level 3 (GCC 4).
2.1 Reference alignments
As reference alignments we used dataset-1 from the RNA alignment benchmark database BRAlibase (Gardner et al., 2005). This dataset consists of 388 alignments of Group I introns, 5S rRNAs, tRNAs, and U5 spliceosomal RNAs with five sequences per alignment.
2.2 Scoring of alignments
As proposed by Gardner et al. (2005), we used two independent yet complementary scores to evaluate alignment quality: the widely used sum-of-pairs score (SPS) implemented in BaliScore (Thompson et al., 1999) and the Structure Conservation Index (SCI; see Washietl et al., 2005). The SPS measures the level of sequence consistency between a test and a reference alignment by comparing all possible character pairs per column between both alignments; it ranges from 0 to 1 (complete agreement). The SCI is a measure for structural conservation and works independently of a reference alignment. It ranges from 0 (no detectable conservation) to values slightly above 1.0 (sequence agreement and structure conservation). For statistical analysis of results we used R (http://cran.r-project.org/).
2.3 Parameters of alignment programs
The parameter choices for STRAL are described in Section 4.1. PROALIGN v0.5a3 was allowed to use up to 256 MB of memory and a bandwidth of 400 nt (-Xmx256m -bwidth=400). PMMULTI v1.1 was used either in its slow and thorough variant or in string-like alignment mode (- -fast_progressive). Other programs (CLUSTALW v1.83, MARNA, STEMLOC v0.19b) were used with default options.
| 3 ALGORITHM |
|---|
|
|
|---|
The steps of the algorithm include
- a pairwise alignment of all sequences of a set of homologous ncRNAs,
- production of a guide tree using the alignment scores from step 1, and
- a progressive alignment of the sequences, guided by the tree.
3.1 Pairwise alignment
We use the RNAFOLD library (Hofacker et al., 1994; Hofacker, 2003) to compute the partition function and matrix of base-pairing probabilities Pij of base i with base j for a single sequence. This probability matrix is then condensed into three linear vectors (Bonhoeffer et al., 1993), holding for each base i the probabilities of being paired downstream
, paired upstream
or unpaired
, respectively. Thus we lose the specific pairing information but we can apply an alignment method that is cheap in terms of computing costs while still using thermodynamic information. Next, we choose a particular combination of these vectorsa structural part Sstruct = f(p1, p2) and a sequence part Sseq = f(p0)as a similarity score
![]() | (1) |
gives the ratio of structure over sequence similarity. The positive 4 x 4 single nucleotide substitution matrix d for aligning single-stranded regions is adapted either from Klein and Eddy (2003) (RIBOSUM85-60) or from Gotoh (1999) (see Section 4.1.2).
Let Vi,k be the value of the optimal alignment of prefixes A[1...i] and B[1...k] with base conditions V0,0 = 0, Vi,0 = Ei,0 = go i · ge, V0,k = F0,k = go k · ge and an affine gap weight model. That is, a single gap of length q is given by weight go + q · ge; go and ge denote gap-open and gap-extension values, respectively. If we do not charge end gaps, which is the default setting, set Vi, 0 = V0, k = 0. Then follow the dynamic programming recurrences (Gusfield, 1999; Gotoh, 1999) for aligning two sequences:
![]() | (2a) |
![]() | (2b) |
![]() | (2c) |
![]() | (2d) |
3.2 Guide tree
The m(m 1)/2 scores from the previous step are converted into a distance matrix
![]() |
3.3 Progressive alignment
During the progressive alignment, the sequences of the set S = S(1), ... , S(m) are aligned according to their position in the tree, starting with the two sequences with lowest distance, resulting in a group of aligned sequences. To this group, a further sequence or a group of sequences is added, which makes necessary modifications of equations (1) and (2), and the base conditions. We measure the SPS, so Vi,0 = Ei,0 = n · (go i · ge) and V0,k = F0,k = n · (go k · ge), with n being the number of all pairwise alignments of groups A and B. In the case of free end gaps, Vi,0 = V0,k = 0.
For (2c) the scoring function (1) has to be modified. Let C be the desired alignment of groups A and B (C = A
B); then
![]() |
![]() |
and gap sizes, respectively, into small numbers of classes (Gotoh, 1993). | 4 RESULTS |
|---|
|
|
|---|
4.1 Choice of parameters
For parameter optimization we used the RNA alignment benchmark datasets published by Gardner et al. (2005) (see also Section 2). The determination of correct parameters was quite difficult; for example, the ratio
of structure over sequence similarity and gap values go and ge depend on each other. So we examined a rather huge parameter space. Contrary to our expectations, however, modification of the final parameters by a factor of 2 leads to only marginally different alignments.
4.1.1 Scoring function
The scoring function (1) implies a large influence of sequence on the alignment only in predicted loop regions and not in predicted helical regions. A major improvement in alignment quality arose by skipping this restriction; that is, formally we set the probability of a base to be unpaired
. Consequently, sequence information is used for both structured and unstructured regions.
4.1.2 Substitution matrix d
The 4 x 4 single nucleotide substitution matrix given by Gotoh (1999) emphasizes identity substitutions [d(X, X) = 4] and allows for pyrimidine to pyrimidine and purine to purine substitutions only [d(Y, Y) = d(R, R) = 1]. The RIBOSUM85-60 matrix from Klein and Eddy (2003) overall contains higher values [3.51
d(X, X)
4.70, d(Y, Y) = 1.43, d(R, R) = 1.02]; only d(G, C) = 0. The former performed better for RNA sets of high similarity; the latter yielded a slightly better performance over the full similarity range.
4.1.3 Structure over sequence ratio 
Overall, scoring values were relatively constant in an
range from 3 to 11. A factor
= 7 produced the best results (see Table 1) when end gaps were free of costs. Nevertheless, scoring values improved when a higher
was applied to sequence sets containing fewer than five sequences and/or a sequence similarity lower than
50%. In contrast,
values lower than 2 did not lead to an improvement on high-similarity sets.
|
4.1.4 Gap values
To determine appropriate values for gap costs, we aligned the complete dataset-1 several times with different gap opening, gap extension and
values. The alignments were then scored by the product of SPS and SCI. These values were averaged over all 388 alignments for each parameter combination. An example with
= 7 is shown in Figure 1. Several parameter combinations are near optimal. We implemented a gap opening value of go = 8 and a gap extension value of ge = 0.5 as default, as this combination produced high-scoring alignments for other values of
, too (data not shown). Overall alignment quality, however, does not vary significantly upon 2-fold changes of either gap opening or gap extension values.
|
4.1.5 Guide tree
Alignments based on guide trees derived from UPGMA showed the highest accuracy. Trees produced by the other methods showed similar alignment accuracy, except for the trees derived from the midpoint method (data not shown).
4.2 Benchmark
To demonstrate the power of STRAL we compared its performance with that of CLUSTAL (Thompson et al., (1994a), PROALIGN (Löytynoja and Milinkovitch, 2003) (the best-performing programs according to the evaluation by Gardner et al., 2005), PMMULTI in string-like alignment mode (PMSTRING; see Hofacker et al., 2004, MARNA (Siebert and Backofen, 2005) and STEMLOC (Holmes, 2005). For the comparison we used the multiple RNA sequence set (dataset-1) from the above-mentioned benchmark. Aligning all 388 RNA sets (with five sequences each) from this data set took
106 s using STRAL. This time reduced to only
9 s when probability matrices were precomputed. CLUSTALW, PROALIGN, PMSTRING, MARNA, and STEMLOC needed
8 s,
15 min,
1 h,
5 h and nearly 2 days, respectively. Note that PMSTRING is only prototyped in Perl; for STEMLOC calculations a (faster) machine with more memory had to be used (see Section 2). MARNA was not able to align those 46 sequence sets containing ambiguity code, whereas STEMLOC failed to align 5 sequence sets for unknown reasons.
Program performance was measured using SPS and SCI and plotted as a function of the sequence similarity (see Fig. 2). The structure and sequence alignment program STEMLOC performed best, but at the cost of long time and high memory usage. With the exception of STEMLOC, STRAL clearly outperformed all other programs: alignments produced by STRAL not only showed higher structural conservation (Fig. 2B) than the structure alignment programs marna and PMSTRING but also achieved more conservation on the sequence level (see Fig. 2A) than the pure sequence alignment programs CLUSTAL and PROALIGN. The performance difference became drastic when sequence similarity dropped below 60%. Here, the performance of STRAL is clearly better as the alignment process is guided by structural features, too, whereas pure sequence alignment programs cannot handle these sets, which are only poorly conserved on the sequence level.
|
A rather ad hoc approach to measuring RNA alignment quality is to compare consensus structures predicted from calculated alignments. Figure 3 shows such consensus structures predicted by RNAALIFOLD (Hofacker et al., 2002) on the basis of different alignments of aphto- and cardiovirus IRES regions (see also Hofacker et al., 2004). CLUSTAL clearly failed to create a correct alignment which comprised enough conserved structure. The consensus structures predicted from alignments computed by STRAL and PMMULTI (slow variant) are almost identical. A more detailed comparison of the predicted structures is given in Supplementary Table S1.
|
To give a visual impression of the differences in alignment quality obtained using CLUSTALW and STRAL, alignments of 14 selenocysteine insertion sequences (SECIS; taken from Kryukov and Gladyshev, 2004) from methanogenic organisms are shown in Figure 4. In the CLUSTAL alignment (Fig. 4C) the thermodynamically predicted stemloop structures are not superimposed (top right triangle) and no statistically significant base pairs are predicted (lower left triangle); the sequence alignment has a length of 44 nt. In contrast, the alignment computed by means of STRAL (Fig. 4B) is close to the correct one (Fig. 4A), which was manually refined by means of CONSTRUCT (Lück et al., 1999): the thermodynamically predicted stemloop structures are perfectly superimposed (except for a single sequence), the highly conserved internal loop nucleotides are aligned and alignment length is only 41 nucleotides.
|
| 5 DISCUSSION |
|---|
|
|
|---|
We have implemented a multiple RNA alignment program named STRAL that combines structural and sequence information in a cheap dynamic programming approach. That is, when pairing vectors are precomputed, STRAL is nearly as fast as CLUSTALW. STRAL requires computational resources similar to other sequence alignment programs with
time and
memory cost, whereas true structure alignment programs such as DYNALIGN (Mathews, 2005), FOLDALIGN v. 2 (Havgaard et al., 2005a,b); PMCOMP (Hofacker et al., 2004) and STEMLOC (Holmes, 2005), have costs of at least
for pairwise alignment. Nevertheless, it has been shown that these structural alignment programs do not necessarily produce high-quality alignments (Gardner et al., 2005).
The parameters used in the algorithm of STRAL have been optimized using the benchmark data set BRAliBase (Gardner et al., 2005). Clearly, the inclusion of sequence and structure into the scoring function improves predictions in comparison with both pure sequence alignment and pure structure alignment (as, for example, in PMSTRING). With respect to all parameters, STRAL's performance is quite robust to modifications by a factor of
2 from the default parameters. It is likely, however, that the performance of other programs will also be improved by such a parameter optimization (e.g. cf. Katoh et al., 2005).
The use of a condensed vector representation of the pairing probabilities instead of the full pairing matrixas done in PMMULTI in pairwise modeallows for a very fast pairwise alignment computation during every alignment step. Yet a future improvement could be the use of such perfect pairwise alignments in combination with STRAL for the multiple alignment step(s). A further improvement of STRAL will be the inclusion of a recursive step in addition to the purely progressive approach (for review, see Gotoh, 1999).
Our approach offers a fast and reliable compromise between the computationally very demanding true structural alignment and pure sequence alignment.
| Acknowledgments |
|---|
Acknowledgement is made to Dr M. Schmitz for critical reading of the manuscript. We thank Drs I. Hofacker and P. Stadler for providing the IRES reference alignment. A.W. was supported by a grant from the German National Academic Foundation.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alex Bateman
Received on February 10, 2006; revised on April 10, 2006; accepted on April 10, 2006
| REFERENCES |
|---|
|
|
|---|
Bonhoeffer, S., et al. (1993) RNA multi-structure landscapes. A study based on temperature dependent partition functions. Eur. Biophys. J, . 22, 1324[Web of Science][Medline].
Bruno, W., et al. (2000) Weighted neighbor joining: A likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol, . 17, 189197
Chiu, D. and Kolodziejczak, T. (1991) Inferring consensus structure from nucleic acid sequences. Comput. Appl. Biosci, . 7, 347352
Eddy, S. (2001) Noncoding RNA genes and the modern RNA world. Nat. Rev. Genet, . 2, 919929[CrossRef][Web of Science][Medline].
Eddy, S. (2005) SQUIDC function library for sequence analysis. http://selab.wustl.edu/cgi-bin/selab.pl?mode=software#squid.
Eddy, S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18[CrossRef][Medline].
Fedor, M. and Williamson, J. (2005) The catalytic diversity of RNAs. Nat. Rev. Mol. Cell. Biol, . 6, 399412[CrossRef][Web of Science][Medline].
Felsenstein, J. (1989) PHYLIPPhylogeny Inference Package (Version 3.2). Cladistics, 5, 164166.
Felsenstein, J. (1997) An alternating least squares approach to inferring phylogenies from pairwise distances. Syst. Biol, . 46, 101111[CrossRef][Web of Science][Medline].
Fuellen, G. (1997) A gentle guide to multiple alignment. Complexity International, 4, http://www.csu.edu.au/ci/vol04/mulali/mulali.html.
Gardner, P.P., et al. (2005) A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res, . 33, 24332439
Gascuel, O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol, . 14, 685695[Abstract].
Gautheret, D., et al. (1995) Identification of base-triples in RNA using comparative sequence analysis. J. Mol. Biol, . 248, 2743[CrossRef][Web of Science][Medline].
Gotoh, O. (1993) Optimal alignment between groups of sequences and its application to multiple sequence alignment. Comput. Appl. Biosci, . 9, 361370
Gotoh, O. (1999) Multiple sequence alignment: algorithms and applications. Adv. Biophys, . 36, 159206[CrossRef][Web of Science][Medline].
Gottesmann, S. (2005) Micros for microbes: non-coding regulatory RNAs in bacteria. Trends Genet, . 21, 399404[CrossRef][Web of Science][Medline].
Gräf, S., et al. (2006) A computational approach to search for non-coding RNAs in large genomic data. In Nellen, W. and Hammann, C. (Eds.). Small RNAs: Analysis and Regulatory Functions, volume 17 of Nucleic Acids and Molecular Biology, Springer Verlag, pp. 5774.
Gribskov, M., et al. (1990) Profile analysis. Methods Enzymol, . 183, 146159[Web of Science][Medline].
Gusfield, D. Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology, (1999) , Cambridge Cambridge University Press.
Havgaard, J.H., et al. (2005a) Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics, 21, 18151824
Havgaard, J.H., et al. (2005b) The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search. Nucleic Acids Res, . 33, W650W653
Hofacker, I., et al. (1994) Fast folding and comparsion of RNA structures. Monatsh. Chem, . 125, 167188[CrossRef].
Hofacker, I., et al. (2002) Secondary structure prediction for aligned RNA sequences. J. Mol. Biol, . 319, 10591066[CrossRef][Web of Science][Medline].
Hofacker, I., et al. (2004) Alignment of RNA base pairing probability matrices. Bioinformatics, 20, 22222227
Hofacker, I.L. (2003) Vienna RNA secondary structure server. Nucleic Acids Res, . 31, 34293431
Holmes, I. (2005) Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics, 6, 73[CrossRef][Medline].
Hudelot, C., et al. (2003) RNA-based phylogenetic methods: application to mammalian mitochondrial RNA sequences. Mol. Phylogenet. Evol, . 28, 241252[CrossRef][Web of Science][Medline].
Katoh, K., et al. (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res, . 33, 511518
Klein, R. and Eddy, S. (2003) RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics, 4, 44[CrossRef][Medline].
Knudsen, B. and Hein, J. (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res, . 31, 34233428
Kryukov, G.V. and Gladyshev, V.N. (2004) The prokaryotic selenoproteome. EMBO Rep, . 5, 538543[CrossRef][Web of Science][Medline].
Lescoute, A., et al. (2005) Recurrent structural RNA motifs, isostericity matrices and sequence alignments. Nucleic Acids Res, . 33, 23952409
Löytynoja, A. and Milinkovitch, M.C. (2003) A hidden Markov model for progressive multiple alignment. Bioinformatics, 19, 15051513
Lück, R., et al. (1999) ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res, . 27, 42084217
Martin, L.C., et al. (2005) Using information theory to search for co-evolving residues in proteins. Bioinformatics, 21, 41164124
Mathews, D. and Turner, D. (2002) Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol, . 317, 191203[CrossRef][Web of Science][Medline].
Mathews, D.H. (2005) Predicting a set of minimal free energy RNA secondary structures common to two sequences. Bioinformatics, 21, 22462253
Notredame, C., et al. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol, . 302, 205217[CrossRef][Web of Science][Medline].
Rivas, E. and Eddy, S. (1999) A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol, . 285, 20532068[CrossRef][Web of Science][Medline].
Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol, . 4, 406425[Abstract].
Sankoff, D. (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math, . 45, 810825[CrossRef].
Siebert, S. and Backofen, R. (2005) MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics, 21, 33523359
Smith, T. and Waterman, M. (1981) Identification of common molecular subsequences. J. Mol. Biol, . 147, 195197[CrossRef][Web of Science][Medline].
Sokal, R. and Michener, C. (1958) A statistical method for evaluating systematic relationships. The University of Kansas Scientific Bulletin, 38, 14091438.
Storz, G. (2002) An expanding universe of noncoding RNAs. Science, 296, 12601263
Thompson, J., et al. (1994a) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, . 22, 46734680
Thompson, J., et al. (1994b) Improved sensitivity of profile searches through the use of sequence weights and gap excision. Comput. Appl. Biosci, . 10, 1929
Thompson, J., et al. (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res, . 27, 26822690
Washietl, S., et al. (2005) Fast and reliable prediction of noncoding RNAs. Proc. Natl Acad. Sci. U.S.A, . 102, 24542459
Winkler, W. and Breaker, R. (2005) Regulation of bacterial gene expression by riboswitches. Ann. Rev. Microbiol, . 59, 487517[CrossRef][Web of Science][Medline].
Yang, Q. and Blanchette, M. (2004) STRUCTMINER: a tool for alignment and detection of conserved secondary structure. Genome Inform Ser Workshop Genome Inform, . 15, 102111[Medline].
Zuker, M. (2000) Calculating nucleic acid secondary structure. Curr. Opin. Struct. Biol, . 10, 303310[CrossRef][Web of Science][Medline].
Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res, . 31, 34063415
This article has been cited by other articles:
![]() |
R. R. Stocsits, H. Letsch, J. Hertel, B. Misof, and P. F. Stadler Accurate and efficient reconstruction of deep phylogenies from structured RNAs Nucleic Acids Res., October 1, 2009; 37(18): 6184 - 6193. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Tabei and K. Asai A local multiple alignment method for detection of non-coding RNA sequences Bioinformatics, June 15, 2009; 25(12): 1498 - 1505. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Morita, Y. Saito, K. Sato, K. Oka, K. Hotta, and Y. Sakakibara Genome-wide searching with base-pairing kernel functions for noncoding RNAs: computational and expression analysis of snoRNA families in Caenorhabditis elegans Nucleic Acids Res., February 1, 2009; 37(3): 999 - 1009. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Katoh and H. Toh Recent developments in the MAFFT multiple sequence alignment program Brief Bioinform, July 1, 2008; 9(4): 286 - 298. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. B. Do, C.-S. Foo, and S. Batzoglou A max-margin model for efficient simultaneous alignment and folding of RNA sequences Bioinformatics, July 1, 2008; 24(13): i68 - i76. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Moretti, A. Wilm, D. G. Higgins, I. Xenarios, and C. Notredame R-Coffee: a web server for accurately aligning noncoding RNA sequences Nucleic Acids Res., July 1, 2008; 36(suppl_2): W10 - W13. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Wilm, D. G. Higgins, and C. Notredame R-Coffee: a method for multiple alignment of non-coding RNA Nucleic Acids Res., May 1, 2008; 36(9): e52 - e52. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Ferre, Y. Ponty, W. A. Lorenz, and P. Clote DIAL: a web server for the pairwise alignment of two RNA three-dimensional structures using nucleotide, dihedral angle and base-pairing similarities Nucleic Acids Res., July 13, 2007; 35(suppl_2): W659 - W668. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Torarinsson, J. H. Havgaard, and J. Gorodkin Multiple structural alignment and clustering of RNA sequences Bioinformatics, April 15, 2007; 23(8): 926 - 932. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||














