Bioinformatics Advance Access originally published online on May 11, 2006
Bioinformatics 2006 22(14):1723-1729; doi:10.1093/bioinformatics/btl177
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments
1 Department of Computational Biology, Graduate School of Frontier Science, University of Tokyo CB04 Kiban-tou 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan
2 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST) 2-42 Aomi, Koto-ku, Tokyo, 135-0064, Japan
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: The functions of non-coding RNAs are strongly related to their secondary structures, but it is known that a secondary structure prediction of a single sequence is not reliable. Therefore, we have to collect similar RNA sequences with a common secondary structure for the analyses of a new non-coding RNA without knowing the exact secondary structure itself. Therefore, the sequence comparison in searching similar RNAs should consider not only their sequence similarities but also their potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search similar RNAs in the whole genome sequences, much faster algorithms are required.
Results: We propose a new method of comparing RNA sequences based on the structural alignments of the fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to the long sequences in the large-scale analyses. The accuracy of the alignments is better or comparable with the much slower existing algorithms.
Availability: The web server of SCARNA with graphical structural alignment viewer is available at http://www.scarna.org/
Contact: scarna{at}m.aist.go.jp
Supplementary information: The data and the supplementary information are available at http://www.ncrna.org/papers/SCARNA/.
| 1. INTRODUCTION |
|---|
|
|
|---|
One of the important foundation of biological sequence analyses is comparing the sequences by the alignment with similarity scores. For the analyses of non-coding RNAs, we want to compare the nucleotide sequences for many purposes, such as finding the relatives of a new non-coding RNA in the genome, classification of the cDNA sequences and so on. The standard sequence comparison methods are not accurate enough for RNA sequences, however, because the secondary structures play important role in the functions and the evolutions of non-coding RNAs (Eddy, 2001). The alignments and the similarity scores of RNA sequences should consider both the primary sequences and the secondary structures.
Therefore, it is natural to try to find the common secondary structures in order to align two RNA sequences. Secondary structure prediction for a single sequence of length n without considering pseudoknots requires O(n2) in memory and O(n3) in time for computation (Nussinov et al., 1978; Zucker and Steigler, 1981). The structural RNA alignment is computationally so expensive even if the pseudoknots are ignored. The Sankoff's algorithm (Sankoff, 1985), which simultaneously allows the solution of the structure prediction and alignment problem, requires O(n4) in memory and O(n6) in time for a pair of sequences of length n. Such an algorithm is applicable only for short RNAs and not for all of the functional RNA sequences. By restricting the distances of the base pairs in the primary sequences it can be reduced to O(n4) in time (Havgaard et al., 2005; Hofacker et al., 1996) but it is still impractical for long sequences. In order to compare the RNA sequences without aligning them, a kernel method on Stochastic Context Free Grammar (SCFG) is proposed (Kin et al., 2002).
Another way of finding the common secondary structures is to use stem-based representations, where the structure of an RNA sequence is represented by a number of sets of continuous base pairs. For constructing a stem-based representation, one approach is to use the predicted secondary structures (Karklin et al., 2005). The predictions are not always accurate, however, that approach has a high risk of using totally wrong secondary structures. A more robust approach is to use a number of stem candidates derived by the simple WatsonCrick base pair rule or by scanning the base pair probability matrix (McCaskill, 1990). Those representations may contain many false stems, which have to be excluded in the end. Selecting the correspondence of the potential stems in two RNA sequences is also a combinatorial problem. Dynamic programming (DP) can solve the problem, but it requires time complexity of O(m6) for the number of potential stems m even if pseudoknots are not considered.
Perriquet et al. (2003) proposed a fast heuristic stem-based algorithm implemented in CARNAC. It first determines anchor regions that are highly conserved in given RNA sequences and then seeks a set of the other stems that have minimum folding energy. Bafna, et al. (2005) also proposed a stem-based method, which is similar to Sankoff's algorithm, implemented in RNAscf. It differs also in the way of finding structurally conserved anchors, which is based on secondary structure similarities. Ji et al. (2004) proposed a graph theoretic approach which finds conserved stems of multiple RNA sequences, implemented in comRNA.
In this article, we propose an efficient pairwise alignment method based on fixed-length stem fragments. The fragments are made by dividing the stem candidates in a number of overlapping fixed-length windows (Fig. 1). In order to align the stem fragments strictly, we need to adopt a computationally expensive algorithm. Instead, we decouple 5' and 3' parts of the stem fragments and use a pairwise alignment algorithm. The 5' parts (left components) and the 3' parts (right components) of the stem fragments are fixed-length subsequences of the stems that are complementary to each other. Because such a simple pairwise alignment does not guarantee the consistency of matches in both side (left and right), we get a certain number of mismatched components. In our approach, the common secondary structure is made only from the matches that include both left and right components. After the matched stem fragments are fixed, a pairwise alignment algorithm with affine gaps is used to make complete nucleotide alignment of two sequences.
|
To make our algorithm work on practical data, it is important to ensure the discovery of true stem fragments that belong to the common secondary structure. For high specificity in component matching, we designed the matching score of components based on various properties, e.g. sequence similarity, stacking energy and the distance to the complementary component. Unlike the other stem-based representations, the fixed-length fragments enable efficient computations of the matching score. The computation of matching scores of two variable-length sequences requires another alignment including gaps, which leads to an inefficient algorithm. One may think that it is difficult to take the stacking energy into account by fixed-length representations of the stems. We devised an engineered DP algorithm that includes the stacking energy in the score function.
In benchmarking experiments, we will show that our alignment accuracy is comparable with state-of-the-art methods, and the computational time is shorter by orders of magnitude.
| 2 METHOD |
|---|
|
|
|---|
SCARNA takes two unaligned nucleotide sequences as the inputs and produces the alignment of the sequences based on the predicted common secondary structure. For efficiency, the stem fragments are first aligned, and the nucleotide-level alignment is made by a post-processing. In the following sections, our algorithm is explained step-by-step.
2.1 Extracting stem candidates
We start by representing the potential secondary structure of each RNA sequence by a set of overlapping stem fragments. To this aim, the base pair probability matrix, is computed by means of McCaskill's (1990) algorithm. When the sequence has n bases, that matrix has n x n values, each of which represents the probability that the two bases form a base pair as a part of the whole secondary structure. When k is the fixed length of stem fragments (typically 25), the matrix is scanned by a counter-diagonal window of length k. If all the values in the window is larger than the threshold
, that window is chosen as a part of a stem candidate, which is a stem fragment.
2.2 Properties of stem components
Each fixed-length stem fragment is decomposed into two stem components, 5' (left) component and 3' (right) component, both of which have the same fixed length (Fig. 2). A stem component Xa has the following properties.
- Position p(Xa): the position of the leftmost base of the stem component Xa.
- Sequence s(Xa): the sequence of the stem component Xa.
- Loop distance d(Xa): the distance to the complementary stem component of the same stem fragment along the nucleotide sequence. 5' (left) stem components have positive distances and 3' (right) stem components have negative distances.
- Partner sequence c(Xa): the sequence of the complementary stem component.
- Confidence score f(Xa): the sum of the base-pair probabilities in the fragment.
- Stacking energy e(Xa): the sum of stacking energy in the fragment.
|
In order to perform pairwise alignment, the stem components have to be ordered as a sequence. A stem component sequence (SCS) is a sequence of all stem components sorted by their positions in the nucleotide sequence. Multiple stem components can take exactly the same position and their complementary components have different positions. Such components are sorted according to the distance to the complementary component (loop distance).
2.3 Matching score of stem components
The matching score is used as the similarity measure of the stem components in the alignment. Because our goal is to capture both the structure similarities and the sequence similarities, the similarities of the corresponding base pairs and the differences of the two loop distances are combined. Because we want to align the stem candidates that have higher scores by means of free energy, the confidence scores and the stacking energy are also considered.
Let us describe the two SCSs to be aligned as
and
, where Xi and Yj describe i-th and j-th stem components in the two nucleotide sequences, respectively. We denote by
that a left component Xa and a right component
form a stem fragment. Also, (Xi, Yj) denotes a matched pair across the sequences in the alignment of SCSs.
Because the stem components has a same fixed length, the sequence similarity is calculated in linear time by substitution probabilities using RIBOSUM (Klein and Eddy, 2003). Denote by R(Xi, Yi) the sum of RIBOSUM scores.
If we take all of those scores into account, the matching score s(i, j) of two corresponding stem components (Xi, Yj) can be written as
![]() | (1) |
Because the confidence scores and the stacking energy are mutually correlated, and because we have to control the importance of the terms, the parameters
1,
2 and
3 are used. The term of
3 encourages the stems with similar loop distances to match.
2.4 Consistency in alignment of stem components
Before explaining the DP algorithm for the alignment of stem components, we discuss on the consistency conditions for the stem components in the alignments. We discuss first on the stem components of a single nucleotide sequence, and then on each match of the stem components of two nucleotide sequences.
2.4.1 Consistency in a single SCS
The major difference of the alignment of stem components from a pairwise sequence alignment is that only small number of the stem components are selected to be included in the alignment. For each nucleotide sequence, a large number of combinations of stem components mutually contradict and should not be included in the same alignment.
If two stem fragments do not overlap in the nucleotide sequence, there are three types of positions. If the two stem fragments
do not overlap each other, they are
- parallel if p(Xa) < p
< p(Xb) < p
or p(Xb) < p
< p(Xa) < p
.
- nested if p(Xa) < p(Xb) < p
< p
or p(Xb) < p(Xa) < p
< p
.
- pseudoknotted if p(Xa) < p(Xb) < p
< p
or p(Xb) < p(Xa) < p
< p
.
If two stem fragments overlap in the nucleotide sequence, we can also find the three cases. Two stem fragments
and
are
- r-continuous if Xa overlaps Xb,
overlaps
and satisfy
If two overlapping stem fragments appear in a same alignment, they should be a part of a longer stem and r-continuous (Fig. 3). A pair of stacked fragments are 1-continuous fragments.
(2)
- ill-continuous if Xa overlaps Xb and
overlaps
, but (2) does not hold.
- contradictory if only one side (left or right) of the components overlap each other but the other side does not overlap (Fig. 4).
|
|
If two of the stem fragments of a nucleotide sequence appear in the alignment, they should not overlap at all or be r-continuous. The overlapping stem components are carefully treated in the SCS alignment that the left and right components of stem fragments satisfy r-continuous condition as explained later. Therefore, ill-continuous stem fragments never appear in our alignments. The non-overlapping stem components, however, may have overlapping complementary stem components because the left and right components of the stem fragments are separately aligned. Therefore, contradictory overlaps do happen in our alignments.
2.4.2 Consistency of matches of stem components
Next we discuss on the matches of the stem components from two nucleotide sequences. For stem fragments
and
from two SCSs
and
, if Xa matches Yb in the alignment,
should also match
. Let us define such a match as a leftright consistent match. Because the left component and the right component of each stem fragment are aligned separately, leftright consistency is not guaranteed in general. However, leftright consistent matches occur frequently because the matching scores encourage the matches of stem components that have similar loop distances.
2.4.3 Removing inconsistent matches
The leftright inconsistent matches are removed after the SCS alignment (Fig. 5). If any two of the stem components of a same SCS appear in the SCS alignment and their complementary components overlap (i.e. contradictory overlap), those complementary components do not appear together in the alignment because the alignment of complementary components are controlled to be either non-overlapping or r-continuous. Therefore, the contradictory overlaps of the stem fragments are removed just by removing the leftright inconsistent matches of the components (Fig. 6).
|
|
2.5 Alignment algorithm for stem components
The alignment of SCSs is computed by two DP matrix, M(i, j) and G(i, j). M(i, j) is the best score up to a pair of Xi and Yj given that Xi matches Yj and G(i, j) is the best score given that Xi mismatches Yj.
The updates to derive M(i, j) and G(i, j) are described as
![]() | (3) |
![]() | (4) |
.
i is the index (smaller than i) of the component which is 1-continuous to Xi, ßj is that of Yj. pi is the index (smaller than i) of the nearest component which does not overlap with Xi, qj is that of Yj.
4 and
5 are control parameters.
In a simple DP for pairwise alignment, DP matrices depends on the adjacent elements. In our algorithm, however, M(i, j) is derived from the remote elements denoted as M(
i, ßj), M(pi, qj) and G(pi, qj). That ensures the adjacent matches of stem components in DP being either 1-continuous or non-overlapping (Fig. 7).
|
The updates (3) take the maximum of three arguments. The first argument treats the case for continuous stems longer than the fixed length of stem fragments and ensures that the adjacent matches are 1-continuous. The indices
i and ßj are determined such that
and
are 1-continuous fragments of
and
, respectively. Since
i and ßj do not always exist, the first argument in (3) is taken into account only if both
i and ßj are available. Only the incremental parts of sequence similarities, the confidence scores and the stacking energy should be included in those continuous matches because 1-continuous matches share base pairs except one. The symbols,
R(Xi),
f(Xi) and
e(Xi), correspond to the incremental differences for Xi-X
i of RIBOSUM scores, confidence scores and stacking energy respectively (See Fig. 1 in Supplement Material for details). The second and third arguments in (3) also take larger steps than a simple DP and ensure that the adjacent stem components have no overlap. Xpi and Yqj are the closest components in SCSs that do not overlap with Xi and Yj, respectively.
Because
i and pi for each i, ßj and qj for each j, and the corresponding
R,
f and
e are calculated before the DP process in linear time, those calculations do not give any damage on time complexity of the algorithm.
2.6 Post-processing and nucleotide alignment
The inconsistency in the matches of stem components are removed as the post-process. In order to guarantee the consistency, it is sufficient to remove the leftright inconsistent matches of stem components as previously explained in Section 2.4.3. In other words, the matches of stem components whose complementary components do not match in the SCS alignment are removed as the post-process.
The nucleotide sequence alignment is intended to align remaining loop regions except the selected common stems represented by the consistent matches of stem components. It is simply implemented as a pairwise alignment with affine gaps of whole nucleotide sequences by adding a large value in DP matrix to the positions for the base pairs indicated by the SCS alignment. Those values are so large that the nucleotide alignment is forced to go through the positions and subtracted afterwards to get the score of the alignment.
| 3 RESULTS |
|---|
|
|
|---|
In this section we show the performance of SCARNA for the alignment of RNA sequences by computational experiments on the benchmark dataset of tRNAs used by Gardner et al. (2005) and on the dataset from various families of non-coding RNA sequences in Rfam database (Griffiths-Jones, 2003). It has been observed that SCARNA has a competitive performance to the other RNA structural alignment approaches using Sankoff's Algorithm (Sankoff, 1985).
We have evaluated the quality of the alignments by the sum-of-pairs score (SPS) and the structure conservation index (SCI) (Gardner et al., 2005). The SPS is defined as the fraction out of all possible nucleotide pairs that are aligned both in the predicted alignment and in the alignment of the reference. The SPS provides a measure of the sensitivity of the prediction.
The SCI provides a measure of the conserved secondary structure information contained within the alignment (Washietl et al., 2004). It is a derivative of the score calculated by the RNAalifold consensus folding algorithm (Hofacker et al., 2002; Washietl and Hofacker, 2004) which is based upon the sum of the thermodynamic term and the covariance term. In contrast to the SPS, SCI is independent from a reference alignment. The SCI is close to zero if RNAalifold identifies no common RNA structure in the alignment, whereas a set of perfectly conserved structures has an SCI
1. The SCI points out the structural aspect of alignment accuracy and, therefore, a useful measure in addition to the SPS.
All the following tests were performed on a Linux machine with a AMD OpteronTM Processor 850 2.4 GHz x 4 and 20 GB RAM. The length of stem fragments k was set to 2. The threshold of base pair probability
was set to 0.0001. The control parameters,
1,
2,
3,
4 and
5 were set to 3.7, 0.1, 3.1, 9.4 and 8.6, respectively. These parameters were used to all RNA familywise dataset. The command line options of other tools is listed on table 1 in the Supplement Matrial.
3.1 Benchmark dataset of tRNAs
Gardner's benchmark datasets (Gardner et al., 2005) are composed of pairs of tRNA sequences that are classified by sequence identities. Though all the structural alignment programs are not able to align RNA sequences of more than 150 bases without any device, they can align those short tRNA sequences of 71.8 nt in average. The sequences and the reference alignments for calculating the SPS were obtained from the Rfam database.
The experimental results are shown in Fig. 8. The SPS and the SCI of SCARNA exceed those of sequence-based methods [e.g. ClustalW (Chenna et al., 2003), MUSCLE (Edgar, 2004), PCMA (Pei et al., 2003), POA (gp) (Lee et al. 2002), ProAlign (Loytynoja and Sharlow, 2003) and Prrn (Gotoh, 1996) and are comparable with those of structure-based methods [e.g. Dynalign (Mathews and Turner, 2002; Mathews, 2005), Foldalign2.0 (Hargaard et al., 2005), PMcomp (Hofacker et al., 2004) and Stemloc (Holmes and Rubin, 2002; Holmes, 2004, 2005). While the sequence-based methods and structure-based ones have a dramatic divergence in relative performances below about
60% sequence identity, the SPS and the SCI of SCARNA do not come down. In particular, the SPSs of SCARNA outperform most of the structure-based methods in <50% sequence identity.
|
3.2 Benchmark dataset of other non-coding RNAs
In order to evaluate our algorithm by longer non-coding RNAs, we made a benchmark dataset from 5S ribosomal RNA, 5.8S ribosomal RNA and Hammerhead ribozyme in the Rfam (Griffiths-Jones et al., 2003). The collection of reliable sequences is a difficult task. We only used the seed alignments of Rfam because the full alignment includes computationally collected sequences. It is observed that even the seed alignments includes questionable ones. We tried to filter out unreliable sequences whose alignments include inconsistend asignment of base pairs on gaps or very long gaps. We compared SCARNA with ClustalW and Foldalign2.0. The result is that the SPS and SCI of SCARNA can be compared with those of Foldalign2.0, favorably. (See Supplement Material, Figs 24 for details.)
| 4 DISCUSSION |
|---|
|
|
|---|
4.1 Computational Complexity
While the complexities of O(n3) in time and O(n2) in memory for secondary structure prediction of RNA sequences of length n are affordable in most cases, O(n6) and O(n4) in Sankoff's algorithm for structural alignment can hardly be accepted. The comparison of execution time of SCARNA with other methods (Fig. 9) shows its applicability to real sequences. SCARNA requires O(m2) in time and O(m2) in memory for the alignment of SCSs of length m. The length of the SCS for an RNA sequence depends on the length of the RNA sequence, the threshold
for base pair probabilities, and the fixed length k of stem fragments. The lengths of the SCSs are linear to the lengths of the RNA sequences for the reasonable k-s (See Supplement Material for detail, Fig. 5). Therefore, the computational complexities of SCS alignment are evaluated as O(n2) in time and O(n2) in memory. The computation of base pair probabilities for SCS building requires O(n3) in time and O(n2) in memory (See Supplement Material, Fig. 6). For very long nucleotide sequences, however, it can be reduced to O(n2) and O(n2) by restricting the distance of base pairs to a fixed length. Therefore, SCARNA can be used for long sequences in large-scale analyses enjoying O(n2) of computational time.
|
4.2 Secondary structure prediction
The accuracy of the nucleotide alignments by SCARNA depends on the predicted common secondary structures. The performance of the alignment by SCARNA suggests high accuracy of the predicted secondary structures. In order to evaluate the accuracy of the secondary structure predictions for the individual sequences directly, a post-processing of recovering the high-scoring base pairs that are consistent with the predicted common secondary structures has been tested.
We made datasets from RNA sequences in Rfam database by the fraction of base pairs. The datasets are filled with following conditions. (1) The percentage of base pairs are 2550%. (2)The sequences are between 80 and 150 nt in length. (3)Their reference alignments have >5% and <50% gaps. Each dataset is classified into several subgroups of RNA sequences by their sequence identities.
After the alignment of SCSs and removing the inconsistent stem components, the base pairs with base pair probabilities >0.95 have been recovered for the prediction of individual RNA sequences. Sn (Sensitivity), Sp (Precision) and MCC (Matthews Correlation Coefficient) (Mathews, 1975) have been used for the performance measures of secondary structure prediction. Sn is sometimes better recognized as recall. Sp is also known as positive predictive value (PPV). They are defined as
![]() | (4) |
![]() | (4) |
|
4.3 Ability to capture pseudoknots
The major drawback of the DP algorithm for SCS alignment in SCARNA is that the left-right consistency is not guaranteed. The lack of the consistency, however, becomes a merit for capturing pseudoknotted structures. SCARNA often finds pseudoknotted structures without paying any additional computational costs because the algorithm does not forbid two stem fragments having pseudoknotted positions (see Supplement Material, Figure 8 and 9 for example of pseudoknot prediction), althogh McCaskill's algorithm does not consider the pseudoknots in the calculation of the base-pair probabilities and the probabilities for pseudoknotted base pairs may be underestimated. The DP algorithm for SCS alignment can be improved to be left-right consistent by using pair stochastic context free grammars (PSCFGs) and by paying expensive computational costs, but only for pseudoknot-free structures.
4.4 Local alignment
Since our global alignment algorithm (see Section 2.5) is extendable to local alienments, we are working on adding local alignment capability to SCARNA, which allows to search non-coding RNAs from genomic sequences based on the secondary structures as well as the sequence similarities.
| 5 CONCLUSION |
|---|
|
|
|---|
We have proposed a new method for fast and accurate alignments of RNA sequences based on the potential common secondary structures. The method uses the fixed-length stem fragments as the representation of the secondary structures. The 3' components and the 5' components of the stem fragments are separately aligned by an engineered DP and the inconsistent matches are removed as the post-process. The base pair probabilities, substitution probabilities as the base pairs, stacking energy are considered in the alignments. The method has been implemented as SCARNA, whose accuracies of the alignments have been shown to be much better than sequence-based methods and compatible to the computationally expensive structure-based methods. The high accuracies of SCARNA in the detections of the individual secondary structures also supports the performance. SCARNA is fast enough to align the sequences with more than 1000 nt in length, which most of the structure-based methods are unable to handle. The computational complexity of the algorithm is O(n3) in time and O(n2) in memory for the length of sequence n. The time complexity can be reduced to O(n2) for long sequences by restricting the distance of the bases in the base pairs. Pseudoknotted structures are also found without paying extra computational costs.
| Acknowledgments |
|---|
The authors thank Hisanori Kiryu, Michiaki Hamada, Tsuyoshi Kato and the other members of bioinformatics group for RNAs at National Institute of Advanced Industrial Science and Technology (AIST) and at Japan Biological Information Consortium (JBIC) for useful discussions. This work is partially supported by Grant-in-Aid for Scientific Research on Priority Area Comparative Genomics of Ministry of Education, Culture, Sports, Science and Technology, by Functional RNA Project of Ministry of Economy, Trade and Industry, and by the grant from Institute for Bioinformatics Research Development.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Dmitrij Frishman
Received on August 26, 2005; revised on April 3, 2006; accepted on May 2, 2006
| REFERENCES |
|---|
|
|
|---|
Bafna, V., et al. (2005) Consensus folding of unaligned RNA sequences revisited. RECOMB, 172187.
Chenna, R., et al. (2003) Multiple sequence alignment with the clustal series of programs. Nucleic Acids Res, . 31, 34973500
Eddy, S.R. (2001) Non-coding RNA genes and the modern RNA world. Nat. Genet, . 2, 919929[CrossRef][ISI][Medline].
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res, . 32, 17921797
Gardner, P.P. and Giegerich, R. (2004) A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics, 5, 140[CrossRef][Medline].
Gardner, P., et al. (2005) A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res, . 33, 24332439
Gotoh, O. (1996) Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J. Mol. Biol, . 264, 823838[CrossRef][ISI][Medline].
Griffiths-Jones, S., et al. (2003) Rfam: an RNA family database. Nucleic Acids Res, . 31, 439441
Havgaard, J.H., et al. (2005) Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics, 21, 18151824
Hofacker, I.L., et al. (1994) Fast folding and comparison of RNA secondary structures. Monatsh. Chemie, 125, 167188[CrossRef].
Hofacker, I.L., et al. (2004) Alignment of RNA base pairing probability matrices. Bioinformatics, 20, 22222227
Hofacker, I., et al. (2004) Secondary structure prediction for aligned RNA sequences. J. Mol. Biol, . 319, 10591066.
Holmes, I. and Rubin, G.M. (2002) Pairwise RNA structure comparison with stochastic context-free grammars. Pac. Symp. Biocomput, . 163174.
Holmes, I. (2004) A probabilistic model for the evolution of RNA structure. BMC Bioinformatics, 5, 166[CrossRef][Medline].
Holmes, I. (2005) Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics, 6, .
Ji, Y., et al. (2004) A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences. Bioinformatics, 20, 15911602
Karklin, Y., et al. (2005) Classification of non-coding RNA using graph representations of secondary structure. Pac. Symp. Biocomput, . 415.
Kin, T., et al. (2002) Marginalized kernels for rna sequence data analysis. Genome Informatics, . 13, 112122.
Klein, R.J. and Eddy, S.R. (2003) RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics, 4, 44[CrossRef][Medline].
Lee, C., et al. (2002) Multiple sequence alignment using partial order graphs. Bioinformatics, 18, 452464
Loytynoja, A. and Sharlow, M.C. (2003) A hidden Markov model for progressive multiple alignment. Bioinformatics, 19, 15051513
Mathews, D.H. and Turner, D.H. (2002) Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol, . 317, 191203[CrossRef][ISI][Medline].
Mathews, D. (2005) Predicting a set of minimal free energy RNA secondary structures common to two sequences. Bioinformatics, 21, 22462253
Matthews, B.W. (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochem. Biophys. Acta, 405, 442451[Medline].
McCaskill, J.S. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 11051119[CrossRef][ISI][Medline].
Nussinov, R., et al. (1978) Algorithms for loop matchings. SIAM J. App. Math, . 35, 6882[CrossRef].
Pei, J., et al. (2003) PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics, 19, 427428
Perriquet, O., et al. (2003) Finding the common structures shared by two homologous RNAs. Bioinformatics, 19, 108116
Sankoff, D. (1985) Simultaneous solution of the RNA folding, alignment, and proto-sequence problems. SIAM J. App. Math, . 45, 810825[CrossRef].
Washietl, S., et al. (2005) Fast and reliable prediction of noncoding RNAs. Proc. Natl Acad. Sci. USA, 102, 24542459
Washietl, S. and Hofacker, I. (2004) Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. J. Mol. Biol, . 342, 1930[CrossRef][ISI][Medline].
Zuker, M. and Stiegler, P. (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res, . 9, 133148
This article has been cited by other articles:
![]() |
K. Katoh and H. Toh Recent developments in the MAFFT multiple sequence alignment program Brief Bioinform, July 1, 2008; 9(4): 286 - 298. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. B. Do, C.-S. Foo, and S. Batzoglou A max-margin model for efficient simultaneous alignment and folding of RNA sequences Bioinformatics, July 1, 2008; 24(13): i68 - i76. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Asai, H. Kiryu, M. Hamada, Y. Tabei, K. Sato, H. Matsui, Y. Sakakibara, G. Terai, and T. Mituyama Software.ncrna.org: web servers for analyses of RNA sequences Nucleic Acids Res., July 1, 2008; 36(suppl_2): W75 - W78. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Kiryu, T. Kin, and K. Asai Rfold: an exact algorithm for computing local base pairing probabilities Bioinformatics, February 1, 2008; 24(3): 367 - 373. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Kiryu, Y. Tabei, T. Kin, and K. Asai Murlet: a practical multiple alignment tool for structural RNA sequences Bioinformatics, July 1, 2007; 23(13): 1588 - 1598. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Hamada, K. Tsuda, T. Kudo, T. Kin, and K. Asai Mining frequent stem patterns from unaligned RNA sequences Bioinformatics, October 15, 2006; 22(20): 2480 - 2487. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||














