Bioinformatics Advance Access originally published online on December 20, 2006
Bioinformatics 2007 23(4):434-441; doi:10.1093/bioinformatics/btl636
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Robust prediction of consensus secondary structures using averaged base pairing probability matrices
1 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST) 2-42 Aomi, Koto-ku, Tokyo, 135-0064, Japan
2 Graduate School of Information Sciences, Nara Institute of Science and Technology 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
3 Department of Computational Biology, Faculty of Frontier Science The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures.
Results: We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs.
Availability: The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/
Contact: kiryu-h{at}aist.go.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
Recently, a number of studies have shown that there is a substantial number of RNA transcripts that do not code protein sequences in higher eukaryotic cells (Okazaki et al., 2002; Dunham et al., 2004; Carninci et al., 2005), and the question whether such transcripts have any functional roles in cellular processes has attracted much interest. Since the existence of conserved secondary structures among phylogenetic relatives indicates the functional importance of such transcripts, several research groups have sought for conserved secondary structures on a genomic scale (Hofacker et al., 2004b; Washietl et al., 2005a,b; Pedersen et al., 2006).
In their studies, a large number of multiple alignments were created using computer programs, and consensus secondary structures were then predicted from these alignments. They used alignment programs that neglected the special conservation patterns of secondary structures, such as the base covariations in the stem regions since the alignment algorithms that took into account the base covariations required huge computational resources (Sankoff, 1985; Hofacker et al., 2004a; Mathews and Turner, 2002). Therefore, there were potential risks of overlooking conserved secondary structures due to misalignments. Such loss of sensitivity is particularly problematic in the early stage of large-scale screening that precedes the time-consuming but accurate computational and experimental validation stages.
In this paper, we investigate the dependence of the accuracy of secondary structure prediction on the quality of alignments and propose a method to predict conserved secondary structures from multiple alignments, which is robust against alignment failures. Our algorithm first computes the base pairing probability matrix for each sequence in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix using a Nussinov-style dynamic programming algorithm (Nussnov et al., 1978).
The use of the average pair probability matrix for obtaining the consensus structures is not a new idea and there have been several studies (Hofacker and Stadler, 1999; Hofacker et al., 1998; Luck et al., 1996, 1999; Knight et al., 2004) that have used the base pairing probability matrices of single sequences to predict the consensus secondary structures. In particular, the ConStruct program (Luck et al., 1996, 1999) predicts the same consensus structures as those predicted by our algorithm with a specific parameter value. However, we present a new interpretation and justification of this method in terms of the maximal expected accuracy (MEA) principle (Miyazawa, 1995), which has been successfully applied recently to the sequence alignment and the structure prediction to single sequences. This new interpretation makes it obvious that the method has an advantage in predicting the structures from seriously misaligned sequences. We show that our algorithm outperforms the leading programs for the consensus structure prediction (Hofacker et al., 2002; Knudsen and Hein, 2003) in such a situation.
| 2 SYSTEMS AND METHODS |
|---|
|
|
|---|
2.1 Definitions
We first present a few formal definitions of conserved secondary structure prediction that are useful for subsequent discussions. For a given alignment of length L, which is composed of N sequences X, let
|
|
|
| (1) |
Moreover, since we do not consider pseudo knot structures in this paper, we assume that y follows the nested structure constraint.
|
| (2) |
X in the alignment, we similarly consider the secondary structure |
|
2.2 Maximal expected accuracy algorithm
Recent studies have shown that the secondary structure predictions based on the principle of the MEA (Miyazawa, 1995) perform better than the predictions made by the conventional maximal likelihood algorithm (Pedersen et al., 2006; Do et al., 2006; Knudsen and Hein, 2003). This algorithm first computes the base pairing probability pij for each pair of alignment columns (i, j) and then considers the expected accuracy
for each secondary structure candidate
. The predicted secondary structure
is obtained by maximizing the expected accuracy
with respect to
. We first consider the secondary structure prediction for single sequences. For a given conditional probability distribution P(y|x), the base pairing probability pij of columns (i, j) can be defined as follows.
|
|
(z, z') is the Kronecker delta function and is defined by the following equation. |
|
Further, E[A] is the expected value of A with respect to P(y|x). Let the loop probability qi be the probability that the alignment column i does not form any pair with other columns.
|
| (3) |
0, the expected accuracy EA
(|
|
= 1,
The secondary structure that maximizes the expected accuracy can be computed by the traceback procedure of a Nussinov-like dynamic programming algorithm (Nussnov et al., 1978).
|
| (4) |
is given by the following equation. |
|
controls the sensitivity and specificity of the structure prediction (Do et al., 2006). A small
value encourages the base pair formation, which results in higher sensitivity, and a large
value encourages the increase of single-stranded regions and results in higher specificity.
P(y|x) can be computed for various models, such as models based on the loop decomposition of secondary structure energy and models based on stochastic context-free grammars (SCFGs). In energy based models, P(y|x) is given by the Boltzmann distribution of secondary structure configurations,
|
| (5) |
The maximal likelihood prediction of the energy-based models is given by the secondary structure y that maximizes P(y|x).
|
|
In the SCFG models (Dowell and Eddy, 2004), P(y|x) is given by the sum of the conditional probabilities P(
|x) over the set of parses
sharing the same secondary structure y.
|
|
, x) is the joint probability of generating the parse
and is given by the product of the transition and emission probabilities of the SCFG model. The sum of the numerator is over all the parse trees that share the same secondary structure y, and the sum of the denominator is over all the possible parse trees. The corresponding base pairing probability matrix is computed by the inside and outside algorithm (Durbin et al., 1998; Do et al., 2006). The reason why the MEA algorithms show better prediction accuracy than their maximal likelihood counterparts can be explained as follows. In general, any computational model of secondary structure prediction based only on the sequence data has limitations in accuracy because an RNA molecule in reality forms a 3D structure in the cell, and interacts with itself among all the neighboring bases in the 3D structure. Moreover, it interacts with bounded proteins and other cellular environments that affect the formation of its secondary structure. Hence, the absolute optimality with respect to the scoring system of the model is of limited importance. When the model is not very accurate but reasonably good, taking the majority of near-optimal structures may be a more feasible way to predict the secondary structures. The MEA algorithm can be considered as one of such algorithms. For example, if an optimal structure does not form a pair at a column pair (i, j) but many suboptimal structures form a pair at (i, j), then the MEA solution tends to predict the pair (i, j) since the base pair probability pij assumes a large value at that position. Therefore, in contrast to the maximal likelihood method that considers only one optimal structure, the MEA algorithm takes into account various near-optimal structures and predicts the consensus structure supported by them. It presumably acts to reduce model-specific artifacts in the predictions.
A more pragmatic reason for the efficiency of the MEA algorithms is that the objective function of MEA is closer to the accuracy measures for the structure prediction. The accuracy of the structure prediction is usually evaluated by counting the (in-) correctly predicted base pairs, and not by mesuring the correctness of structural components, such as the loop lengths and the stacking energies. Such accuracy measures are advantageous to the MEA algorithm since the MEA solution tends to predict all the likely pairs throwing away the constraints of the original model that is used to compute the pair probability matrices.
2.3 Algorithms for consensus structure prediction
RNAAlipfold (Hofacker et al., 2002) is a multi-sequence extension of the McCaskill algorithm. For each consensus secondary structure candidate, it assigns a Boltzmann factor,
|
|
Pfold (Knudsen and Hein, 2003) is a multi-sequence extension of the SCFG model for a single sequence structure prediction. The differences from the single sequence case are that it simultaneously emits bases in each column and each pair of columns, and that the emission scores assume the likelihood values computed by the Markov model of sequence evolution. The Pfold algorithm is an MEA algorithm that maximizes the expected accuracy with
= 1.
Both RNAAlipfold and Pfold assume the correctness of alignments, and the covariation scores contained in both the models rely on it. Their covariation scores are most efficient when they are applied to high-quality multiple alignments. However, in low-quality alignment data, there are many fake inconsistent mutations caused by alignment failures, which may cause the incorrect estimations of the covariation scores and result in the loss of sensitivity to conserved structures.
Here, we propose an alternative MEA algorithm that is not strongly dependent on the correctness of alignments. First, we define the conditional probability distribution function P(Y|X) over all the secondary structures of all the sequences in the alignment as follows:
|
|
X} denote the set of secondary structures of sequences X, and P(y(x)|x) is given by the Boltzmann distribution of single sequence x [Equation (5)]. For each consensus structure candidate |
|
|
| (6) |
The MEA structure is computed from qi and pij in a manner identical to the case of the prediction from single sequences. We refer to the algorithm as McCaskill-MEA. For
= 0, the McCaskill-MEA algorithm predicts the same structures as those of the ConStruct program (Luck et al., 1996, 1999).
The McCaskill-MEA algorithm does not assume that all the sequences take an equal single structure and instead predicts the structure that is supported by the majority of sequences. Since the model does not include any covariation score term, the accuracy of the prediction may be lower than that of other algorithms for high-quality alignments. However, McCaskill-MEA has the advantage that the algorithm is free from the negative effects of covariation scores in the presence of severe alignment errors.
To observe the effect of the suboptimal structures contained in the base pairing probability matrices
, we consider another MEA algorithm. It first computes the predicted structures for all sequences using the Mfold program and defines the base pairing probability matrix of the sequence as
|
|
2.4 The dataset
We have collected a test dataset from version 7.0 of the Rfam database (Griffiths-Jones et al., 2003). We have used only the manually curated seed alignments with the consensus structures published in literatures. The alignments and structure annotations of the Rfam database is assumed to be correct and the accuracy is evaluated with respect to them. The test dataset consists of 85 subalignments of 10 sequences. The number of families is 17 and there are five subalignments for each family. The basic properties of the dataset are listed in Table 1, in which the family name, mean base length and mean sequence identity are listed.
|
The alignments have been realigned using the multiple alignment softwares ProbCons (Do et al., 2005) and ClustalW (Thompson et al., 1994). ProbCons is based on the MEA algorithm and its parameters are optimized for RNA sequences using the expectation maximization algorithm. ClustalW uses a simple heuristic scoring scheme that is not specifically tuned for RNA sequences. The accuracies of the alignments were evaluated by computing the sum-of-pairs score (SPS) (Carillo and Lipman, 1988). The obtained values are listed in Table 1. SPS is the fraction of base pairs in unequal sequences that are aligned in a manner identical to those in the reference alignment. SPS assumes values between zero and one.
Table 1 shows the diversity of the test dataset whose mean lengths vary from 54 bases to 291 bases and whose mean pairwise sequence identities vary from 40 to 94%. SPS also varies considerably among families; however, in general, SPS values of ProbCons alignments are higher than those of ClustalW alignments, indicating the superiority of ProbCons over ClustalW. The differences in the accuracy of the predictions between ProbCons alignments and ClustalW alignments can be considered as the differences in accuracy between high-quality alignments and low-quality alignments. Since the alignment quality is lower for multiple alignments of highly diverged sequences in general, the differences in accuracy between ProbCons and ClustalW alignments also indicate the differences in accuracy between the alignments of diverged sequences and those of evolutionarily close sequences.
The accuracy of structure predictions is evaluated by computing the sensitivity and specificity of the predictions, which are defined by
|
|
We also use the Matthews correlation coefficients (MCCs) (Matthews, 1975), which is defined by the formula.
|
|
| 3 IMPLEMENTATION |
|---|
|
|
|---|
We used version 1.5 of the Vienna RNA package (Hofacker, 2003) for the computation of the base pairing probability matrices of McCaskill-MEA and RNAAlipfold-MEA and the structure predictions of the RNAAlifold algorithm. Version 5 of Mfold was used for the computation of the Mfold-MEA algorithm. A stand-alone program of Pfold was obtained (courtesy of Dr B. Knudsen). The algorithm of Equation (4) was implemented using the C++ language.
| 4 DISCUSSION |
|---|
|
|
|---|
4.1 Comparison of algorithms
Figure 1 shows examples of the density plots of the base pairing probabilities, which are calculated from a multiple alignment of 10 tRNA sequences. The alignment is created using the ClustalW software. The lower left triangles in both the figures show the true distribution of base pair probabilities. They are computed by first assigning the annotated structure in the Rfam database to each sequence and then computing base pairing probability matrices in a manner similar to Mfold-MEA. Although, the true tRNA structure has only four stems (Fig. 1, bottom), about 10 stems are observed in the plot due to severe misalignments. The upper right triangles show the density plot of the pairing probabilities used in the RNAAlipfold-MEA (left) and McCaskill-MEA (right) algorithms. Only two out of four stems are observed in the matrix for RNAAlipfold-MEA, while all the four stems are observed in the matrix for McCaskill-MEA.
|
Figure 2 shows the receiver operator characteristic (ROC) curves of the structure predictions from alignments of 10 sequences. The x and y axis represent the specificity and the sensitivity of predictions, respectively. The ROC curves are computed by varying
in the three MEA algorithms. The sensitivity is large for small values of
, since the terms that score the base pairs in the expected accuracy is emphasized and the number of predicted base pairs increases. The ROC curve reaches a limit for 
0. In this limit, the entire regions of the multiple alignments are filled with predicted stems. For large values of
, the number of predicted base pairs decreases. In the limit of large
, the number of predicted stems is so small that the corresponding plot fluctuates due to statistical fluctuations. Therefore, we only showed the data points for which the total number of predicted base pairs is greater than 10% of the total number of true base pairs. Since there is no parameter to control the specificity-sensitivity trade-off for Pfold (filled circle) and RNAAlifold (filled triangle), only one point for each alignment type is plotted.
|
Figure 2 shows that the sensitivity considerably depends on the alignment quality; the maximal sensitivity achieved for ClustalW alignments is less than ProbCons alignments by 10% and less than the reference alignments by about 30%. For all alignment types, the curves of McCaskill-MEA are above Mfold-MEA, which shows the efficiency achieved by including the effect of suboptimal structures in the consensus structure prediction. The higher sensitivity of RNAAlipfold-MEA as compared to that of RNAAlifold at the same specificity values also indicates the superiority of the MEA algorithm as compared to its maximal likelihood version, although the difference is less prominent. Both the specificity and sensitivity decrease for Mfold-MEA in the large
limit, which indicates that the loop probability values qi incorrectly assume large values at the columns of the true base pairs due to the neglect of suboptimal structures. As expected, Pfold and RNAAlipfold-MEA show slightly better sensitivities for specificities >0.8 as compared to McCaskill-MEA due to the positive effect of the base covariation scores. However, McCaskill-MEA shows the best sensitivities for lower specificity regions in all the three alignment types. Table 2 lists the ROC score, the maximal MCC and the maximal sensitivity for each alignment type and structure prediction algorithm. The ROC score is defined by the area under the ROC curve and is a standard indicator for prediction efficiency. Table 2 shows that the ROC score is the highest for the McCaskill-MEA algorithm. As for the maximal MCC, the Pfold program achieves the best MCC value for the high-quality reference alignments. However, McCaskill-MEA is better than the RNAAlifold program even for these alignments. For the ProbCons and ClustalW alignments, McCaskill-MEA algorithm outperforms the other programs. The difference between the maximal MCC value of McCaskill-MEA and that of other algorithms is larger for the lower quality ClustalW alignments. The table also shows that the maximal sensitivity is the highest for the McCaskill-MEA algorithm.
|
Note that in contrast to the specificity, the sensitivity values cannot be arbitrarily close to 1 unless the model's accuracy is fairly high; this is because it is not possible to predict all the base pair candidates (i.e. yij = 1 for all 1
i
j
L) to satisfy the consistency constraint [Equation (1)]. At this point, the secondary structure prediction problem is different from other binary classification problems where the classifier that predicts all the test samples as positive, (which corresponds to predicting yij = 1 for all 1
i
j
L), trivially achieves a sensitivity of one. Therefore, it may be said that the maximal reachable sensitivity is itself an indicator of the efficiency of the algorithms, and that McCaskill-MEA is comparatively much better than other algorithms with respect to it. Figure 3 shows the dependence of the ROC score on the number of sequences in the alignments. The alignments of 2, 4, 6 and 8 sequences are created by sampling sequences randomly from the alignments of 10 sequences. The colors and symbols are the same as in Figure 2. The ROC scores of McCaskill-MEA and Mfold-MEA increase with the number of sequences, while the increase of RNAAlipfold-MEA is somewhat slower than that of the other algorithms. For computationally aligned sequences, the McCaskill-MEA algorithm shows the best performance among the three algorithms. Even for the reference alignments, the McCaskill-MEA algorithm has a slightly better ROC score than the RNAAlipfold-MEA algorithm, which might imply the difficulty to score sequence covariations correctly for diverged sequences.
|
Figure 4 shows an example of the dependence of the sensitivity at fixed specificity (0.7) on the mean sequence identities. The current Rfam dataset has not permitted us to collect multiple sequence alignments over a wide-range of sequence identities for various families. The used dataset consists of 188 multiple alignments of four sequences collected from the Rfam Hammerhead_3 ribozyme family. The alignments are categorized into bins of width 10% in their mean sequence identities, and the mean sensitivities for each bin are plotted. The colors and symbols have the same meanings as in Figures 2 and 3. For this family, all three algorithms show similar behaviors for the reference alignments and the ProbCons alignments. For the ClustalW alignments, however, the McCaskill-MEA algorithm shows the best sensitivity in the region where the sequence identity is <60%.
|
4.2 Uses of parameter

As we have shown, the prediction accuracy of conserved secondary structures significantly depends on the alignment quality, which indicates the necessity of refining the alignments after candidates of alignments with conserved structures are screened. The parameter
, which controls the sensitivity and specificity of the prediction accuracy, can be conveniently used for such multi-step screening procedures; this is done by taking small values of
to screen conserved structure candidates from coarse alignments with high-sensitivity and then taking
large to predict structures from refined multiple alignments with high-specificity.
Another use of the parameter
is to assign confidence values to predicted base pairs. Figure 5 shows an example of the predicted structures of the UnaL2 family for varying parameter
. They are predicted from a ProbCons alignment using the McCaskill-MEA algorithm. As seen from the figure, the number of base pairs monotonically decreases with
without creating alternative base pairs. This behavior holds in most cases. Hence, we define the confidence value of each predicted base pair as follows. For any base pair, we associate the
value that is maximal among the ones whose MEA solutions predict that pair, and define the confidence value of the pair as the specificity corresponding to that
[Fig. 6 (left)]. The confidence value of a predicted base pair represents the empirical probability that the pair is a true base pair. The definition of confidence value depends only on the test dataset and the corresponding ROC curve and is essentially independent of the details of models. It has the property that the number of base pairs with high-confidence values is large in the predicted structures from the accurate multiple alignments. The confidence values of the prediction to the UnaL2 family is plotted in Figure 6 (right). The confidence values will be useful to rank the secondary structure candidates in genomic scale studies of conserved secondary structures.
|
|
| 5 CONCLUSION |
|---|
|
|
|---|
We have presented a method to predict the conserved secondary structures from multiple aligned sequences that are subject to alignment failures. Our method first calculates the base pairing probability matrix for each sequence, which are subsequently averaged to yield the base pairing probability matrix of the alignment. The consensus secondary structure is obtained by maximizing the expected accuracy of the structure with respect to the base pairing probabilities. For computationally aligned multiple sequences, our method shows a better performance as compared to other frequently used programs. We have shown that our method is particularly suitable for the alignments that suffer from significant alignment failures and that consist of a large number of sequences. We have shown that the parameter
in our model, which controls the sensitivity and specificity of the prediction, is useful for the genomic scale screening of conserved secondary structures and for assigning confidence values to the predicted base pairs. In the present study, we have investigated only the global problem of structure prediction, i.e., the lengths of the alignments and those of the structural RNA genes are assumed to be of the same order. For the local problem of consensus structure prediction that searches long multiple alignments for small conserved structures, the calculation of the base pairing probability matrices over the entire alignment might have problems caused by the stochastic disturbance from the regions that are not related to the RNA genes and secondary structures. We leave the investigation of the local structure prediction problem and the scaling property of the base pairing probability matrices for future study.
We have considered only simple applications of the maximal expected accuracy principle in which the base pairing probabilities are derived either from an alignment or from all the sequences in the alignment. However, we can extend our method to compute the average of both the probabilities that might complement each other. We can also consider combining other probability matrices derived from other models, such as SCFG models. Such considerations lead to the problem of finding the best proportionality constants to sum the various probabilities. It may be interesting to study machine-learning approaches to combine various base pairing probabilities in an optimal manner.
| Acknowledgments |
|---|
The authors thank their colleagues in the project for discussions and comments. H.K. thanks Yasuo Tabei and Michiaki Hamada for helpful discussions. This work was partially supported by the Functional RNA Project funded by the New Energy and Industrial Technology Development Organization (NEDO) of Japan.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Charlie Hodgman
Received on September 9, 2006; revised on December 11, 2006; accepted on December 12, 2006
| REFERENCES |
|---|
|
|
|---|
Carillo, H. and Lipman, D. (1988) The multiple sequence alignment problem in biology. SIAM J. Appl. Math, . 48, 10731082[CrossRef].
Carninci, P., et al. (2005) The transcriptional landscape of the mammalian genome. Science, 309, 15591563
Do, C., et al. (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res, . 15, 330340
Do, C., et al. (2006) CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22, e90e98
Dowell, R. and Eddy, S. (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5, 71[CrossRef][Medline].
Dunham, A., et al. (2004) The DNA sequence and analysis of human chromosome 13. Nature, 428, 522528[CrossRef][Medline].
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.J. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, (1998) , Cambridge, UK Cambridge University Press.
Griffiths-Jones, S., et al. (2003) Rfam: an RNA family database. Nucleic Acids Res, . 31, 439441
Hofacker, I. (2003) Vienna RNA secondary structure server. Nucleic Acids Res, . 31, 34293431
Hofacker, I. and Stadler, P. (1999) Automatic detection of conserved base pairing patterns in RNA virus genomes. Comput. Chem, . 23, 401414[CrossRef][ISI][Medline].
Hofacker, I., et al. (1998) Automatic detection of conserved RNA structure elements in complete RNA virus genomes. Nucleic Acids Res, . 26, 38253836
Hofacker, I., et al. (2002) Secondary structure prediction for aligned RNA sequences. J. Mol. Biol, . 319, 10591066[CrossRef][ISI][Medline].
Hofacker, I., et al. (2004a) Alignment of RNA base pairing probability matrices. Bioinformatics, 20, 22222227
Hofacker, I., et al. (2004b) Prediction of locally stable RNA secondary structures for genome-wide surveys. Bioinformatics, 20, 186190
Knight, R., et al. (2004) BayesFold: rational 2 degrees folds that combine thermodynamic, covariation, and chemical data for aligned RNA sequences. RNA, 10, 13231336
Knudsen, B. and Hein, J. (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res, . 31, 34233428
Luck, R., et al. (1996) Thermodynamic prediction of conserved secondary structure: application to the RRE element of HIV, the tRNA-like element of CMV and the mRNA of prion protein. J. Mol. Biol, . 258, 813826[CrossRef][ISI][Medline].
Luck, R., et al. (1999) ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res, . 27, 42084217
Mathews, D. and Turner, D. (2002) Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol, . 317, 191203[CrossRef][ISI][Medline].
Mathews, D., et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol, . 288, 911940[CrossRef][ISI][Medline].
Matthews, B. (1975) Comparison of predicted and observed secondary structure of t4 phage lysozyme. Biochim. Biophys. Acta, 405, 442451[Medline].
McCaskill, J. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 11051119[CrossRef][ISI][Medline].
Miyazawa, S. (1995) A reliable sequence alignment method based on probabilities of residue correspondences. Protein Eng, . 8, 9991009
Nussnov, R., et al. (1978) Algorithms for loop matchings. SIAM J. Appl. Math, . 35, 6882[CrossRef].
Okazaki, Y., et al. (2002) Analysis of the mouse transcriptome based on functional annotation of 60 770 full-length cDNAs. Nature, 420, 563573[CrossRef][Medline].
Pedersen, J., et al. (2006) Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput. Biol, . 2, e33[CrossRef][Medline].
Sankoff, D. (1985) Simultaneous solution of the rna folding, alignment and protosequence problems. SIAM J. Appl. Math, . 45, 810825[CrossRef].
Thompson, J., et al. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, . 22, 46734680
Washietl, S. and Hofacker, I. (2004) Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. J. Mol. Biol, . 342, 1930[CrossRef][ISI][Medline].
Washietl, S., et al. (2005a) Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol, . 23, 13831390[CrossRef][ISI][Medline].
Washietl, S., et al. (2005b) Fast and reliable prediction of noncoding RNAs. Proc. Natl Acad. Sci. USA, 102, 24542459
Zuker, M. (1989) Computer prediction of RNA structure. Methods Enzymol, 180, 262288[ISI][Medline].
This article has been cited by other articles:
![]() |
K. Katoh and H. Toh Recent developments in the MAFFT multiple sequence alignment program Brief Bioinform, July 1, 2008; 9(4): 286 - 298. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Asai, H. Kiryu, M. Hamada, Y. Tabei, K. Sato, H. Matsui, Y. Sakakibara, G. Terai, and T. Mituyama Software.ncrna.org: web servers for analyses of RNA sequences Nucleic Acids Res., July 1, 2008; 36(suppl_2): W75 - W78. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Kiryu, T. Kin, and K. Asai Rfold: an exact algorithm for computing local base pairing probabilities Bioinformatics, February 1, 2008; 24(3): 367 - 373. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

















