Bioinformatics Advance Access originally published online on February 24, 2005
Bioinformatics 2005 21(10):2246-2253; doi:10.1093/bioinformatics/bti349
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Predicting a set of minimal free energy RNA secondary structures common to two sequences
Center for Human Genetics and Molecular Pediatric Disease, University of Rochester Medical Center 601 Elmwood Avenue, Box 703, Rochester, NY 14642, USA
| Abstract |
|---|
|
|
|---|
Motivation: Function derives from structure, therefore, there is need for methods to predict functional RNA structures.
Results: The Dynalign algorithm, which predicts the lowest free energy secondary structure common to two unaligned RNA sequences, is extended to the prediction of a set of low-energy structures. Dot plots can be drawn to show all base pairs in structures within an energy increment. Dynalign predicts more well-defined structures than structure prediction using a single sequence; in 5S rRNA sequences, the average number of base pairs in structures with energy within 20% of the lowest energy structure is 317 using Dynalign, but 569 using a single sequence. Structure prediction with Dynalign can also be constrained according to experiment or comparative analysis. The accuracy, measured as sensitivity and positive predictive value, of Dynalign is greater than predictions with a single sequence.
Availability: Dynalign can be downloaded at http://rna.urmc.rochester.edu
Contact: David_Mathews{at}urmc.rochester.edu
Supplementary information: http://rna.urmc.rochester.edu/supplementary.html
| INTRODUCTION |
|---|
|
|
|---|
Over the last couple of decades, it has become clear that RNA serves many cellular roles beyond being just a passive carrier of genetic information. It is now understood that RNA catalyzes reactions (Doudna and Cech, 2002; Hansen et al., 2002), directs the site-specific modification of RNA nucleotides (Bachellerie et al., 2002), modulates protein expression (Meister and Tuschl, 2004) and serves in protein localization (Walter and Blobel, 1982).
RNA secondary structure, the sum of canonical base pairs, is generally more stable than tertiary contacts and forms on faster timescales than tertiary structure (Tinoco and Bustamante, 1999). Therefore, secondary structure can largely be determined independently of tertiary structure.
The gold standard for determining RNA secondary structure, in the absence of a crystal structure, is comparative sequence analysis (Pace et al., 1999), in which a large number of sequences are aligned to reveal the common base pairing pattern. Alternatively, when only one sequence is available, free energy minimization can be used to predict the secondary structure (Hofacker, 2003; Mathews et al., 2004). Free energy minimization has been benchmarked as high as 73% accurate on average for predicting known base pairs for a diverse set of sequences in domains of fewer than 800 nt (Mathews et al., 1999; Mathews et al., 2004), but the average accuracy has also been reported as 56% for a different benchmark set of sequences with known structures (Dowell and Eddy, 2004). Several methods have been developed to predict suboptimal secondary structures, i.e. structures with free energy similar to the lowest free energy structure. Suboptimal structures provide structural alternatives to the lowest free energy structure. A diverse set of representative structures can be generated within an energy increment of the lowest free energy structure (Zuker, 1989), every structure can be exhaustively generated within an energy increment (Wuchty et al., 1999) or structures can be sampled from the ensemble (Ding and Lawrence, 2003).
When a set of homologous RNA sequences are available, secondary structure prediction accuracy can be improved by finding the lowest free energy structure consistent with every sequence. These algorithms are generally divided into two groups: those that use a fixed alignment (Hofacker et al., 2002; Knight et al., 2004; Lück et al., 1999) and those that simultaneously find the optimal alignment and structure (Chen et al., 2000; Gorodkin et al., 1997; Hofacker et al., 2004; Mathews and Turner, 2002; Perriquet et al., 2003; Sankoff, 1985). In general, those algorithms that determine the common structure and alignment simultaneously are more robust because they are not constrained by a fixed alignment, but these algorithms require more computational time. Several algorithms of both types were recently compared and benchmarked (Gardner and Giegerich, 2004).
Dynalign is a dynamic programming algorithm that finds the lowest free energy secondary structure common to two unaligned sequences and the sequence alignment that facilitates that structure (Mathews and Turner, 2002). It minimizes the total free energy:
![]() | (1) |
is the conformational free energy of sequence one,
is the conformational free energy of sequence two and
is the gap penalty that is applied to each gap in the alignment. The optimal value for
, 0.4 kcal mol1 gap1, was found by maximizing the prediction accuracy for a set of 5S rRNA sequences (Mathews and Turner, 2002). Because Dynalign does not use any term that compares the sequence identity of the two sequences, it can be used to predict the structures for homologous sequences that have no sequence identity, but only structure conservation.
This paper describes the extension of the Dynalign algorithm to predict a set of representative suboptimal solutions. The methodology for predicting suboptimal structures can also be utilized to predict energy dot plots, which demonstrate all competing base pairs in structures within a given energy increment from the lowest free energy structure. The Dynalign algorithm can now also constrain secondary structure prediction for either one or both sequences using constraints derived from experiment or comparative sequence analysis. The most recent set of nearest neighbor parameters for predicting the free energy of secondary structure formation have been implemented (Mathews et al., 2004; Xia et al., 1998). The algorithm remains O(N3 M3) in time and O(N2 M2) in memory, where M is a parameter that limits the depth of search for alignments. For nucleotide i in sequence one to align to nucleotide k in sequence two:
![]() | (2) |
To benchmark the accuracy of Dynalign, secondary structure prediction statistics are reported for sets of RNA sequences with well-known secondary structure. These sets include a set of SRP RNA sequences (Larsen et al., 1998) (http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html), a set of randomly chosen tRNA sequences (Sprinzl et al., 1998) (http://www.uni-bayreuth.de/departments/biochemie/trna/), a set of 5S rRNA sequences with widely varying prediction accuracy using a single sequence (Szymanski et al., 2000) (http://www.man.poznan.pl/5SData/) and a set of randomly chosen 5S rRNA sequences.
| ALGORITHM |
|---|
|
|
|---|
Dynamic programming recursions
Dynalign is revised to include the ability to predict not only the lowest free energy structure common to two sequences, but also a set of representative suboptimal structures with higher free energies. These low-energy structures are a set of alternative conformations. The method is a generalization of suboptimal structure prediction for single sequences (Zuker, 1989) and suboptimal sequence alignment (Vingron and Argos, 1990; Zuker, 1991) to the 4D arrays used by Dynalign. The two 4D arrays defined previously (Mathews and Turner, 2002) for accumulating conformational free energies, V(i,j,k,l) and W(i,j,k,l), are doubled in size and the arrays are now filled for j up to 2 x N11 and for l up to 2 x N21 where N1 is the length of the first sequence and N2 is the length of the second sequence. For j
N1, V(i,j,k,l) is the lowest energy for a structure composed of nucleotide fragments i to j in sequence one and k to l in sequence two, given that i is paired to j, k is paired to l, i is aligned to k and j is aligned to l (Fig. 1). Similarly, for j
N1, W(i,j,k,l) is the lowest energy for fragments from i to j in sequence one and k to l in sequence two, such that the fragment will be in a multibranch loop, but without restrictions on pairing or alignment.
|
To predict suboptimal structures and alignments, free energies are calculated for fragments of the sequence that contain both the 5' and 3' end of the sequences, called excluded fragments. A second half of the arrays has been added for suboptimal structure prediction, for which j > N1 and l > N2. For these cases, V(i,j,k,l) is the lowest free energy for the excluded fragment, i.e. nucleotides 1 to j N1 and i to N1 in sequence one and nucleotides 1 to l N2 and k to N2 in sequence two, with the same pairing and alignment constraints as with j
N1. Also, W(i,j,k,l) is the lowest free energy for the excluded fragment without constraints. The arrays are only doubled in size compared with the minimum free energy calculation reported previously because if j > N1 then l > N2 and similarly j
N1 implies l
N2. To speed the free energy calculation of exterior loops (loops that contain the ends of the sequence), a (2D) array, W3(i,k) is added, where W3(i,k) is the lowest free energy for nucleotides i to N1 in sequence one and k to N2 in sequence two with i aligned to k. W5(i,k), the lowest free energy for nucleotides one to i in sequence one and one to k in sequence two, with i aligned to k, is as defined previously (Mathews and Turner, 2002).
The advantage of calculating the free energy arrays for excluded fragments is that for any set of pairs, i to j and k to l, the lowest free energy structure that contains that pair has free energy equal to V(i,j,k,l) + V(j,i + N1, l, k + N2), as illustrated in Figure 2. Tracebacks of secondary structures can be started from any pair with free energy less than some cut-off value.
|
In practice, the arrays V and W are filled using the recursions described previously (Mathews and Turner, 2002), but with 16 additional terms in the calculation of V for considering exterior loops in excluded fragments. Also, the minimum of 19 terms is calculated for W3(i,k). The complete recursions are enumerated in the Supplementary Material.
When the arrays are completely filled, the lowest energy pos- sible for a structure is known, but that structure itself is unknown. To generate structures, a heap of pairs for which V(i,j,k,l) + V(j,i+N1,l,k+N2) is below the desired energy threshold is generated and sorted. Then structures are determined starting with the lowest energy pairs on the heap using a traceback scheme similar to that outlined previously (Mathews and Turner, 2002). Two user-defined window parameters are used to ensure that the representative suboptimal structures are significantly different from each other. A structure window parameter, SWIN, ensures that the representative suboptimal secondary structures are significantly different from each other by requiring that each new structure have at least one base pair separated by at least SWIN nucleotides from pairs in other structures. Similarly, diversity in the sequence alignments is ensured by a window parameter, AWIN, by requiring that each new alignment have a nucleotide alignment separated by AWIN nucleotides from a previous alignment. SWIN and AWIN, however, can be set to minimum values of zero to produce many structures with small variations.
The cost of suboptimal structure prediction in Dynalign is about twice that of predicting the lowest free energy common structure alone. This cost, as compared with the original algorithm, is mitigated by the increasing speed and memory for desktop computers. The algorithm remains O(M3N3) in time and O(M2N2) in storage. Table 1 shows the computation time and memory requirement for sample calculations.
|
Experimental constraints
Dynalign predictions can be constrained according to experimental data or comparative sequence analysis. Nucleotides can be forced single-stranded, double-stranded or accessible to chemical modification. Single- and double-stranded nucleotides can be determined experimentally using enzymatic cleavage as reviewed by Knapp (1989). Chemical modification probing of secondary structure was reviewed by Ehresmann et al. (1987) and can be used to probe native RNA structure in vitro or in vivo (Mathews et al., 2004; Zaug and Cech, 1995). Base pairs between two nucleotides in a sequence can be forced or prohibited as might be determined by comparative sequence analysis (Pace et al., 1999). Furthermore, the alignment space can be restricted by forcing two nucleotides to align (Pace et al., 1999). Conformations or alignments that violate constraints are prohibited by assigning them large, positive free energies. The constraints are then satisfied because the dynamic programming algorithm will find the lower energy solutions that do not violate the constraints.
Chemical modification is not straightforward to implement in the dynamic programming recursions because nucleotides accessible to chemical modification can be single-stranded, at the ends of helices, in GU base pairs, or adjacent to GU base pairs (Mathews et al., 2004). In Dynalign, the chemical modification constraint is imposed by introducing another 4D array, VMOD(i,j,k,l). VMOD(i,j,k,l) is the minimum of the same terms as V(i,j,k,l) except base stacking on a previous pair is excluded in VMOD if either i, j, k or l is modified and the previous pair is not a GU and the modified nucleotide is not in a GU pair. In the recursions, when V is referenced, VMOD is used in cases of continuous base pair stacking and V is used for all loop closure cases. This serves to not allow a modified pair to be buried in a helix unless it is in or adjacent to a GU pair, satisfying the constraints. This constraint requires little additional time to implement, but is costly in storage because VMOD is an additional O(N2M2) array.
Free energy parameters
The implementation of the nearest neighbor rules for predicting conformational stability has been updated. Dynalign implements helical parameters from Xia et al. (1998) and uses the updated loop parameters from Mathews et al. (2004).
The per nucleotide gap penalty is set to 0.4 kcal mol1 gap1, which provided maximal prediction accuracy for a set of 5S rRNA sequences previously (Mathews and Turner, 2002). The gap penalty is included for all gaps, even at the end of the alignments, so that the algorithm is optimizing the global alignment.
Randomly drawn sequences
To test Dynalign under typical use situations, 40 tRNA (Sprinzl et al., 1998) and 14 5S rRNA (Szymanski et al., 2000) sequences were drawn randomly from an available database of structures (Mathews et al., 1999). The tRNA sequences are RD5281, RD6280, RE0501, RE6940, RF1540, RF6820, RF7920, RF9281, RG1140, RG1310, RG1381, RG1540, RG2530, RH9990, RI0501, RI3280, RI4310, RK5240, RK9162, RK9222, RL0503, RL0504, RL3921, RL6280, RL6370, RM2530, RM4000, RN0500, RP0500, RR0500, RT3920, RV1662, RV6281, RW1251, RW2920, RW7070, RX1660, RX7980, RY1661 and RY2560. For tRNA sequences, modified nucleotides that cannot canonically pair are forced single-stranded (Mathews et al., 1999). The tRNA sequences have pairwise sequence identity ranged from 18.6 to 100.0% with an average of 46.9 ± 10.4%. The 5S rRNA sequences are Aspergillus nidulans 8, Bradyrhizobium japonicum, Bugula neritina, Exidia glandulosa, Filobasidium casuligenum, Halobacterium salinarium, Methanobacterium thermoformicicum, Microbotryum violaceum, Methanococcus voltae, Prosthecochloris aestuarii, Philosamia cynthia-ricini, Schizocosa saltatrix, Tinca tinca and Ustilaginales fluitans.
Calculation of accuracy
Accuracy of base pair prediction is tabulated in two ways. The first, sensitivity, is the percentage of known base pairs (as determined by comparative sequence analysis) correctly predicted. The second, positive predictive value, is the percentage of predicted base pairs that are in the structure determined by comparative sequence analysis. Single standard deviations are reported as errors.
Sensitivity measures the ability to predict the known base pairs, whereas positive predictive value measures the quality of the predicted structures. In general, positive predictive value is lower than sensitivity for free energy minimization structure prediction because there is a tendency to over-predict base pairs and also because the databases of known structures do not annotate all known pairs (Mathews, 2004).
Base pairs are considered correctly predicted if that pair or a pair slipped by up to 1 nt on one side of the pair is in the predicted structure (Mathews et al., 1999). For example, a known pair (i)(j) would be considered predicted if (i)(j), (i + 1)(j), (i 1)(j), (i)(j + 1), or (i)(j 1) appeared in the predicted structure. Base pairs are allowed to be slipped because comparative sequence analysis, the method used to predict the benchmark structures, can have difficulty distinguishing the true structure (as determined by X-ray crystallography) from the slipped alternative. This largely occurs at the ends of helices and with AU or GU pairs (Ban et al., 2000; Gutell et al., 2002). It is also possible that these pairs are dynamic and may sample both conformations (Gutell et al., 2002).
| BENCHMARKS |
|---|
|
|
|---|
Accuracy for test datasets
To test the accuracy of structure prediction with Dynalign, four test sets of structures with well-determined secondary structure were used. The first set is a set of seven 5S rRNA structures (Szymanski et al., 2000) with widely varying base pair prediction sensitivity (from 0 to 100%) for structure prediction using a single sequence (Mathews et al., 2004). Table 2 summarizes the percentage of known base pairs correctly predicted (sensitivity) for the lowest free energy structure and for the best suboptimal structure in a set of structures up to 20% different in energy from the lowest energy. For this set of 5S rRNA sequences, the average sensitivity of base pair prediction in the lowest energy structure improves from 47.4 ± 35.9% for a single sequence to 86.6 ± 12.9% with Dynalign. For the best suboptimal structure, the sensitivity improves from 91.5 ± 9.9 to 96.6 ± 4.5%. The best suboptimal structure predicted by Dynalign is, on average, 2.2 kcal/mol or 3.6% higher in free energy than the lowest free energy structure. The positive predictive value for base pair prediction, on average, improved from 41.1 ± 31.8 to 83.0 ± 12.3% by using two sequences.
|
The best suboptimal structure is representative of the accuracy of the set of structures. In particular, when the best suboptimal structure is more sensitive for base pair prediction than the lowest free energy structure, the importance of sampling alternative structures is illustrated. The 2.2 kcal/mol average difference in free energy is within the error of free energy prediction using the nearest neighbor parameters (Mathews et al., 2004; Xia et al., 1998). There is no a priori method, however, for determining the best structure from the set of suboptimal structures unless the secondary structure of the sequence is known, as it is for these benchmarks.
To test Dynalign with longer sequences, three calculations were performed with SRP RNA sequences from three different kingdoms (Table 3). Arabidopsis thaliana was paired with Zea mays, Bacillus brevis was paired with Bacillus subtilis, and Mycoplasma mycoides was paired with Mycoplasma pneumoniae. As with the 5S rRNA sequences, the sensitivity of both the lowest free energy structure and the best suboptimal structure improved for each sequence or remained the same for predictions that are 100% sensitive for a single sequence. On average, the sensitivity of base pair prediction improved from 73.4 to 92.4% and the positive predictive value improved from 62.9 to 78.5%.
|
Finally, to test how Dynalign can improve the accuracy of secondary structure prediction under typical conditions and without selection bias, the accuracy of Dynalign was benchmarked for tRNA sequences and 5S rRNA sequences chosen randomly from the database of known structures. A total of 40 tRNA sequences and 14 5S rRNA sequences were chosen for a total of 780 and 91 Dynalign calculations, respectively. The average accuracies for a single sequence calculation and Dynalign are tabulated for these datasets in Table 4. The average sensitivities, 85.5 and 73.8%, of structure prediction of a single sequence are very similar to the sensitivities for larger sets of tRNA and 5S rRNA sequences, 87.0 and 73.8%, respectively, reported previously (Mathews et al., 2004). The Dynalign calculation showed significant improvement in the accuracy of the lowest free energy structure as compared with predictions in a single sequence. In particular, the positive predictive value shows the largest absolute increase. For the tRNA sequences, the positive predictive value improves from 83.6 to 92.7% by using Dynalign and the improvement in 5S rRNA is 61.4 to 82.0%. Relatively modest improvements in the average sensitivity of the best suboptimal secondary structure are observed. For tRNA, the improvement is from 96.2 to 98.8% and for 5S rRNA the improvement is from 96.9 to 97.9%.
|
Energy dot plots
Energy dot plots that indicate the lowest free energy possible for a structure that contains a given base pair and alignment can now be constructed. These are analogous to the dot plots that are produced for secondary structure prediction of a single sequence (Zuker, 1989) and provide a measure of well-definedness for the predicted structure (Zuker and Jacobson, 1995, 1998). Base pairs that are composed of nucleotides involved in many low-energy pairs are poorly defined. The predicted low-energy pairs represent important alternative conformations that the algorithm and thermodynamic parameters cannot exclude. The dot plots are also analogous to the suboptimal alignment dot plots that provide a measure of well-definedness for predicted alignments (Vingron and Argos, 1990; Zuker, 1991). In the Dynalign calculation, these plots are 4D because they simultaneously contain information about structures and alignments. The plots can be projected in two dimensions for convenience by displaying only one type of information. For example, all base pairs for one sequence can be displayed within a given increment of the lowest total free energy, regardless of the alignments. Alternatively, plots could be displayed of the sequence alignment, regardless of the structures to show well-definedness of the alignments at positions involved in base pairing.
Figure 3A shows the energy dot plot for the secondary structure prediction of the Haloferax volcanii 5S rRNA sequence alone. All pairs within 20% free energy of the lowest free energy structure are displayed. Figure 3B shows the energy dot plot for the same H.volcanii 5S rRNA sequence, but with the secondary structure predicted in conjunction with the Mycobactericum phlei 5S rRNA using Dynalign. Again, all pairs within 20% total energy [Equation (1)] of the lowest free energy structure are shown. As expected, the Dynalign calculation shows a significantly more well-defined secondary structure because only those base pairs conserved between both sequences are allowed (with the exception that single base pair inserts are allowed to be embedded in a helix). A total of 823 base pairs occur in suboptimal structures within 20% energy of the lowest free energy structure when using a single sequence, but only 240 bp occur within a 20% energy difference in the Dynalign calculation.
|
For most of the sets of sequences used in this study, the Dynalign dot plots are more well-defined than dot plots for single sequences when examining 10 or 20% energy intervals. This is in spite of the fact that the percentage energy interval corresponds to a larger absolute energy interval for Dynalign because the total energy is the sum of conformational energies for both sequences [Equation (1)]. This is true for the SRP RNA sequences (Table 3), the small set of 5S rRNA sequences, where the improvement is from 619 ± 198 to 384 ± 148 bp on average, and the large set of 5S rRNA sequences (Table 4). For the set of tRNA sequences (Table 4), there is a negligible increase in number of base pairs, from 107 to 116. The fact that there is no improvement in the well-definedness of tRNA secondary structure prediction is probably a result of the fact that predicted tRNA sequences are already, on average, more well-defined than other structural RNA sequences using a single sequence, as has been previously observed using a partition function calculation (Mathews, 2004).
| DISCUSSION |
|---|
|
|
|---|
Dynalign is an algorithm that finds the lowest free energy structure common to two, unaligned RNA sequences. Helices are only allowed if a homologous helix can be made in both sequences, which effectively restricts the structures to those for which the same abstract shape can be generated by both sequences (Giegerich et al., 2004). The advantage to this approach is that, by finding the common structure, the accuracy of structure prediction can be improved significantly. Because Dynalign minimizes a free energy equation [Equation (1)] that does not depend on the sequence identity, the common secondary structure can be found for homologous sequences that have very little sequence similarity as might be found for two sequences from species that are widely separated phylogenetically. This paper describes three advances in the algorithm. First, the algorithm now can predict a set of suboptimal secondary structures. Second, the information used for suboptimal structure prediction can be used to create dot plots, which conveniently summarize the information contained in all the suboptimal structures. This is important because the lowest free energy common structure is not always the correct structure; suboptimal structures suggest alternatives. Also, the dot plot information summarizes the well-definedness of a predicted structure because nucleotides that are involved in many competing pairs with similar free energy cannot be confidently assigned to a single base pair. Nucleotides are predicted to participate in multiple pairs in low-free energy structures because of limitations in the nearest neighbor parameters (Mathews et al., 2004) or because the RNA sequence populates multiple conformations (Baumstark et al., 1997; Schultes and Bartel, 2000).
The third advance to the algorithm is that constraints can be applied to structure prediction. Constraints from chemical modification probing experiments (Ehresmann et al., 1987) and enzymatic cleavage data (Knapp, 1989) can be used to constrain the structure prediction and previous studies have shown that these data can improve the accuracy and the well-definedness of structure prediction for a single sequence (Mathews, 2004; Mathews et al., 1999, 2004). Other constraints that might derive from comparative analysis can also be applied (Pace et al., 1999), including forcing base pairs, forbidding specific base pairs and forcing two nucleotides to align.
The Dynalign algorithm is guaranteed to find the lowest free energy structure, without pseudoknots, common to two unaligned sequences because it is a dynamic programming algorithm and it uses the free energy nearest neighbor model. As shown by Table 1, however, this is a costly algorithm in terms of computer time and memory. In practice, the calculation is limited to sequences up to 400 nt in length. In the short-term, emphasis will need to be placed on finding reasonable heuristics to improve the algorithm's performance for longer sequences. It is also expected that continued advances in computer hardware will serve to make this calculation relatively less costly in the near future.
The method described here for generating suboptimal structures and alignments produces a set of representative structures, but is not designed to predict all structures within a given energy increment of the lowest free energy structure. The window parameters, introduced above, further constrain the production of suboptimal structures so that a diverse representative set is predicted. The dot plots, on the other hand, are comprehensive in that all pairs in any possible structure within a given energy increment of the lowest free energy structure are shown.
A method could be devised to produce all suboptimal structures common to two sequences by generalizing methods for structure prediction (Wuchty et al., 1999) or sequence alignment (Waterman, 1983). In general, however, these methods generate an exponentially increasing number of suboptimal solutions as the energy increment is increased (Vingron, 1996; Wuchty et al., 1999). Another possibility would be to sample structures from a partition function implementation of Dynalign, extrapolating dynamic programming methods from single sequences (Ding and Lawrence, 2003; Mathews, 2004) and alignments to two sequences. The exhaustive sampling and partition function algorithms require that the dynamic programming recursions for filling the arrays are non-redundant, i.e. each possible structure is considered once and only once (Mathews et al., 2004; Wuchty et al., 1999). This would necessitate additional arrays of O(N4) size.
Although Dynalign does not explicitly predict structures with pseudoknots, dot plots can show putative helices that form pseudoknots (Gaspin and Westhof, 1995). It would be possible to use the dot plot and alignment output from Dynalign as input to a Maximum Weight Matching algorithm that is capable of predicting secondary structures with pseudoknots (Ruan et al., 2004).
Dynalign is available as C++ code for local compilation on Linux/Unix machines or as a user-friendly program as part of the RNAstructure software package for Microsoft Windows. Both can be downloaded from the Mathews lab website athttp://rna.urmc.rochester.edu
Received on December 21, 2004; revised on February 8, 2005; accepted on February 20, 2005
| REFERENCES |
|---|
|
|
|---|
Bachellerie, J.P., et al. (2002) The expanding snoRNA world. Biochimie, 84, 775790[Medline].
Ban, N., et al. (2000) The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905920
Baumstark, T., et al. (1997) Viroid processing: switch from cleavage to ligation is driven by a change from tetraloop to a loop A conformation. EMBO J., 16, 599610[CrossRef][ISI][Medline].
Chen, J., et al. (2000) Prediction of common secondary structures of RNAs: a genetic algorithm approach. Nucleic Acids Res., 28, 991999
Ding, Y. and Lawrence, C.E. (2003) A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res., 31, 72807301
Doudna, J. and Cech, T. (2002) The chemical repertoire of natural ribozymes. Nature, 418, 222228[CrossRef][Medline].
Dowell, R.D. and Eddy, S.R. (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5, 71[CrossRef][Medline].
Ehresmann, C., et al. (1987) Probing the structure of RNAs in solution. Nucleic Acids Res., 15, 91099128
Gardner, P.P. and Giegerich, R. (2004) A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics, 5, 140[CrossRef][Medline].
Gaspin, C. and Westhof, E. (1995) An interactive framework for RNA secondary structure prediction with a dynamical treatment of constraints. J. Mol. Biol., 254, 163174[CrossRef][ISI][Medline].
Giegerich, R., et al. (2004) Abstract shapes of RNA. Nucleic Acids Res., 32, 48434851
Gorodkin, J., et al. (1997) Finding the most significant common sequence and structure in a set of RNA sequences. Nucleic Acids Res., 25, 37243732
Gutell, R.R., et al. (2002) The accuracy of ribosomal RNA comparative structure models. Curr. Opin. Struct. Biol., 12, 301310[CrossRef][ISI][Medline].
Hansen, J.L., et al. (2002) Structural insights into peptide bond formation. Proc. Natl Acad. Sci. USA, 99, 1167011675
Hofacker, I.L. (2003) Vienna RNA secondary structure server. Nucleic Acids Res., 31, 34293431
Hofacker, I.L., et al. (2002) Secondary structure prediction for aligned RNA sequences. J. Mol. Biol., 319, 10591066[CrossRef][ISI][Medline].
Hofacker, I.L., et al. (2004) Alignment of RNA base pairing probability matrices. Bioinformatics, 20, 22222227
Knapp, G. (1989) Enzymatic approaches to probing RNA secondary and tertiary structure. Methods Enzymol., 180, 192212[ISI][Medline].
Knight, R., et al. (2004) BayesFold: rational 2° folds that combine thermodynamic, covariation, and chemical data for aligned RNA sequences. RNA, 10, 13231336
Larsen, N., et al. (1998) The signal recognition particle database (SRPDB). Nucleic Acids Res., 26, 177178
Lück, R., et al. (1999) ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res., 27, 42084217
Mathews, D.H. (2004) Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA, 10, 11781190
Mathews, D.H. and Turner, D.H. (2002) Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol., 317, 191203[CrossRef][ISI][Medline].
Mathews, D.H., et al. (1999) Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA secondary structure. J. Mol. Biol., 288, 911940[CrossRef][ISI][Medline].
Mathews, D.H., et al. (2004) Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl Acad. Sci. USA, 101, 72877292
Meister, G. and Tuschl, T. (2004) Mechanisms of gene silencing by double-stranded RNA. Nature, 431, 343349[CrossRef][Medline].
Pace, N.R., Thomas, B.C., Woese, C.R. (1999) Probing RNA structure, function, and history by comparative analysis. In Gesteland, R.F., Cech, T.R., Atkins, J.F. (Eds.). The RNA World, 2nd edn., , NY Cold Spring Harbor Laboratory Press, pp. 113141.
Perriquet, O., et al. (2003) Finding the common structure shared by two homologous RNAs. Bioinformatics, 19, 108116
Ruan, J., et al. (2004) An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics, 20, 5866
Sankoff, D. (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math., 45, 810825[CrossRef].
Schultes, E.A. and Bartel, D.P. (2000) One sequence, two ribozymes: implications for emergence of new ribozyme folds. Science, 289, 448452
Sprinzl, M., et al. (1998) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res., 26, 148153
Szymanski, M., et al. (2000) 5S ribosomal RNA database Y2K. Nucleic Acids Res., 28, 166167
Tinoco, I., Jr and Bustamante, C. (1999) How RNA folds. J. Mol. Biol., 293, 271281[CrossRef][ISI][Medline].
Vingron, M. (1996) Near-optimal sequence alignment. Curr. Opin. Struct. Biol., 6, 346352[CrossRef][ISI][Medline].
Vingron, M. and Argos, P. (1990) Determination of reliable regions in protein sequence alignments. Protein Eng., 3, 565569
Walter, P. and Blobel, G. (1982) Signal recognition particle contains a 7S RNA essential for protein translocation across the endoplasmic reticulum. Nature, 299, 691698[CrossRef][Medline].
Waterman, M.S. (1983) Sequence alignments in the neighborhood of the optimum with general application to dynamic programming. Proc. Natl Acad. Sci. USA, 80, 31233124
Wuchty, S., et al. (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers, 49, 145165[CrossRef][ISI][Medline].
Xia, T., et al. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with WatsonCrick pairs. Biochemistry, 37, 1471914735[CrossRef][Medline].
Zaug, A.J. and Cech, T.R. (1995) Analysis of the structure of Tetrahymena nuclear RNAs in vivo: telomerase RNA, the self-splicing rRNA Intron, and U2 snRNA. RNA, 1, 363374[Abstract].
Zuker, M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 4852
Zuker, M. (1991) Suboptimal sequence alignment in molecular biology. Alignment with error analysis. J. Mol. Biol., 221, 403420[CrossRef][ISI][Medline].
Zuker, M. and Jacobson, A.B. (1995) Well-determined regions in RNA secondary structure predictions. Applications to small and large subunit rRNA. Nucleic Acids Res., 23, 27912798
Zuker, M. and Jacobson, A.B. (1998) Using reliability information to annotate RNA secondary structures. RNA, 4, 669679[Abstract].
This article has been cited by other articles:
![]() |
A. Wilm, D. G. Higgins, and C. Notredame R-Coffee: a method for multiple alignment of non-coding RNA Nucleic Acids Res., May 1, 2008; 36(9): e52 - e52. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Kalendar, J. Tanskanen, W. Chang, K. Antonius, H. Sela, O. Peleg, and A. H. Schulman Cassandra retrotransposons carry independently transcribed 5S RNA PNAS, April 15, 2008; 105(15): 5833 - 5838. [Abstract] [Full Text] [PDF] |
||||
![]() |
I. M. Meyer A practical guide to the art of RNA gene prediction Brief Bioinform, November 1, 2007; 8(6): 396 - 414. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. S. Andersen, A. Lind-Thomsen, B. Knudsen, S. E. Kristensen, J. H. Havgaard, E. Torarinsson, N. Larsen, C. Zwieb, P. Sestoft, J. Kjems, et al. Semiautomated improvement of RNA alignments RNA, November 1, 2007; 13(11): 1850 - 1859. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. TAMURA and T. AKUTSU Approximation Algorithms for Optimal RNA Secondary Structures Common to Multiple Sequences IEICE Trans A: Fundamentals, May 1, 2007; E90-A(5): 917 - 923. [Abstract] [PDF] |
||||
![]() |
Z. J. Lu, D. H. Turner, and D. H. Mathews A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation Nucleic Acids Res., October 18, 2006; 34(17): 4912 - 4924. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. D. Baird, M. Turcotte, R. G. Korneluk, and M. Holcik Searching for IRES RNA, October 1, 2006; 12(10): 1755 - 1785. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Tabei, K. Tsuda, T. Kin, and K. Asai SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments Bioinformatics, July 15, 2006; 22(14): 1723 - 1729. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Dalli, A. Wilm, I. Mainz, and G. Steger STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time Bioinformatics, July 1, 2006; 22(13): 1593 - 1599. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||









