CONTRAfold: RNA secondary structure prediction without physics-based models
Computer Science Department, Stanford University, Stanford, CA 94305, USA
*To whom correspondence should be addressed.
Motivation: For several decades, free energy minimization methods have been the dominant strategy for single-sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, SCFGs use fully automated statistical learning algorithms to derive model parameters. Despite this advantage, probabilistic methods have not replaced free energy minimization methods as the tool of choice for secondary structure prediction, because the accuracies of the best current SCFGs have yet to match those of the best physics-based models.
Results: In this paper, we present CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models that generalize SCFGs by using discriminative training and feature-rich scoring. In a series of cross-validation experiments, we show that grammar-based secondary structure prediction methods formulated as CLLMs consistently outperform their SCFG analogs. Furthermore, CONTRAfold, a CLLM incorporating most of the features found in typical thermodynamic models, achieves the highest single-sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction.
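As a rough illustration of what "feature-rich scoring" in a conditional log-linear model means, the sketch below scores a handful of candidate structures by a weighted sum of feature counts and normalizes the exponentiated scores into conditional probabilities. This is a toy example, not CONTRAfold's implementation: the function name, feature names, weights, and candidate structures are all hypothetical, and the candidates are enumerated by hand, whereas a real predictor would need dynamic programming to normalize over the exponentially many possible structures.

```python
import math

def cllm_probabilities(candidates, weights):
    """Return P(y | x) for each candidate structure y under a log-linear model.

    candidates: dict mapping a structure label to its feature-count dict.
    weights:    dict mapping a feature name to a real-valued weight.
    """
    # Score of a structure = sum over features of (weight * count).
    scores = {
        y: sum(weights.get(f, 0.0) * c for f, c in feats.items())
        for y, feats in candidates.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores.values())
    exp_scores = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exp_scores.values())  # normalizer over the enumerated candidates
    return {y: e / z for y, e in exp_scores.items()}

# Hypothetical feature counts for three candidate structures of one sequence,
# written in dot-bracket notation. None of these values come from the paper.
candidates = {
    "((...))": {"pair_GC": 2, "hairpin_len_3": 1},
    "(.....)": {"pair_GC": 1, "hairpin_len_5": 1},
    ".......": {},
}
weights = {"pair_GC": 1.2, "hairpin_len_3": -0.5, "hairpin_len_5": -0.3}

probs = cllm_probabilities(candidates, weights)
best = max(probs, key=probs.get)  # highest-probability candidate
```

In discriminative training, the weights would be fit to maximize the conditional probability of known structures given their sequences, rather than being set by hand as they are here.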
Availability: Source code for CONTRAfold is available at http://contra.stanford.edu/contrafold/.
Contact: chuongdo{at}cs.stanford.edu






