Bioinformatics Advance Access originally published online on December 20, 2006
Bioinformatics 2007 23(4):434-441; doi:10.1093/bioinformatics/btl636
Robust prediction of consensus secondary structures using averaged base pairing probability matrices
1 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo, 135-0064, Japan
2 Graduate School of Information Sciences, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
3 Department of Computational Biology, Faculty of Frontier Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
*To whom correspondence should be addressed.
Abstract
Motivation: Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Because multiple alignments are often produced by alignment programs that, for computational efficiency, neglect the special conservation patterns of RNA secondary structures, alignment failures can cause conserved stem structures to be overlooked.
Results: We investigated how the accuracy of secondary structure prediction depends on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures with other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than the others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discuss the use of that parameter in multi-step screening procedures that search for conserved secondary structures and in assigning confidence values to the predicted base pairs.
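The averaging and prediction steps described above can be illustrated with a minimal sketch. It is not the authors' C++ implementation. It assumes that the per-sequence base pairing probability matrices have already been computed (for example with a McCaskill-type partition function) and projected onto alignment columns, with gap columns contributing zero probability. The names `average_matrices`, `mea_structure`, `threshold` and `min_loop` are hypothetical, and the per-pair gain `p[i][j] - threshold` is a simplified stand-in for the paper's expected-accuracy objective, with `threshold` playing the role of a sensitivity/specificity control.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def average_matrices(mats: List[Matrix]) -> Matrix:
    """Element-wise mean of equally sized square matrices (one per sequence)."""
    n = len(mats[0])
    avg = [[0.0] * n for _ in range(n)]
    for m in mats:
        for i in range(n):
            for j in range(n):
                avg[i][j] += m[i][j] / len(mats)
    return avg


def mea_structure(p: Matrix, threshold: float = 0.5,
                  min_loop: int = 3) -> List[Tuple[int, int]]:
    """Nested set of pairs (i, j), i < j, maximizing sum of (p[i][j] - threshold).

    Assumption: this simplified gain is illustrative only; a higher threshold
    yields fewer, more confident pairs. Only the upper triangle of p is read.
    """
    n = len(p)
    best = [[0.0] * n for _ in range(n)]
    # Nussinov-style dynamic programming over increasing interval length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: column j is unpaired
            for k in range(i, j - min_loop):  # case 2: column j pairs with k
                gain = p[k][j] - threshold
                if gain <= 0:
                    continue
                left = best[i][k - 1] if k > i else 0.0
                score = max(score, left + gain + best[k + 1][j - 1])
            best[i][j] = score
    # Traceback: recover one optimal set of pairs.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            gain = p[k][j] - threshold
            if gain <= 0:
                continue
            left = best[i][k - 1] if k > i else 0.0
            if best[i][j] == left + gain + best[k + 1][j - 1]:
                pairs.append((k, j))
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return sorted(pairs)
```

Under these assumptions, a typical call is `mea_structure(average_matrices(mats), threshold=0.5)`, where `mats` holds one column-indexed matrix per aligned sequence. Raising `threshold` keeps only pairs that are well supported on average across the sequences, which mirrors the trade-off between sensitivity and specificity mentioned in the abstract.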
Availability: The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/
Contact: kiryu-h{at}aist.go.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Charlie Hodgman
Received on September 9, 2006; revised on December 11, 2006; accepted on December 12, 2006


