Bioinformatics Advance Access originally published online on December 15, 2005
Bioinformatics 2006 22(4):445-452; doi:10.1093/bioinformatics/btk008
CMfinder: a covariance model based RNA motif finding algorithm
1Department of Computer Science and Engineering, University of Washington Seattle WA 98195-2350, USA
2Department of Genome Sciences, University of Washington Seattle WA 98195-2350, USA
*To whom correspondence should be addressed.
Motivation: The recent discoveries of large numbers of non-coding RNAs and computational advances in genome-scale RNA search create a need for tools for automatic, high quality identification and characterization of conserved RNA motifs that can be readily used for database search. Previous tools fall short of this goal.
Results: CMfinder is a new tool to predict RNA motifs in unaligned sequences. It is an expectation maximization algorithm using covariance models for motif description, featuring novel integration of multiple techniques for effective search of motif space, and a Bayesian framework that blends mutual information-based and folding energy-based approaches to predict structure in a principled way.
Extensive tests show that our method works well on datasets with either low or high sequence similarity, is robust to inclusion of lengthy extraneous flanking sequence and/or completely unrelated sequences, and is reasonably fast and scalable. In testing on 19 known ncRNA families, including some difficult cases with poor sequence conservation and large indels, our method demonstrates excellent average per-base-pair accuracy: 79% compared with at most 60% for alternative methods. More importantly, the resulting probabilistic model can be directly used for homology search, allowing iterative refinement of structural models based on additional homologs. We have used this approach to obtain highly accurate covariance models of known RNA motifs based on small numbers of related sequences, which identified homologs in deeply-diverged species.
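As an illustration of the mutual information signal mentioned above, the following minimal sketch computes the mutual information between two columns of a gapped alignment. It is not CMfinder's implementation: the function name `column_mutual_information`, the toy alignment, and the choice to skip rows containing a gap in either column are assumptions made for this example only.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information in bits between alignment columns i and j.

    Hypothetical helper for illustration; rows with a gap ('-') in
    either column are ignored.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                 # counts of (base_i, base_j)
    left = Counter(a for a, _ in pairs)    # marginal counts for column i
    right = Counter(b for _, b in pairs)   # marginal counts for column j
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (left[a] * right[b])
        mi += p_ab * math.log2(c * n / (left[a] * right[b]))
    return mi


# Toy alignment (made up): columns 0 and 4 covary as Watson-Crick pairs,
# while columns 1 and 3 are fully conserved.
toy_alignment = ["GACUC", "CAAUG", "AAGUU", "UACUA"]
mi_covarying = column_mutual_information(toy_alignment, 0, 4)
mi_conserved = column_mutual_information(toy_alignment, 1, 3)
```

Columns that change together to preserve base pairing score high, whereas fully conserved columns score zero. This is one reason a purely mutual information-based predictor can miss conserved helices, and it is consistent with the abstract's motivation for blending in folding energy.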
Availability: Results and web server version are available at http://bio.cs.washington.edu/yzizhen/CMfinder/
Contact: yzizhen{at}cs.washington.edu
Supplementary information: Supplementary technical details are available at http://bio.cs.washington.edu/yzizhen/CMfinder/
Received on June 9, 2005; revised on December 12, 2005; accepted on December 13, 2005