Bioinformatics Advance Access originally published online on November 2, 2005
Bioinformatics 2006 22(1):35-39; doi:10.1093/bioinformatics/bti743
Sequence-based heuristics for faster annotation of non-coding RNA families
1Department of Computer Science & Engineering, Seattle, WA 98195, USA
2Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
*To whom correspondence should be addressed.
Motivation: Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool to find new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be.
Results: In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families.
Availability: The source code is available under the GNU Public License at the supplementary web site.
Contact: zasha{at}cs.washington.edu
Supplementary information: http://bio.cs.washington.edu/supplements/zasha-HeurHmm-2004/ (Technical details, results, C++ code)
Received on December 13, 2004; revised on October 13, 2005; accepted on October 22, 2005
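The filtering idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a gap-free position-specific scoring profile stands in for a profile HMM, and `build_profile`, `profile_score`, `filtered_search`, the `threshold` value and the `expensive_score` callback (a placeholder for a CM scan) are all hypothetical names chosen for illustration. The point is only that a cheap sequence-based score screens each window, and the slow scorer runs on the survivors.

```python
import math
from typing import Callable, Dict, List, Tuple

ALPHABET = "ACGU"


def build_profile(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column log-odds scores against a uniform background.

    Assumes a gap-free alignment of equal-length sequences; a real
    profile HMM would also model insertions and deletions.
    """
    profile = []
    for col in range(len(alignment[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for seq in alignment:
            counts[seq[col]] += 1.0
        total = sum(counts.values())
        profile.append({b: math.log2((counts[b] / total) / 0.25) for b in ALPHABET})
    return profile


def profile_score(profile: List[Dict[str, float]], window: str) -> float:
    """Sum of column scores; bases outside ALPHABET score neutrally (0)."""
    return sum(col.get(base, 0.0) for col, base in zip(profile, window))


def filtered_search(
    genome: str,
    profile: List[Dict[str, float]],
    threshold: float,
    expensive_score: Callable[[str], float],
) -> List[Tuple[int, float]]:
    """Run the expensive scorer only on windows that pass the cheap filter."""
    width = len(profile)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if profile_score(profile, window) >= threshold:
            hits.append((start, expensive_score(window)))
    return hits
```

Because the threshold is a heuristic cutoff, a sketch like this can discard true family members that score poorly on sequence alone. That is the accuracy trade-off the abstract contrasts with the earlier rigorous filters.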





