ExonHunter: a comprehensive approach to gene finding
B. Brejová *
T. Vinař
School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada N2L 3G1
*To whom correspondence should be addressed.
Motivation: We present ExonHunter, a new and comprehensive gene finding system that outperforms existing systems and features several new ideas and approaches. Our system combines numerous sources of information (genomic sequences, expressed sequence tags and protein databases of related species) into a gene finder based on a hidden Markov model in a novel and systematic way. In our framework, various sources of information are expressed as partial probabilistic statements about positions in the sequence and their annotation. We then combine these into the final prediction via a quadratic programming method, which we show to be an extension of existing methods. Allowing only partial statements is key to our transparent handling of missing information and coping with the heterogeneous character of individual sources of information. In addition, we give a new method for modeling the length distribution of intergenic regions in hidden Markov models.
Results: On a commonly used test set, ExonHunter performs significantly better than the existing gene finders ROSETTA, SLAM and TWINSCAN, with more than two-thirds of genes predicted completely correctly.
Availability: Supplementary material is available at http://www.bioinformatics.uwaterloo.ca/supplements/05eh/
Contact: bbrejova{at}uwaterloo.ca
Received on January 15, 2005; accepted on March 27, 2005
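The abstract describes sources of evidence as partial probabilistic statements that are combined by quadratic programming, without giving the formulation. The toy sketch below illustrates the general idea for a single sequence position only; it is not ExonHunter's actual method. The three-label set, the weighted squared-error objective, the trust weights, the projected-gradient solver, and the names `Advisor`, `project_to_simplex` and `combine` are all assumptions made for illustration. Each hypothetical advisor assigns probabilities to blocks of a partition of the labels, so an advisor that cannot distinguish two labels simply places them in the same block.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical label set; a real gene finder distinguishes many more states.
LABELS = ["exon", "intron", "intergenic"]

# One advisor = (partition block -> probability of that block, trust weight).
Advisor = Tuple[Dict[FrozenSet[str], float], float]


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        candidate = (running - 1.0) / j
        if uj - candidate > 0:
            theta = candidate  # the last j passing the test gives the threshold
    return [max(x - theta, 0.0) for x in v]


def combine(advisors: List[Advisor], steps: int = 2000) -> Dict[str, float]:
    """Minimize sum_a w_a * sum_S (p(S) - q_a(S))^2 over distributions p.

    Assumed objective: a small quadratic program, solved here by
    projected gradient descent rather than a dedicated QP solver.
    """
    n = len(LABELS)
    p = [1.0 / n] * n
    total_weight = sum(weight for _, weight in advisors)
    if total_weight == 0:
        return dict(zip(LABELS, p))  # no evidence: stay uniform
    step = 1.0 / (2.0 * n * total_weight)  # conservative step size
    for _ in range(steps):
        grad = [0.0] * n
        for blocks, weight in advisors:
            for block, q in blocks.items():
                idx = [i for i, lab in enumerate(LABELS) if lab in block]
                residual = sum(p[i] for i in idx) - q
                for i in idx:
                    grad[i] += 2.0 * weight * residual
        p = project_to_simplex([pi - step * g for pi, g in zip(p, grad)])
    return dict(zip(LABELS, p))


if __name__ == "__main__":
    # Advisor A separates exon from everything else; advisor B only
    # separates intergenic sequence from genic sequence.
    advisor_a = ({frozenset({"exon"}): 0.6,
                  frozenset({"intron", "intergenic"}): 0.4}, 1.0)
    advisor_b = ({frozenset({"exon", "intron"}): 0.9,
                  frozenset({"intergenic"}): 0.1}, 1.0)
    print(combine([advisor_a, advisor_b]))
```

In this sketch, an advisor with no information at a position can be represented by a single block containing every label with probability 1; it contributes nothing to the gradient, which is one simple way to picture how partial statements accommodate missing information.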