Bioinformatics Vol. 18 no. 1 2002
Pages 19-27
© 2002 Oxford University Press
A Bayesian framework for combining gene predictions*
1 Bioinformatics Program, Department of Bioengineering, Boston University, Boston, MA 02215, USA
2 Beckman Institute, University of Illinois, Urbana, IL 61801, USA
Received on April 2, 2001; revised on August 24, 2001; accepted on September 7, 2001
Motivation: Gene identification and gene discovery in new genomic sequences are among the most timely computational questions addressed by bioinformatics scientists. This computational research has produced several systems that have been used successfully in many whole-genome analysis projects. As the number of such systems grows, a rigorous way to combine their predictions becomes increasingly essential.
Results: In this paper we provide a Bayesian network framework for combining gene predictions from multiple systems. The framework allows us to treat the problem as one of combining the advice of multiple experts. Previous work in the area used relatively simple ideas such as majority voting. We introduce, for the first time, the use of hidden input/output Markov models for combining gene predictions. We apply the framework to the analysis of the Adh region in Drosophila, which has been carefully studied in the context of gene finding and was used as the basis for the GASP competition. The main challenge in combining gene prediction programs is that the systems rely on similar features, such as codon usage, and as a result their predictions are often correlated. We show that our approach is a promising way to improve prediction accuracy and that it provides a systematic and flexible framework for incorporating multiple sources of evidence into gene prediction systems.
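To make the contrast between majority voting and a probabilistic combination of experts concrete, the following is a minimal sketch. It is not the hidden input/output Markov model described above: it is a much simpler naive Bayes combiner that assumes the experts are conditionally independent given the true label, an assumption the abstract itself notes is violated in practice because gene finders rely on similar features. The function names, the per-expert sensitivity and specificity values, and the prior are all hypothetical and chosen only for illustration.

```python
import math


def majority_vote(votes):
    """Return 1 (coding) if more than half of the experts vote 1, else 0."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def naive_bayes_posterior(votes, sensitivity, specificity, prior=0.5):
    """Posterior P(coding | votes) under a conditional-independence assumption.

    votes[i]       -- 1 if expert i predicts coding at this position, else 0
    sensitivity[i] -- assumed P(expert i says 1 | position is coding)
    specificity[i] -- assumed P(expert i says 0 | position is non-coding)
    prior          -- assumed prior probability that the position is coding
    """
    log_coding = math.log(prior)
    log_noncoding = math.log(1.0 - prior)
    for vote, sn, sp in zip(votes, sensitivity, specificity):
        # Likelihood of this expert's vote under each hypothesis.
        log_coding += math.log(sn if vote == 1 else 1.0 - sn)
        log_noncoding += math.log(1.0 - sp if vote == 1 else sp)
    # Normalize in a numerically stable way.
    top = max(log_coding, log_noncoding)
    p_coding = math.exp(log_coding - top)
    p_noncoding = math.exp(log_noncoding - top)
    return p_coding / (p_coding + p_noncoding)


# Hypothetical example: one reliable expert says "coding", two weak ones disagree.
votes = [1, 0, 0]
sensitivity = [0.95, 0.60, 0.60]
specificity = [0.99, 0.60, 0.60]

vote_result = majority_vote(votes)
posterior = naive_bayes_posterior(votes, sensitivity, specificity)
```

With these illustrative numbers, majority voting labels the position non-coding, while the probabilistic combiner assigns a high posterior to coding because it weighs each expert by its assumed reliability. Handling the correlation between experts requires a richer model than this sketch provides.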
Availability: Software can be made available on request from the authors.
Contact: vladimir{at}bu.edu
* Part of this research was presented at Computational Genomics 2000, Baltimore, MD, November 2000. Portions of this research were conducted at Compaq Computer Corporation, Cambridge Research Laboratory, Cambridge, MA.