Bioinformatics Vol. 18 no. 10 2002
Pages 1309-1318
© 2002 Oxford University Press
Comparative ab initio prediction of gene structures using pair HMMs
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
Received on January 23, 2002; revised on April 19, 2002; accepted on April 25, 2002
We present a novel comparative method for the ab initio prediction of protein-coding genes in eukaryotic genomes. The method simultaneously predicts the gene structures of two unannotated, homologous input DNA sequences and retrieves the subsequences that are conserved between them. It can predict partial, complete and multiple genes, and can align pairs of genes that differ by exon-fusion or exon-splitting events.
The method employs a probabilistic pair hidden Markov model. We generate annotations using our model with two different algorithms: the Viterbi algorithm in its linear memory implementation and a new heuristic algorithm, called the stepping stone, for which both memory and time requirements scale linearly with the sequence length.
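To make the pair HMM idea concrete, the following is a minimal sketch of Viterbi decoding for a toy three-state pair HMM (match, gap in one sequence, gap in the other). It is not the DOUBLESCAN model, which has many more states for exons, introns and intergenic regions. The function name `pair_hmm_viterbi`, the state labels `M`, `X`, `Y`, and the parameters `delta`, `epsilon` and `p_same` are assumptions chosen for illustration. The sketch also stores the full dynamic-programming table, so its memory use grows with the product of the two sequence lengths, unlike the linear memory implementation mentioned above.

```python
import math

NEG_INF = float("-inf")
STATES = "MXY"
# Number of symbols each state reads from (x, y):
# M reads one from each, X reads only from x, Y reads only from y.
STEP = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}


def pair_hmm_viterbi(x, y, delta=0.2, epsilon=0.3, p_same=0.9):
    """Return (log probability, state path) of the most probable state path.

    All parameter values are illustrative assumptions, not trained values.
    """
    log = math.log
    # Allowed transitions (log probabilities); X <-> Y is disallowed.
    trans = {
        ("M", "M"): log(1 - 2 * delta),
        ("M", "X"): log(delta),
        ("M", "Y"): log(delta),
        ("X", "X"): log(epsilon),
        ("X", "M"): log(1 - epsilon),
        ("Y", "Y"): log(epsilon),
        ("Y", "M"): log(1 - epsilon),
    }
    emit_same = log(p_same / 4)         # M emits one of 4 identical pairs
    emit_diff = log((1 - p_same) / 12)  # M emits one of 12 mismatching pairs
    emit_gap = log(0.25)                # X or Y emits a single base

    n, m = len(x), len(y)
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STATES}
    v["M"][0][0] = 0.0  # by convention the path starts in M before any symbol

    for i in range(n + 1):
        for j in range(m + 1):
            for s in STATES:
                di, dj = STEP[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue  # state s cannot end at cell (i, j)
                if s == "M":
                    emit = emit_same if x[i - 1] == y[j - 1] else emit_diff
                else:
                    emit = emit_gap
                # Pick the best predecessor state for s at this cell.
                best, arg = NEG_INF, None
                for p in STATES:
                    t = trans.get((p, s))
                    if t is None:
                        continue
                    score = v[p][pi][pj] + t
                    if score > best:
                        best, arg = score, p
                if arg is not None:
                    v[s][i][j] = best + emit
                    back[s][i][j] = arg

    # Trace back from the best-scoring state at the final cell.
    state = max(STATES, key=lambda s: v[s][n][m])
    score = v[state][n][m]
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append(state)
        prev = back[state][i][j]
        di, dj = STEP[state]
        i, j = i - di, j - dj
        state = prev
    return score, "".join(reversed(path))
```

For example, `pair_hmm_viterbi("ACGT", "AGT")` returns the path `MXMM` under these toy parameters: the first bases are matched, the `C` is read from the first sequence alone, and the remaining two positions are matched. A gene-finding pair HMM would attach annotation labels such as exon or intron to its states, so that the same kind of state path yields both an alignment and a gene structure.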
We have implemented the model in a computer program called DOUBLESCAN. In this article, we introduce the method and confirm the validity of the approach on a test set of 80 pairs of orthologous DNA sequences from mouse and human.
More information can be found at: http://www.sanger.ac.uk/Software/analysis/doublescan/
Contact: im1{at}sanger.ac.uk, rd{at}sanger.ac.uk
* To whom correspondence should be addressed.


