Bioinformatics Vol. 16 no. 4 2000
Pages 367-371
© 2000 Oxford University Press
On the convergence of a clustering algorithm for protein-coding regions in microbial genomes
1 Department of Information and Computer Science, University of California, Irvine, CA 92697-3425, USA
Received on October 3, 1999; revised on November 1, 1999; accepted on November 1, 1999
1 Also at the Department of Biological Chemistry, College of Medicine, University of California, Irvine, USA. To whom all correspondence should be addressed.
Motivation: As the number of fully sequenced prokaryotic genomes continues to grow rapidly, computational methods for reliably detecting protein-coding regions become even more important. Audic and Claverie (1998) Proc. Natl Acad. Sci. USA, 95, 10026-10031, have proposed a clustering algorithm for protein-coding regions in microbial genomes. The algorithm is based on three Markov models of order k associated with subsequences extracted from a given genome. The parameters of the three Markov models are recursively updated by the algorithm, which, in simulations, always appears to converge to a unique stable partition of the genome. The partition corresponds to three kinds of regions: (1) coding on the direct strand, (2) coding on the complementary strand, (3) non-coding.
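The recursive update described above can be illustrated with a minimal sketch. It is not the published implementation: the window representation, the random initial partition, the pseudocount, the hard (best-scoring) reassignment rule, and the names `train_markov`, `log_score` and `cluster_windows` are all assumptions made for illustration. The sketch also treats the three models as unconstrained and assumes windows contain only A, C, G and T.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: windows contain only these four symbols


def train_markov(windows, k, pseudocount=1.0):
    """Estimate order-k transition log-probabilities from a list of windows."""
    counts = defaultdict(lambda: {b: pseudocount for b in ALPHABET})
    for w in windows:
        for i in range(k, len(w)):
            counts[w[i - k:i]][w[i]] += 1.0
    model = {}
    for context, nxt in counts.items():
        total = sum(nxt.values())
        model[context] = {b: math.log(c / total) for b, c in nxt.items()}
    return model


def log_score(window, model, k):
    """Log-probability of the transitions in a window under one model.

    Contexts never seen in training fall back to a uniform distribution.
    """
    uniform = math.log(1.0 / len(ALPHABET))
    return sum(
        model.get(window[i - k:i], {}).get(window[i], uniform)
        for i in range(k, len(window))
    )


def cluster_windows(windows, k=2, n_models=3, max_iter=50, seed=0):
    """Alternate between re-estimating the models and reassigning windows."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_models) for _ in windows]  # random start
    for _ in range(max_iter):
        # Re-estimate each model from the windows currently assigned to it.
        models = [
            train_markov([w for w, lab in zip(windows, labels) if lab == m], k)
            for m in range(n_models)
        ]
        # Reassign every window to its best-scoring model.
        new_labels = [
            max(range(n_models), key=lambda m: log_score(w, models[m], k))
            for w in windows
        ]
        if new_labels == labels:  # the partition is stable
            break
        labels = new_labels
    return labels
```

Calling `cluster_windows(windows, k=2)` on a list of equal-length genome windows returns one label in {0, 1, 2} per window. Which label ends up corresponding to direct-strand coding, complementary-strand coding or non-coding sequence is not determined by the sketch and would have to be decided afterwards.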
Results: Here we provide an explanation for the convergence of the algorithm by observing that it is essentially a form of the expectation maximization (EM) algorithm applied to the corresponding mixture model. We also provide a partial justification for the uniqueness of the partition based on identifiability. Other possible variations and improvements are briefly discussed.
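For readers unfamiliar with EM on a mixture of Markov models, the standard updates can be written out as follows. The notation is illustrative and not taken from the paper: w_1, ..., w_N are the extracted subsequences, M_1, M_2, M_3 the three order-k Markov models, \lambda_j the mixing proportions, and n_i(c, x) the number of times the length-k context c is followed by nucleotide x in w_i. The probability of the first k symbols of each subsequence is omitted for brevity.

```latex
% Mixture model over subsequences
P(w_i) = \sum_{j=1}^{3} \lambda_j \, P(w_i \mid M_j),
\qquad
P(w_i \mid M_j) \propto \prod_{c,\,x} p_j(x \mid c)^{\,n_i(c,x)}

% E-step: posterior responsibility of model j for subsequence i
\gamma_{ij} = \frac{\lambda_j \, P(w_i \mid M_j)}
                   {\sum_{l=1}^{3} \lambda_l \, P(w_i \mid M_l)}

% M-step: re-estimate mixing proportions and transition probabilities
\lambda_j = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ij},
\qquad
p_j(x \mid c) = \frac{\sum_{i=1}^{N} \gamma_{ij} \, n_i(c,x)}
                     {\sum_{i=1}^{N} \gamma_{ij} \sum_{x'} n_i(c,x')}
```

Each EM iteration cannot decrease the likelihood of the data, which is the usual argument for convergence of such a procedure. A hard-assignment variant replaces each \gamma_{ij} by 1 for the most probable model and 0 for the others.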
Contact: pfbaldi{at}ics.uci.edu
This article has been cited by other articles:
P. Singhal, B. Jayaram, S. B. Dixit, and D. L. Beveridge. Prokaryotic Gene Finding Based on Physicochemical Characteristics of Codons Calculated from Molecular Dynamics Simulations. Biophys. J., June 1, 2008; 94(11): 4173-4183.
H. Noguchi, J. Park, and T. Takagi. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res., November 14, 2006; 34(19): 5623-5630.
A. Lomsadze, V. Ter-Hovhannisyan, Y. O. Chernoff, and M. Borodovsky. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res., November 28, 2005; 33(20): 6494-6506.
F. Valafar. Pattern Recognition Techniques in Microarray Data Analysis: A Survey. Ann. N.Y. Acad. Sci., December 1, 2002; 980(1): 41-64.
J. Besemer, A. Lomsadze, and M. Borodovsky. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res., June 15, 2001; 29(12): 2607-2618.


