Bioinformatics Vol. 15 no. 12 1999
Pages 980-986
© 1999 Oxford University Press
Segmentation of yeast DNA using hidden Markov models
1 Computer Science, Brown University,
Providence, RI 02912, USA
2 State Scientific Center for Biotechnology,
NIIGenetika, Moscow 113545, Russia
Motivation: Compositionally homogeneous segments of genomic DNA often correspond to meaningful biological units. Simple sliding window analysis is usually insufficient for compositional segmentation of natural sequences. Hidden Markov models (HMMs) with a small number of states are a natural language for describing the compositional properties of chromosome-size DNA sequences.
Results: The algorithms were applied to yeast Saccharomyces cerevisiae chromosomes (YC) I, III, IV, VI and IX. The optimal number of HMM states is found to be four. The optimal four-state HMMs are very similar across all chromosomes, as are the reconstructed segmentations. In most cases the model with k + 1 states is obtained by splitting one of the states of the model with k states, with a corresponding increase in the level of detail of the segmentation. The high-AT states usually correspond to intergenic regions. We also explore the model likelihood landscape and analyze the dynamics of the optimization process, thereby addressing the reliability of the obtained optima and the efficiency of the algorithms.
Availability: The system is available on request from the first author.
Contact: ldp{at}cs.brown.edu
Received on September 9, 1998; revised on June 9, 1999; accepted on June 23, 1999
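As an illustration of the kind of segmentation the abstract describes, the following is a minimal sketch of Viterbi decoding with a small HMM over the DNA alphabet. It is not the authors' program: the two states, the hand-picked values in START, TRANS and EMIT, and the function names are all assumptions made for this example, whereas the paper fits its models to the data and finds four states to be optimal.

```python
import math
from itertools import groupby

# Hypothetical, hand-picked parameters (not fitted, not taken from the paper).
# State 0 is AT-rich, state 1 is GC-rich.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1],
         [0.1, 0.9]]
EMIT = [{"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}]


def viterbi_segment(seq, start, trans, emit):
    """Return the most probable hidden-state path for seq."""
    n_states = len(start)
    # Work in log space to avoid underflow on long sequences.
    score = [math.log(start[s]) + math.log(emit[s][seq[0]])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for ch in seq[1:]:
        ptr, new = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: score[r] + math.log(trans[r][s]))
            ptr.append(best)
            new.append(score[best] + math.log(trans[best][s])
                       + math.log(emit[s][ch]))
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (state, start, end) runs, end exclusive."""
    out, pos = [], 0
    for state, run in groupby(path):
        length = len(list(run))
        out.append((state, pos, pos + length))
        pos += length
    return out


if __name__ == "__main__":
    toy = "AT" * 10 + "GC" * 10  # toy sequence, not real yeast data
    print(segments(viterbi_segment(toy, START, TRANS, EMIT)))
```

On the toy sequence the decoded path splits into one AT-rich run followed by one GC-rich run. Unlike a sliding window, the segment boundary is placed by the model's transition probabilities rather than by a fixed window length.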

