Scoring hidden Markov models
Department of Computer Engineering, University of California Santa Cruz, CA 95064, USA
To whom correspondence should be addressed
MOTIVATION: Statistical sequence comparison techniques, such as hidden Markov models and generalized profiles, calculate the probability that a sequence was generated by a given model. Log-odds scoring evaluates this probability by comparing it with the probability of the same sequence under a null hypothesis, usually a simpler statistical model intended to represent the universe of sequences as a whole rather than the group of interest. Such scoring raises two immediate questions: what should the null model be, and what log-odds score threshold should be deemed a match to the model?
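As a reading aid, the log-odds score described above can be written as follows. The notation is assumed here and is not taken from the paper: s is the sequence, M the model of interest, N the null model, and \tau the match threshold.

```latex
S(s) = \log \frac{P(s \mid M)}{P(s \mid N)},
\qquad
s \text{ is deemed a match to } M \text{ if } S(s) \ge \tau .
```

The two questions posed in the motivation correspond to the choice of N and the choice of \tau.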
RESULTS: This paper analyses these two issues experimentally. Within the context of the Sequence Alignment and Modeling software suite (SAM), we consider a variety of null models and suitable thresholds. Additionally, we consider HMMer's log-odds scoring and SAM's original Z-scoring method. Among the null model choices, a simple looping null model that emits characters according to the geometric mean of the character probabilities in the columns modeled by the hidden Markov model (HMM) is either the best or a well-performing choice in each of the four discrimination experiments.
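The geometric-mean null model can be illustrated with a minimal Python sketch. This is not SAM's implementation: the function names, the toy two-column model, the renormalization of the geometric means to sum to one, and the ungapped scoring (one sequence character per column, no insert or delete states, no looping transition cost) are all simplifying assumptions made for illustration.

```python
import math

def geometric_mean_null(columns):
    """Build a null distribution from per-column emission probabilities.

    columns: list of dicts mapping character -> probability (all > 0).
    Returns the per-character geometric mean across columns, renormalized
    to sum to 1 (the renormalization is an assumption of this sketch).
    """
    n = len(columns)
    raw = {
        ch: math.exp(sum(math.log(col[ch]) for col in columns) / n)
        for ch in columns[0]
    }
    total = sum(raw.values())
    return {ch: value / total for ch, value in raw.items()}

def log_odds(sequence, columns, null):
    """Toy ungapped log-odds score (natural log) of sequence vs. the null."""
    if len(sequence) != len(columns):
        raise ValueError("this sketch only scores ungapped, full-length sequences")
    return sum(math.log(col[ch] / null[ch]) for ch, col in zip(sequence, columns))

# Hypothetical two-column model over a DNA alphabet.
columns = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
null = geometric_mean_null(columns)
score_match = log_odds("AC", columns, null)     # sequence the model favors
score_nonmatch = log_odds("GT", columns, null)  # sequence the model disfavors
```

In this toy setting the favored sequence scores above zero and the disfavored one below zero, so a threshold between the two would separate them. Choosing that threshold in practice is the second question the paper examines.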
AVAILABILITY: Information on obtaining the SAM program suite (free for academic use), as well as a server interface, is available from http://www.cse.ucsc.edu/research/compbio/sam.html. HMMer is freely available from http://genome.wustl.edu/eddy/hmm.html.
CONTACT: E-mail: rph{at}cse.ucsc.edu
Received on September 17, 1996; accepted on November 7, 1996