Hidden Markov models for sequence analysis: extension and analysis of the basic method
Computer Engineering, University of California Santa Cruz, CA 95064, USA
¹NORDITA, Blegdamsvej 17, 2100 Copenhagen, Denmark
²Present address: The Sanger Centre, Hinxton, Cambridge CB10 1RQ, UK
Hidden Markov models (HMMs) are a highly effective means of modeling a family of unaligned sequences or a common motif within a set of unaligned sequences. The trained HMM can then be used for discrimination or multiple alignment. The basic mathematical description of an HMM and its expectation-maximization training procedure is relatively straightforward. In this paper, we review the mathematical extensions and heuristics that move the method from the theoretical to the practical. We then experimentally analyze the effectiveness of model regularization, dynamic model modification and optimization strategies. Finally, we demonstrate on the SH2 domain how a domain can be found from unaligned sequences using a special model type. The experimental work was completed with the aid of the Sequence Alignment and Modeling (SAM) software suite.
Received on July 24, 1995; accepted on October 23, 1995