Hidden Markov models for detecting remote protein homologies
K Karplus, C Barrett and R Hughey
Department of Computer Engineering, Jack Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA.
Bioinformatics, Vol 14, 846-856, Copyright © 1998 by Oxford University Press

MOTIVATION: A new hidden Markov model method (SAM-T98) for finding remote
homologs of protein sequences is described and evaluated. The method begins
with a single target sequence and iteratively builds a hidden Markov model
(HMM) from the sequence and homologs found using the HMM for database
search. SAM-T98 is also used to construct model libraries automatically
from sequences in structural databases.

METHODS: We evaluate the SAM-T98 method with four datasets. Three of the
test sets are fold-recognition tests, where the correct answers are
determined by structural similarity. The fourth uses a curated database.
The method is compared against WU-BLASTP and against DOUBLE-BLAST, a
two-step method similar to ISS, but using BLAST instead of FASTA.

RESULTS: SAM-T98 had the fewest errors in all tests, dramatically so for
the fold-recognition tests. At the minimum-error point on the SCOP
(Structural Classification of Proteins)-domains test, SAM-T98 got 880 true
positives and 68 false positives, DOUBLE-BLAST got 533 true positives with
71 false positives, and WU-BLASTP got 353 true positives with 24 false
positives. The method is optimized to recognize superfamilies, and would
require parameter adjustment to be used to find family or fold
relationships. One key to the performance of the HMM method is a new
score-normalization technique that compares the score to the score with a
reversed model rather than to a uniform null model.

AVAILABILITY: A World Wide Web server, as well as information on obtaining
the Sequence Alignment and Modeling (SAM) software suite, can be found at
http://www.cse.ucsc.edu/research/compbio/

CONTACT: karplus@cse.ucsc.edu; http://www.cse.ucsc.edu/~karplus
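
The score normalization named in the abstract compares a sequence's score under the model with its score under the reversed model, instead of a uniform null model. The following is a minimal sketch of that comparison, not the SAM implementation: it assumes a deliberately simplified, gap-free position-specific model (a list of per-position residue probabilities) in place of a full profile HMM, and the names `peaked`, `best_log_prob`, `reverse_null_score` and the three-position toy model are all hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def peaked(residue):
    """Toy model column: probability 0.81 for one residue, 0.01 for each of the other 19."""
    return {aa: (0.81 if aa == residue else 0.01) for aa in ALPHABET}


def window_log_prob(model, seq, start):
    """Log-probability of the model-length window of seq that begins at index start."""
    window = seq[start:start + len(model)]
    # Residues missing from a column (e.g. 'X') get a small floor probability.
    return sum(math.log(col.get(res, 1e-4)) for col, res in zip(model, window))


def best_log_prob(model, seq):
    """Best ungapped placement of the model on seq (a simplified stand-in for an HMM score)."""
    if len(seq) < len(model):
        raise ValueError("sequence is shorter than the model")
    last_start = len(seq) - len(model)
    return max(window_log_prob(model, seq, s) for s in range(last_start + 1))


def reverse_null_score(model, seq):
    """Log-odds (in nats) of seq under the model versus the position-reversed model."""
    return best_log_prob(model, seq) - best_log_prob(model[::-1], seq)


# Assumed toy model that prefers the motif A-C-D.
model = [peaked("A"), peaked("C"), peaked("D")]

motif_score = reverse_null_score(model, "GGACDGG")      # contains A-C-D in order
backwards_score = reverse_null_score(model, "GGDCAGG")  # contains the motif reversed
biased_score = reverse_null_score(model, "AAAAAAA")     # composition match only
```

In this toy setting the sequence containing the motif scores positive, the one containing the reversed motif scores negative, and the all-A sequence scores about zero, because the reversed model has exactly the same residue composition as the forward model and so cancels a match that is due to composition alone.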
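
The results in the abstract are reported at the minimum-error point. The abstract does not define that point, so the sketch below assumes it is the score cutoff that minimizes false positives plus false negatives over a list of labeled hits. The function name, the convention that higher scores are more confident, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def minimum_error_point(hits):
    """Find the cutoff that minimizes false positives + false negatives.

    hits: iterable of (score, is_true_homolog); a higher score means more confident.
    Returns (threshold, true_pos, false_pos, false_neg), where every hit with
    score >= threshold is accepted. A threshold of infinity means "accept nothing".
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    total_true = sum(1 for _, is_true in ranked if is_true)

    # Baseline: accept no hits, so every true homolog is a false negative.
    best = (float("inf"), 0, 0, total_true)
    true_pos = false_pos = 0

    for i, (score, is_true) in enumerate(ranked):
        if is_true:
            true_pos += 1
        else:
            false_pos += 1
        # Hits with equal scores cannot be separated by a cutoff,
        # so only evaluate after the last hit sharing this score.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        false_neg = total_true - true_pos
        if false_pos + false_neg < best[2] + best[3]:
            best = (score, true_pos, false_pos, false_neg)
    return best


# Made-up scores and labels, for illustration only.
hits = [
    (9.1, True), (8.4, True), (7.9, False), (7.2, True),
    (6.5, True), (5.0, False), (4.8, False), (4.1, True),
]
result = minimum_error_point(hits)
```

Under this assumed definition, a method's true-positive and false-positive counts are read off together at a single cutoff, which is how the three methods' counts in the RESULTS paragraph are paired.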