Using substitution probabilities to improve position-specific scoring matrices
Howard Hughes Medical Institute, Basic Sciences Division and Fred Hutchinson Cancer Research Center, Seattle, WA 98104, USA
To whom correspondence should be addressed. E-mail: henikoff{at}howard.fhcrc.org
Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences by adding artificial pseudo-counts to the observed counts. We introduce a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities. In extensive empirical tests, this position-based method outperformed other pseudo-count methods and was a substantial improvement over the traditional average score method used for constructing profiles.
Received on September 11, 1995; revised on January 4, 1996; accepted on January 4, 1996
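The idea summarized in the abstract can be illustrated with a small sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: the total number of pseudo-counts in a column is taken to be proportional to the number of distinct residues observed there (the hypothetical `scale` parameter), and that total is distributed according to substitution probabilities averaged over the observed residues. The four-letter alphabet and the values in `SUBST` are invented; a real application would use a 20 x 20 matrix derived from empirical amino acid substitution data.

```python
from collections import Counter

# Toy alphabet and conditional substitution probabilities P(a | i).
# These values are invented for illustration; each row sums to 1.
ALPHABET = "ILDE"
SUBST = {
    "I": {"I": 0.60, "L": 0.30, "D": 0.05, "E": 0.05},
    "L": {"I": 0.30, "L": 0.60, "D": 0.05, "E": 0.05},
    "D": {"I": 0.05, "L": 0.05, "D": 0.60, "E": 0.30},
    "E": {"I": 0.05, "L": 0.05, "D": 0.30, "E": 0.60},
}


def column_probabilities(column, scale=5.0):
    """Estimate residue probabilities for one alignment column.

    `column` is a string of residues drawn from ALPHABET.
    `scale` is a hypothetical tuning constant (an assumption, not a
    value taken from the abstract).
    """
    counts = Counter(column)
    n_total = sum(counts.values())

    # More diverse columns receive more pseudo-counts in total.
    diversity = len(counts)
    pseudo_total = scale * diversity

    probs = {}
    for a in ALPHABET:
        # Expected frequency of residue a, given the residues observed
        # in this column and the substitution probabilities.
        expected = sum(counts[i] / n_total * SUBST[i][a] for i in counts)
        pseudo = pseudo_total * expected
        probs[a] = (counts[a] + pseudo) / (n_total + pseudo_total)
    return probs


conserved = column_probabilities("IIII")
mixed = column_probabilities("IILL")
```

In this sketch, the fully conserved column `"IIII"` keeps most of its probability mass on `I`, while unobserved residues receive mass in proportion to how readily they substitute for `I` (so `L` is favored over `D` or `E`). The mixed column `"IILL"` receives twice as many pseudo-counts in total because two distinct residues are observed.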