Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology
Baskin Center for Computer Engineering and Information Sciences, Applied Sciences Building, University of California at Santa Cruz, Santa Cruz, CA 95064, USA
1The Sanger Centre, Hinxton Hall, Hinxton, Cambs CB10 1RQ, UK
2Life Sciences Division (Mail Stop 29100), Lawrence Berkeley Laboratory, University of California, Berkeley, CA 94720, USA
1To whom correspondence should be addressed. E-mail: kimmen{at}cse.ucsc.edu
We present a method for condensing the information in multiple alignments of proteins into a mixture of Dirichlet densities over amino acid distributions. Dirichlet mixture densities are designed to be combined with observed amino acid frequencies to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model or other statistical model. These estimates give a statistical model greater generalization capacity, so that remotely related family members can be more reliably recognized by the model. This paper corrects the previously published formula for estimating these expected probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures to match particular databases, and suggestions for efficient implementation.
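As an illustration of how a Dirichlet mixture can be combined with observed counts, the following is a minimal sketch of the standard posterior-mean estimate for one alignment column. It is not taken from the paper's text: the function names, the three-letter toy alphabet and the component parameters are assumptions made for illustration. Each component j contributes its pseudocount-smoothed estimate (n_i + alpha_ji) / (|n| + |alpha_j|), weighted by the posterior probability of that component given the counts, which is proportional to q_j * B(n + alpha_j) / B(alpha_j), where B is the multivariate Beta function.

```python
from math import lgamma, log, exp


def log_beta(v):
    """Log of the multivariate Beta function B(v) = prod Gamma(v_i) / Gamma(sum v_i)."""
    return sum(lgamma(x) for x in v) - lgamma(sum(v))


def component_posteriors(counts, mix_coeffs, alphas):
    """Posterior probability of each mixture component given the observed counts."""
    log_w = [
        log(q) + log_beta([n + a for n, a in zip(counts, alpha)]) - log_beta(alpha)
        for q, alpha in zip(mix_coeffs, alphas)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_w)
    w = [exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def expected_probabilities(counts, mix_coeffs, alphas):
    """Posterior-mean amino acid probabilities for one column of counts."""
    post = component_posteriors(counts, mix_coeffs, alphas)
    n_total = sum(counts)
    return [
        sum(
            p_j * (counts[i] + alpha[i]) / (n_total + sum(alpha))
            for p_j, alpha in zip(post, alphas)
        )
        for i in range(len(counts))
    ]


# Hypothetical two-component mixture over a toy three-letter alphabet.
toy_q = [0.5, 0.5]
toy_alphas = [[10.0, 0.1, 0.1], [0.1, 10.0, 0.1]]
toy_probs = expected_probabilities([5, 0, 0], toy_q, toy_alphas)
```

With few observations the estimate is dominated by the components that best explain the counts, so unseen residues still receive non-zero probability; as the number of observations grows, the estimate approaches the observed frequencies. A real application would use the 20-letter amino acid alphabet and mixture parameters fitted to a database of alignments.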








