Bioinformatics Vol. 17 no. 90001 2001
Pages S65-S73
© 2001 Oxford University Press
Using mixtures of common ancestors for estimating the probabilities of discrete events in biological sequences
1 Department of Computer Science, Columbia
University, New York, NY, 10027
2 School of CSE, Hebrew University,
Jerusalem, Israel
Received on February 5, 2001; revised on April 4, 2001; accepted on April 4, 2001
Accurately estimating probabilities from observations is important for probabilistic approaches to problems in computational biology. In this paper we present a biologically motivated method for estimating probability distributions over discrete alphabets from observations using a mixture model of common ancestors. The method extends substitution matrix-based probability estimation methods. In contrast to previous such methods, it has a simple Bayesian interpretation, and it has the advantage over Dirichlet mixtures of being both effective and simple to compute for large alphabets. The method is applied to estimate amino acid probabilities from observed counts in an alignment and is shown to perform comparably to previous methods. It is also applied to estimate probability distributions over protein families, where it improves protein classification accuracy.
Contact: eeskin{at}cs.columbia.edu
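
The abstract does not state the model's equations, so the following is only an illustrative sketch of one way a mixture over common ancestors could be computed, not the authors' implementation. It assumes that each candidate ancestor symbol a has a prior P(a) and a substitution distribution P(x | a) taken from a row-normalized substitution matrix with strictly positive entries, that observed counts are treated as independent draws from a single unknown ancestor, and that the estimate is the posterior-weighted mixture of the ancestors' substitution distributions. The function name `ancestor_mixture_estimate` and the toy 3-symbol matrix are hypothetical.

```python
import numpy as np

def ancestor_mixture_estimate(counts, subst, prior):
    """Posterior-weighted mixture of ancestor substitution distributions.

    counts: shape (K,), observed count of each symbol.
    subst:  shape (A, K), subst[a, x] = P(x | ancestor a); rows sum to 1
            and all entries are assumed strictly positive.
    prior:  shape (A,), prior probability of each ancestor.
    Returns an array of shape (K,) that sums to 1.
    """
    counts = np.asarray(counts, dtype=float)
    subst = np.asarray(subst, dtype=float)
    prior = np.asarray(prior, dtype=float)

    # log P(a | counts) up to a constant: log P(a) + sum_x n_x * log P(x | a)
    log_post = np.log(prior) + counts @ np.log(subst).T

    # Normalize in log space for numerical stability.
    log_post -= log_post.max()
    post = np.exp(log_post)
    post /= post.sum()

    # Estimate: sum_a P(a | counts) * P(x | a)
    return post @ subst

# Toy 3-symbol alphabet (assumed values, for illustration only).
toy_subst = np.array([[0.8, 0.1, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.1, 0.1, 0.8]])
toy_prior = np.array([1 / 3, 1 / 3, 1 / 3])
toy_estimate = ancestor_mixture_estimate([3, 0, 0], toy_subst, toy_prior)
```

Under these assumptions, a column with no observations falls back to the prior-weighted average of the substitution rows, and as counts accumulate the estimate concentrates on the substitution distribution of the most likely ancestor. The cost is linear in the number of ancestors and the alphabet size, which is consistent with the abstract's remark about large alphabets.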