Enhanced position weight matrices using mixture models
1 Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
2 Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
*To whom correspondence should be addressed.
Motivation: A positional weight matrix (PWM) is derived from a set of experimentally determined binding sites. Here we explore whether there exist subclasses of binding sites and whether a mixture of these subclass-PWMs can improve binding site prediction. Intuitively, the subclasses correspond either to distinct binding preferences of the same transcription factor in different contexts or to distinct subtypes of the transcription factor.
Result: We report an Expectation Maximization algorithm adapting the mixture model of Bailey and Elkan. We assessed the relative merit of using two subclass-PWMs. The resulting PWMs were evaluated with respect to preferential conservation (relative to mouse) of potential sites in human promoters and expression coherence of the potential target genes. Based on 64 JASPAR vertebrate PWMs, 61-81% of the cases resulted in higher conservation using the mixture model. Also, in 98% of the cases the expression coherence was higher for the target genes of one of the subclass-PWMs. Our analysis of Reb1 sites is consistent with previously discovered subtypes using independent methods. Additionally, application of our method to mutated sites for transcription factor LEU3 reveals subclasses that segregate into strongly binding and weakly binding sites with a P-value of 0.008. This is the first study that attempts to quantify the subtly different binding specificities of a transcription factor on a large scale, and it suggests the use of a mixture of PWMs for a transcription factor instead of the current practice of using a single PWM.
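To make the idea of subclass-PWMs concrete, the following is a minimal sketch of Expectation Maximization for a mixture of two PWMs. It is not the authors' implementation, and it simplifies the setting considerably: it assumes the binding sites are already aligned and of equal width, so it omits the background and site-location components of a motif-finding mixture model. The function name `fit_pwm_mixture` and the parameters `pseudocount`, `n_iter` and `seed` are hypothetical choices made for illustration.

```python
import random

ALPHABET = "ACGT"

def fit_pwm_mixture(sites, k=2, n_iter=50, pseudocount=0.5, seed=0):
    """Fit k subclass probability matrices to aligned, equal-width sites by EM.

    Returns (weights, pwms, resp): pwms[c][j][base] is a base probability and
    resp[i][c] is the posterior probability that site i belongs to subclass c.
    """
    rng = random.Random(seed)
    width = len(sites[0])
    # Random soft assignment to break the symmetry between subclasses.
    resp = []
    for _ in sites:
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: mixing weights and posterior-weighted base frequencies.
        weights = [sum(r[c] for r in resp) / len(sites) for c in range(k)]
        pwms = []
        for c in range(k):
            pwm = []
            for j in range(width):
                counts = {b: pseudocount for b in ALPHABET}
                for site, r in zip(sites, resp):
                    counts[site[j]] += r[c]
                total = sum(counts.values())
                pwm.append({b: counts[b] / total for b in ALPHABET})
            pwms.append(pwm)
        # E-step: posterior of each subclass given the site sequence.
        resp = []
        for site in sites:
            like = []
            for c in range(k):
                p = weights[c]
                for j, base in enumerate(site):
                    p *= pwms[c][j][base]
                like.append(p)
            total = sum(like)
            resp.append([x / total for x in like])
    return weights, pwms, resp

# Toy input (made up for illustration, not data from the study).
toy_sites = ["ACGTAC"] * 8 + ["ACGTAA"] * 2 + ["TGCATG"] * 8 + ["TGCATT"] * 2
toy_weights, toy_pwms, toy_resp = fit_pwm_mixture(toy_sites)
print([round(w, 3) for w in toy_weights])
```

Each site can then be assigned to the subclass with the larger posterior in `resp`, and the two resulting matrices can be compared with a single PWM built from all sites. Because EM depends on its starting point, a fuller treatment would likely run several random restarts and keep the fit with the highest likelihood.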
Availability:
Contact: sridharh{at}pcbi.upenn.edu
Received on January 15, 2005; accepted on March 27, 2005
