Bioinformatics Advance Access originally published online on November 2, 2005
Bioinformatics 2006 22(2):157-163; doi:10.1093/bioinformatics/bti731
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sequence features of DNA binding sites reveal structural class of associated transcription factor
Duke University, Department of Computer Science Box 90129, Durham, NC 27708, USA
*To whom correspondence should be addressed.
Motivation: A key goal in molecular biology is to understand the mechanisms by which a cell regulates the transcription of its genes. One important aspect of this transcriptional regulation is the binding of transcription factors (TFs) to their specific cis-regulatory counterparts on the DNA. TFs recognize and bind their DNA counterparts according to the structure of their DNA-binding domains (e.g. zinc finger, leucine zipper, homeodomain). The structure of these domains can be used as a basis for grouping TFs into classes. Although the structure of DNA-binding domains varies widely across TFs generally, the TFs within a particular class bind to DNA in a similar fashion, suggesting the existence of class-specific features in the DNA sequences bound by each class of TFs.
Results: In this paper, we apply a sparse Bayesian learning algorithm to identify a small set of class-specific features in the DNA sequences bound by different classes of TFs; the algorithm simultaneously learns a true multi-class classifier that uses these features to predict the DNA-binding domain of the TF that recognizes a particular set of DNA sequences. We train our algorithm on the six largest classes in TRANSFAC, comprising a total of 587 TFs. We learn a six-class classifier for this training set that achieves 87% leave-one-out cross-validation accuracy. We also identify features within cis-regulatory sequences that are highly specific to each class of TF, which has significant implications for how TF binding sites should be modeled for the purpose of motif discovery.
Contact: lee{at}cs.duke.edu; amink{at}cs.duke.edu
Received on July 21, 2005; revised on September 30, 2005; accepted on October 18, 2005
This article has been cited by other articles:
![]() |
U. J. Pape, S. Rahmann, and M. Vingron Natural similarity measures between position frequency matrices with an application to clustering Bioinformatics, February 1, 2008; 24(3): 350 - 357. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Nikolajewa, R. Pudimat, M. Hiller, M. Platzer, and R. Backofen BioBayesNet: a web server for feature extraction and Bayesian network modeling of biological sequence data Nucleic Acids Res., July 13, 2007; 35(suppl_2): W688 - W693. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. V. Morozov and E. D. Siggia Connecting protein structure with predictions of regulatory sites PNAS, April 24, 2007; 104(17): 7068 - 7073. [Abstract] [Full Text] [PDF] |
||||


