Bioinformatics Advance Access published online on November 2, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti731
1 Duke University, Department of Computer Science, Box 90129, Durham, NC 27708
* To whom correspondence should be addressed.
Motivation: A key goal in molecular biology is to understand the mechanisms by which the cell regulates the transcription of its genes. One important aspect of this transcriptional regulation is the binding of transcription factors (TFs) to their specific cis-regulatory counterparts on the DNA. TFs recognize and bind their DNA counterparts according to the structure of their DNA-binding domains (e.g., zinc finger, leucine zipper, homeodomain). The structure of these domains can be used as a basis for grouping TFs into classes. Although the structure of DNA-binding domains varies widely across TFs in general, the TFs within a particular class bind to DNA in a similar fashion, suggesting the existence of class-specific features in the DNA sequences bound by each class of TFs.

Results: In this paper, we apply a sparse Bayesian learning algorithm to identify a small set of class-specific features in the DNA sequences bound by different classes of TFs; the algorithm simultaneously learns a true multi-class classifier that uses these features to predict the DNA-binding domain of the TF that recognizes a particular set of DNA sequences. We train our algorithm on the six largest classes in TRANSFAC, comprising a total of 587 TFs. The six-class classifier learned from this training set achieves 87% leave-one-out cross-validation (LOOCV) accuracy. We also identify features within cis-regulatory sequences that are highly specific to each class of TF, a finding with significant implications for how TF binding sites should be modeled for the purpose of motif discovery.
Received July 21, 2005
Revised September 30, 2005
Accepted October 18, 2005
Article
Sequence features of DNA binding sites reveal structural class of associated transcription factor
Leelavati Narlikar, E-mail: lee{at}cs.duke.edu