Bioinformatics Advance Access published online on December 16, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn645
Predicting the Binding Preference of Transcription Factors to Individual DNA k-mers
1Department of Molecular Genetics, 2Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M5S 3E1, 3Division of Genetics, Department of Medicine, 4Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, 5Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, and 6Harvard/MIT Division of Health Sciences and Technology (HST), Harvard Medical School, Boston, MA 02115
*To whom correspondence should be addressed. Prof. Timothy R. Hughes, E-mail: t.hughes{at}utoronto.ca
Abstract
Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members.
Results: We employed a new data set consisting of the relative preferences of mouse homeodomains for all 8-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when given only their protein sequences. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest-neighbour among functionally-important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
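As an illustration only, nearest-neighbour prediction restricted to a chosen set of residues might be sketched as below. The function names, the toy aligned sequences, the residue positions, and the 8-mer scores are all hypothetical and are not taken from the study. The similarity measure (fraction of identical residues at the selected positions) is likewise an assumption, not necessarily the one the authors used.

```python
from typing import Dict, Sequence, Tuple


def residue_identity(a: str, b: str, positions: Sequence[int]) -> float:
    """Fraction of the selected alignment positions at which a and b agree."""
    return sum(a[i] == b[i] for i in positions) / len(positions)


def nearest_neighbour_profile(
    query: str,
    train_seqs: Dict[str, str],
    train_profiles: Dict[str, Dict[str, float]],
    positions: Sequence[int],
) -> Tuple[str, Dict[str, float]]:
    """Return the name and 8-mer profile of the most similar characterized TF.

    Similarity is computed only over `positions`, standing in for the
    functionally important residues. Ties go to the first TF encountered.
    """
    best = max(
        train_seqs,
        key=lambda name: residue_identity(query, train_seqs[name], positions),
    )
    return best, train_profiles[best]


# Hypothetical toy data: pre-aligned sequences of equal length.
train_seqs = {"tfA": "RKRTAYTR", "tfB": "QKRSSFTK"}
# Hypothetical relative preferences for two 8-mers per TF.
train_profiles = {
    "tfA": {"TAATTAAA": 0.9, "GGGGCCCC": 0.1},
    "tfB": {"TAATTAAA": 0.2, "GGGGCCCC": 0.7},
}
positions = [0, 3, 5]  # hypothetical "functionally important" columns

name, predicted = nearest_neighbour_profile(
    "RARTGYSR", train_seqs, train_profiles, positions
)
print(name, predicted)
```

The predicted profile for an uncharacterized protein is simply copied from its closest characterized relative, so the result depends on which residue positions are used in the comparison.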
Associate Editor: Prof. David Rocke
Received on August 10, 2008; revised on November 16, 2008; accepted on December 11, 2008