Bioinformatics Advance Access originally published online on November 17, 2007
Bioinformatics 2008 24(1):26-33; doi:10.1093/bioinformatics/btm539
Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control
1Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 2Biometris, Wageningen UR, Bornsesteeg 47 and 3Bioscience, PRI, Wageningen UR, Droevendaalsesteeg 1, Wageningen, The Netherlands
*To whom correspondence should be addressed.
Abstract
Motivation: Transcription factor interactions are the cornerstone of combinatorial control, which is a crucial aspect of the gene regulatory system. Understanding and predicting transcription factor interactions based on their sequence alone is difficult since they are often part of families of factors sharing high sequence identity. Given the scarcity of experimental data on interactions compared to available sequence data, however, it would be most useful to have accurate methods for the prediction of such interactions.
Results: We present a method consisting of a Random Forest-based feature-selection procedure that selects relevant motifs out of a set found using a correlated motif search algorithm. Prediction accuracy for several transcription factor families (bZIP, MADS, homeobox and forkhead) reaches 60–90%. In addition, we identified those parts of the sequence that are important for the interaction specificity, and show that these are in agreement with available data. We also used the predictors to perform genome-wide scans for interaction partners and recovered both known and putative new interaction partners.
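To make the idea of motif-based features concrete, the following is a minimal sketch of how a pair of proteins might be encoded as a binary vector over correlated motif pairs before feature selection. The motif patterns, the toy sequences, the function names (`has_motif`, `encode_pair`) and the either-orientation matching rule are all illustrative assumptions, not the encoding or motifs used in the paper.

```python
import re

# Hypothetical correlated motif pairs written as regular expressions.
# These patterns are invented for illustration and do not come from the paper.
MOTIF_PAIRS = [("L..L", "K.E"), ("R.R", "D..D")]


def has_motif(motif, seq):
    """Return True if the regex motif occurs anywhere in the sequence."""
    return re.search(motif, seq) is not None


def encode_pair(seq_a, seq_b, motif_pairs=MOTIF_PAIRS):
    """Encode a protein pair as a binary vector, one entry per motif pair.

    An entry is 1 when one motif of the pair occurs in one protein and the
    other motif occurs in the other protein, in either orientation
    (an assumed convention, so that the encoding is symmetric).
    """
    features = []
    for m1, m2 in motif_pairs:
        forward = has_motif(m1, seq_a) and has_motif(m2, seq_b)
        reverse = has_motif(m1, seq_b) and has_motif(m2, seq_a)
        features.append(int(forward or reverse))
    return features


# Toy sequences, invented for illustration only.
print(encode_pair("MLAALKR", "AKQEDG"))
```

Vectors of this kind, labelled as interacting or non-interacting, could then be passed to a Random Forest implementation whose feature importances rank the motif pairs; that step is omitted here to keep the sketch dependency-free.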
Contact: roeland.vanham{at}wur.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Limsoon Wong
Received on July 12, 2007; revised on October 13, 2007; accepted on October 19, 2007