Bioinformatics Advance Access published online on November 17, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm539
Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control
1 Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, Wageningen, The Netherlands
2 Biometris, Wageningen UR, Bornsesteeg 47, Wageningen, The Netherlands
3 Bioscience, PRI, Wageningen UR, Droevendaalsesteeg 1, Wageningen, The Netherlands
*To whom correspondence should be addressed. Dr. R.C.H.J. van Ham, E-mail: roeland.vanham{at}wur.nl
Abstract
Motivation: Transcription factor interactions are the cornerstone of combinatorial control, a crucial aspect of the gene regulatory system. Understanding and predicting these interactions from sequence alone is difficult because transcription factors often belong to families whose members share high sequence identity. However, because experimental interaction data are scarce relative to the available sequence data, accurate methods for predicting such interactions would be very useful.
Results: We present a method consisting of a Random Forest-based feature selection procedure that selects relevant motifs out of a set found using a correlated motif search algorithm. Prediction accuracy for several transcription factor families (bZIP, MADS, homeobox and forkhead) ranges from 60% to 90%. In addition, we identified those parts of the sequence that are important for interaction specificity, and show that these are in agreement with available data. We also used the predictors to perform genome-wide scans for interaction partners and recovered both known and putative new interaction partners.
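The general idea of selecting motif features with a tree ensemble can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how protein pairs are encoded or how the forest is configured. The sketch assumes that each candidate motif pair becomes a binary feature of a protein pair, and it swaps in a forest of one-split decision stumps, ranked by summed Gini impurity decrease, for a full Random Forest. The names `pair_features` and `rank_features` and the regex motifs are hypothetical.

```python
import random
import re


def pair_features(seq_a, seq_b, motif_pairs):
    """Encode a protein pair as binary features (assumed encoding).

    A feature is 1 if one motif of the pair occurs in one sequence and
    the other motif occurs in the partner sequence, in either order.
    """
    feats = []
    for m1, m2 in motif_pairs:
        forward = re.search(m1, seq_a) and re.search(m2, seq_b)
        reverse = re.search(m2, seq_a) and re.search(m1, seq_b)
        feats.append(1 if (forward or reverse) else 0)
    return feats


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(X, y, candidates):
    """Return (feature, weighted child impurity) of the best binary split."""
    n = len(y)
    best_feature, best_score = None, float("inf")
    for j in candidates:
        left = [y[i] for i in range(n) if X[i][j] == 0]
        right = [y[i] for i in range(n) if X[i][j] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_feature, best_score = j, score
    return best_feature, best_score


def rank_features(X, y, n_trees=200, seed=0):
    """Rank features by summed impurity decrease over bootstrapped stumps.

    Each stump sees a bootstrap sample of the pairs and a random subset
    of the features, which is a simplified stand-in for a Random Forest.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))          # features tried per stump
    importance = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feature, score = best_split(Xb, yb, rng.sample(range(d), k))
        importance[feature] += gini(yb) - score
    return sorted(range(d), key=lambda j: -importance[j])
```

In such a setup, each row of `X` would be built with `pair_features` for one protein pair, `y` would hold 1 for interacting pairs and 0 otherwise, and the top-ranked indices from `rank_features` would point to the motif pairs retained for prediction.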
Associate Editor: Dr. Limsoon Wong
Received on July 12, 2007; revised on October 13, 2007; accepted on October 19, 2007