Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE
1 Department of Biological Sciences, Columbia University, New York, NY 10027, USA
2 Center for Studies in Physics and Biology, The Rockefeller University, New York, NY 10021, USA
3 Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
*To whom correspondence should be addressed.
Motivation: Regulation of gene expression by a transcription factor requires physical interaction between the factor and the DNA, which can be described by a statistical mechanical model. Based on this model, we developed the MatrixREDUCE algorithm, which uses genome-wide occupancy data for a transcription factor (e.g. ChIP-chip) and associated nucleotide sequences to discover the sequence-specific binding affinity of the transcription factor. Advantages of our approach are that the information for all probes on the microarray is efficiently utilized because there is no need to delineate "bound" and "unbound" sequences, and that, unlike information content-based methods, it does not require a background sequence model.
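To make the idea of using every probe without a bound/unbound cutoff concrete, the following is a minimal Python sketch, not the MatrixREDUCE implementation. It assumes a common simplification: each position in a binding window contributes a relative affinity independently, the predicted affinity of a probe is the sum of these window products along the forward strand, and measured occupancy is modeled as a linear function of that sum. The function names, the 3-bp matrix, and the probe sequences are hypothetical, and the matrix is held fixed here, whereas the algorithm described above infers the sequence-specific affinities themselves from the data.

```python
import numpy as np

BASES = "ACGT"  # column order assumed for the affinity matrix


def sequence_affinity(seq, psam):
    """Sum, over every window of the sequence, the product of the
    per-position relative affinities (forward strand only)."""
    width = psam.shape[0]
    idx = [BASES.index(base) for base in seq]
    total = 0.0
    for start in range(len(idx) - width + 1):
        window = idx[start:start + width]
        # One relative affinity per position, multiplied together
        total += float(np.prod(psam[np.arange(width), window]))
    return total


def fit_scale(affinities, occupancy):
    """Least-squares fit of occupancy ~ intercept + slope * affinity,
    using all probes rather than a thresholded 'bound' subset."""
    design = np.column_stack([np.ones_like(affinities), affinities])
    coef, *_ = np.linalg.lstsq(design, occupancy, rcond=None)
    return coef[0], coef[1]  # intercept, slope


# Hypothetical 3-bp matrix: rows are positions, columns are A, C, G, T.
# The preferred base at each position has relative affinity 1.0.
psam = np.array([
    [1.0, 0.1, 0.1, 0.1],
    [0.1, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.1],
])

# Hypothetical probe sequences and synthetic occupancy values
probes = ["ACGTT", "TTTTT", "ACGACG", "GACGA"]
affinities = np.array([sequence_affinity(s, psam) for s in probes])
occupancy = 0.5 + 2.0 * affinities  # synthetic, noise-free signal

intercept, slope = fit_scale(affinities, occupancy)
```

Because the fit is a regression over all probes, weakly occupied sequences still constrain the parameters, and no background sequence model enters the calculation. A fuller treatment would also score the reverse strand and optimize the matrix entries, which this sketch leaves out.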
Results: We validated the performance of MatrixREDUCE by inferring the sequence-specific binding affinities for several transcription factors in S. cerevisiae and comparing the results with three other independent sources of transcription factor sequence-specific affinity information: (i) experimental measurement of transcription factor binding affinities for specific oligonucleotides, (ii) reporter gene assays for promoters with systematically mutated binding sites, and (iii) relative binding affinities obtained by modeling transcription factor-DNA interactions based on co-crystal structures of transcription factors bound to DNA substrates. We show that transcription factor binding affinities inferred by MatrixREDUCE are in good agreement with all three validating methods.
Availability: MatrixREDUCE source code is freely available for non-commercial use at http://www.bussemakerlab.org/. The software runs on Linux, Unix, and Mac OS X.
Contact: Harmen.Bussemaker{at}columbia.edu



