Bioinformatics Advance Access originally published online on December 6, 2005
Bioinformatics 2006 22(4):423-429; doi:10.1093/bioinformatics/bti815
A hypothesis-based approach for identifying the binding specificity of regulatory proteins from chromatin immunoprecipitation data
¹MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA
²Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA 02142, USA
³Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
⁴Division of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
*To whom correspondence should be addressed.
Motivation: Genome-wide chromatin immunoprecipitation (ChIP-chip) detects the binding of transcriptional regulators to DNA in vivo at low resolution. Motif discovery algorithms can be used to find sequence patterns in the bound regions that may be recognized by the immunoprecipitated protein. However, when the binding specificity of the protein is known, the discovered motifs often do not agree with it.
Results: We present an approach to analyzing ChIP-chip data, called THEME, that tests hypotheses concerning the sequence specificity of a protein. Hypotheses are refined using constrained local optimization. Cross-validation provides a principled standard for selecting the optimal weighting of the hypothesis and the ChIP-chip data, and for choosing the best refined hypothesis. We demonstrate how to derive hypotheses for proteins from 36 domain families. Using THEME together with these hypotheses, we analyze ChIP-chip datasets for 14 human and mouse proteins. In all cases, the identified motifs are consistent with published data on the binding specificity of the proteins.
Availability: THEME is freely available for download.
Contact: fraenkel-admin{at}mit.edu
Supplementary information: http://fraenkel.mit.edu/THEME
Received on September 14, 2005; revised on November 14, 2005; accepted on December 1, 2005
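As a rough illustration of the cross-validation idea in the Results, the sketch below scores a motif hypothesis by how well it separates bound from unbound sequences on held-out data. It is not THEME's implementation: the abstract does not specify the scoring model, the optimizer or the classifier, so the position weight matrix, the uniform background, the forward-strand-only scan, the threshold classifier and all names and toy sequences here are assumptions made for illustration.

```python
import math

BACKGROUND = 0.25  # assumption: uniform background base frequency


def best_window_score(pwm, seq):
    """Highest log-odds score of the motif over all windows (forward strand only)."""
    width = len(pwm)
    best = float("-inf")
    for start in range(len(seq) - width + 1):
        score = sum(
            math.log2(pwm[i][seq[start + i]] / BACKGROUND) for i in range(width)
        )
        best = max(best, score)
    return best


def pick_threshold(scores, labels):
    """Observed score that minimizes training errors (predict bound if score >= t)."""
    best_t, best_err = None, None
    for t in sorted(set(scores)):
        err = sum((s >= t) != y for s, y in zip(scores, labels))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def cross_validated_error(pwm, seqs, labels, k=2):
    """Fraction of held-out sequences misclassified across k interleaved folds."""
    scores = [best_window_score(pwm, s) for s in seqs]
    errors = 0
    for fold in range(k):
        train = [i for i in range(len(seqs)) if i % k != fold]
        test = [i for i in range(len(seqs)) if i % k == fold]
        t = pick_threshold([scores[i] for i in train], [labels[i] for i in train])
        errors += sum((scores[i] >= t) != labels[i] for i in test)
    return errors / len(seqs)


def consensus_pwm(consensus, p_match=0.85):
    """Hypothetical helper: build a PWM that favors each consensus base."""
    p_other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else p_other) for b in "ACGT"} for c in consensus]


# Toy data (invented): four "bound" sequences containing TGAC, four "unbound" ones.
hypothesis = consensus_pwm("TGAC")
seqs = ["AATGACAA", "CCTGACGG", "TGACTTTT", "GGGTGACA",
        "AAAAAAAA", "CCCCGGGG", "ACACACAC", "GTGTGTGT"]
labels = [True, True, True, True, False, False, False, False]

cv_error = cross_validated_error(hypothesis, seqs, labels, k=2)
print(f"cross-validated error: {cv_error:.2f}")
```

Under these assumptions, competing hypotheses (or differently weighted versions of one hypothesis) could be compared by their held-out error, with the lowest-error candidate retained.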