Bioinformatics Advance Access originally published online on August 5, 2004
Bioinformatics 2004 20(18):3516-3525; doi:10.1093/bioinformatics/bth438
Bioinformatics vol. 20 issue 18 © Oxford University Press 2004; all rights reserved.
Comparative analysis of methods for representing and searching for transcription factor binding sites


Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
Received on May 15, 2004; revised on July 2, 2004; accepted on July 22, 2004
Advance Access Publication August 5, 2004
Motivation: An important step in unravelling the transcriptional regulatory network of an organism is to identify, for each transcription factor, all of its DNA binding sites. Several approaches are commonly used in searching for a transcription factor's binding sites, including consensus sequences and position-specific scoring matrices. In addition, methods that compute the average number of nucleotide matches between a putative site and all known sites can be employed. Such basic approaches can all be naturally extended by incorporating pairwise nucleotide dependencies and per-position information content. In this paper, we evaluate the effectiveness of these basic approaches and their extensions in finding binding sites for a transcription factor of interest without erroneously identifying other genomic sequences.
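As a rough illustration of the three basic approaches named above (consensus sequences, position-specific scoring matrices, and average nucleotide matches against all known sites), the sketch below scores a candidate site against a small set of aligned known sites. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the pseudocount of 0.5, the uniform background of 0.25 and the toy sites are all hypothetical choices. Per-position information content is computed as 2 minus the Shannon entropy of each column (in bits). It is incorporated by multiplying each position's contribution by that value, which is one plausible weighting scheme among several.

```python
import math
from collections import Counter

BASES = "ACGT"


def column_probs(sites, i, pseudocount=0.5):
    """Pseudocount-smoothed base frequencies at position i of the aligned sites."""
    counts = Counter(site[i] for site in sites)
    denom = len(sites) + 4 * pseudocount
    return {b: (counts[b] + pseudocount) / denom for b in BASES}


def information_content(sites, pseudocount=0.5):
    """Per-position information content in bits: 2 - Shannon entropy."""
    ic = []
    for i in range(len(sites[0])):
        probs = column_probs(sites, i, pseudocount)
        ic.append(2.0 + sum(p * math.log2(p) for p in probs.values()))
    return ic


def consensus(sites):
    """Most frequent base per position; ties go to the earliest base in ACGT."""
    cons = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        cons.append(max(BASES, key=lambda b: counts[b]))
    return "".join(cons)


def pssm(sites, pseudocount=0.5, background=0.25):
    """Log-odds scoring matrix with entries log2(p_i(b) / background)."""
    return [
        {b: math.log2(p / background)
         for b, p in column_probs(sites, i, pseudocount).items()}
        for i in range(len(sites[0]))
    ]


def _weights(weights, length):
    """Uniform weights unless per-position weights (e.g. IC) are supplied."""
    return [1.0] * length if weights is None else weights


def consensus_score(candidate, cons, weights=None):
    """(Weighted) number of positions where the candidate matches the consensus."""
    w = _weights(weights, len(cons))
    return sum(wi for c, k, wi in zip(candidate, cons, w) if c == k)


def pssm_score(candidate, matrix, weights=None):
    """Sum of the (weighted) log-odds entries for the candidate's bases."""
    w = _weights(weights, len(matrix))
    return sum(wi * col[c] for c, col, wi in zip(candidate, matrix, w))


def average_match_score(candidate, sites, weights=None):
    """Average over known sites of the (weighted) number of matching positions."""
    w = _weights(weights, len(candidate))
    total = sum(
        sum(wi for c, s, wi in zip(candidate, site, w) if c == s)
        for site in sites
    )
    return total / len(sites)


if __name__ == "__main__":
    known = ["TTGACA", "TTGACT", "TTGATA", "CTGACA"]  # hypothetical toy sites
    ic = information_content(known)
    cons = consensus(known)
    matrix = pssm(known)
    for cand in ["TTGACA", "GCCGTG"]:
        print(cand,
              consensus_score(cand, cons, ic),
              pssm_score(cand, matrix, ic),
              average_match_score(cand, known, ic))
```

Passing `weights=None` gives the unweighted basic approach. Passing the information-content vector gives the corresponding extension, so each basic method and its information-content variant share one code path.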
Results: In cross-validation testing on a dataset of Escherichia coli transcription factors and their binding sites, we show that there are statistically significant differences in how well various methods identify transcription factor binding sites. The use of per-position information content improves the performance of all basic approaches. Furthermore, including local pairwise nucleotide dependencies within binding site models results in statistically significant performance improvements for approaches based on nucleotide matches. Based on our analysis, the best results when searching for DNA binding sites of a particular transcription factor are obtained by methods that incorporate both information content and local pairwise correlations.
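To make the idea of local pairwise nucleotide dependencies concrete for a match-based score, the sketch below extends an average-match score by also rewarding matches of adjacent position pairs. This is a hypothetical formulation chosen for illustration, not necessarily how the paper models dependencies. Several details are assumptions: restricting pairs to neighbouring positions (i, i + 1), weighting a pair by the mean information content of its two positions, the pseudocount of 0.5, the toy sites, and the names `pairwise_match_score` and `pair_weight`.

```python
import math
from collections import Counter

BASES = "ACGT"


def information_content(sites, pseudocount=0.5):
    """Per-position information content in bits (2 - Shannon entropy)."""
    n = len(sites)
    ic = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        probs = [(counts[b] + pseudocount) / (n + 4 * pseudocount) for b in BASES]
        ic.append(2.0 + sum(p * math.log2(p) for p in probs))
    return ic


def pairwise_match_score(candidate, sites, ic=None, pair_weight=1.0):
    """Average over known sites of IC-weighted single-position matches plus
    IC-weighted matches of adjacent position pairs (i, i + 1)."""
    length = len(candidate)
    w = [1.0] * length if ic is None else ic
    total = 0.0
    for site in sites:
        single = sum(w[i] for i in range(length) if candidate[i] == site[i])
        # Hypothetical pair term: a pair counts only when both adjacent bases
        # match the same known site, so co-varying neighbours are rewarded.
        pairs = sum(
            (w[i] + w[i + 1]) / 2
            for i in range(length - 1)
            if candidate[i:i + 2] == site[i:i + 2]
        )
        total += single + pair_weight * pairs
    return total / len(sites)


if __name__ == "__main__":
    # Toy sites in which positions 4 and 5 co-vary (CA or GT, never CT).
    known = ["ACGTCA", "ACGTGT", "ACGTCA", "ACGTGT"]
    ic = information_content(known)
    for cand in ["ACGTCA", "ACGTCT"]:
        print(cand,
              pairwise_match_score(cand, known, ic, pair_weight=0.0),
              pairwise_match_score(cand, known, ic))
```

With `pair_weight=0.0` the two candidates receive the same score (up to floating-point rounding), because each of their final two bases occurs in half of the known sites. With the pair term switched on, ACGTCA scores higher: its CA dinucleotide occurs in the known sites, while CT never does. A position-independent score cannot capture that distinction.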
Availability: The software is available at http://compbio.cs.princeton.edu/bindsites
Contact: msingh{at}cs.princeton.edu
* To whom correspondence should be addressed.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

