Bioinformatics Advance Access originally published online on February 18, 2007
Bioinformatics 2007 23(8):1032-1034; doi:10.1093/bioinformatics/btm047

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ClusterDraw web server: a tool to identify and visualize clusters of binding motifs for transcription factors

Dmitri Papatsenko

Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA


    ABSTRACT

ClusterDraw is a program for identifying binding sites and binding-site clusters. Its major difference from existing tools is the ability to scan a wide range of parameter values and weigh the statistical significance of all possible clusters smaller than a selected size. The program produces graphs along with decorated FASTA files. The ClusterDraw web server is available at the following URL: http://flydev.berkeley.edu/cgi-bin/cld/submit.cgi

Contact: dxp{at}berkeley.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
A large number of programs have been developed to identify transcription regulatory regions in genomic sequences (Alkema et al., 2004; Berman et al., 2004; Frith et al., 2003; Markstein et al., 2002; Philippakis et al., 2005; Pierstorff et al., 2006; Rajewsky et al., 2002; Sinha et al., 2006; Sosinsky et al., 2003; Waleev et al., 2006). However, this important task remains a challenge. One obstacle is the large number of non-functional binding-site matches (Papatsenko et al., 2002). Available binding motifs are imperfect and, often, the appropriate thresholds for binding motif searches are not known. In addition, a search for binding-site clusters may require the expected cluster size or a window size, which adds a second ambiguous parameter to the search. A statistical solution to the cluster size problem was employed by A. Wagner in r-scan analysis (Wagner, 1997, 1998, 1999; Karlin and Brendel, 1992). ClusterDraw takes advantage of the r-scan algorithm, combined with an exhaustive search over a wide range of binding-site match P-values (Lifanov et al., 2003). The program calculates cluster significance from the sum l (in bases) of the N – 1 consecutive distances between all N site matches present in a cluster and determines the statistical significance of every possible cluster smaller than a given size l_max. Among all overlapping clusters, the program selects those producing the best statistical scores. The described method is equivalent to a search for the best clusters in the parameter space defined by the motif match quality, the size of the resolution window and the position in the sequence.
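The enumeration of candidate clusters described above can be illustrated with a minimal Python sketch. It is not the ClusterDraw source code: the function name enumerate_clusters and the tuple layout are hypothetical, and the sketch assumes that a cluster is any run of consecutive site matches whose span l (the sum of its N – 1 consecutive distances) is smaller than l_max.

```python
from typing import List, Tuple

# (first match position, last match position, span l in bases, number of matches N)
Cluster = Tuple[int, int, int, int]


def enumerate_clusters(positions: List[int], l_max: int) -> List[Cluster]:
    """List every run of consecutive site matches whose span l is below l_max.

    Hypothetical helper: positions are match coordinates in bases.
    """
    pos = sorted(positions)
    clusters: List[Cluster] = []
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            # The sum of the N - 1 consecutive distances equals last - first.
            span = pos[j] - pos[i]
            if span >= l_max:
                break  # longer runs starting at i can only be larger
            clusters.append((pos[i], pos[j], span, j - i + 1))
    return clusters


if __name__ == "__main__":
    for cluster in enumerate_clusters([10, 15, 40, 200], l_max=50):
        print(cluster)
```

Each enumerated cluster would then be scored for significance, and the best-scoring clusters among overlapping candidates retained; the scoring step is not part of this sketch.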


    2 ALGORITHM
2.1 Calculating motif match P-values
The cumulative match P-value for a word is based on the score M calculated using a position weight matrix (PWM; Prestridge and Stormo, 1993; see Equation 1S, Supplementary Material). First, for a given PWM, the algorithm finds all possible words producing a score greater than or equal to M. Then, the expected frequencies of all these words are calculated using a standard approach (see Equation 2S, Supplementary Material). The sum of the frequencies of all words scoring greater than or equal to M is the cumulative match probability P_M corresponding to the matrix score M (see also Equations 2S–4S in Supplementary Material):


$$P_M = \sum_{j=1}^{R_M} \prod_{i=1}^{n} q_{ji} \qquad (1)$$
In this formula, R_M is the rank of the PWM score M among the PWM scores of all possible words for the matrix; q_ji is the genome frequency of the character in the ith position of the word with score rank j; n is the total number of positions in the motif (matrix). M-to-P conversion tables are generated for each motif when the program starts.
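The construction of an M-to-P conversion table can be sketched as follows. This is an illustration under stated assumptions rather than the ClusterDraw implementation: the PWM score is assumed to be additive over positions (Equation 1S is not reproduced here), word frequencies are assumed to be products of single-character genome frequencies, and the function name score_to_pvalue_table is hypothetical.

```python
from itertools import product
from typing import Dict, List


def score_to_pvalue_table(
    pwm: List[Dict[str, float]], background: Dict[str, float]
) -> Dict[float, float]:
    """Map every attainable PWM score M to its cumulative match probability.

    pwm: one dict of per-character weights for each motif position (assumed additive).
    background: genome frequency of each character.
    """
    alphabet = sorted(background)
    scored = []
    for word in product(alphabet, repeat=len(pwm)):
        score = sum(column[ch] for column, ch in zip(pwm, word))
        freq = 1.0
        for ch in word:
            freq *= background[ch]  # expected frequency of the word
        scored.append((score, freq))

    # Rank words from the highest score down and accumulate their frequencies.
    scored.sort(key=lambda item: item[0], reverse=True)
    table: Dict[float, float] = {}
    cumulative = 0.0
    for score, freq in scored:
        cumulative += freq
        # Tied scores overwrite each other, so the stored value finally
        # covers all words scoring greater than or equal to this score.
        table[round(score, 9)] = cumulative
    return table
```

Exhaustive enumeration visits 4^n words for a DNA motif of n positions, so this sketch is practical only for short motifs; a production tool would likely use a more economical scheme.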

2.2 Calculating cluster significance
The cluster significance score E is calculated from the cluster size l, the number of matches N, the match probability cutoff P and the number of binding motifs in the search T, using the binomial distribution (Wagner, 1997):


Formula 2

(2)
In searches with multiple binding motifs, a matrix score cutoff M_t corresponding to the selected match probability cutoff P is calculated for each binding motif t. Thus, the matches identified for every motif share the same match probability cutoff P. Given the number of binding motifs T, the probability cutoff is equal to PT (see also Supplementary Material, Equations 5S–8S).
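The two steps described in this section can be sketched in Python, with the caveat that the exact form of Equation 2 is not reproduced here. The sketch therefore rests on labeled assumptions: PT is read as the product P × T, the cluster size l is taken as the number of binomial trials, and the score is taken as the upper binomial tail (the probability of at least N matches). All function names are hypothetical, and the result should not be read as the E-value reported by ClusterDraw.

```python
import math
from typing import Dict


def score_cutoff(table: Dict[float, float], p_cutoff: float) -> float:
    """Lowest score M_t whose cumulative match probability is at most p_cutoff.

    table maps PWM scores to cumulative match probabilities for one motif.
    """
    eligible = [score for score, p in table.items() if p <= p_cutoff]
    if not eligible:
        raise ValueError("no score satisfies the requested P-value cutoff")
    return min(eligible)


def binomial_tail(trials: int, successes: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p), summed in log space."""
    if successes <= 0:
        return 1.0
    if successes > trials or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(successes, trials + 1):
        log_term = (
            math.lgamma(trials + 1) - math.lgamma(k + 1) - math.lgamma(trials - k + 1)
            + k * log_p + (trials - k) * log_q
        )
        total += math.exp(log_term)
    return min(1.0, total)


def assumed_cluster_score(l: int, n_matches: int, p_cutoff: float, n_motifs: int) -> float:
    """Assumed stand-in for the cluster score: binomial tail with p = P * T."""
    p = min(1.0, p_cutoff * n_motifs)  # assumption: PT means P times T
    return binomial_tail(l, n_matches, p)
```

Under these assumptions, adding matches to a cluster of fixed size lowers the score, and widening the cluster at a fixed number of matches raises it, which is the qualitative behavior expected of a cluster significance measure.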


    3 RESULTS AND DISCUSSION
The ClusterDraw web server has a standard common gateway interface (CGI); current settings allow processing of up to 100 KB of sequence data. For the convenience of users, motifs can be entered as multiple alignments or position frequency matrices (PFMs). The basic interface provides only three options: minimal combination of binding motifs, cluster significance cutoff and background model/organism selection. The advanced interface provides options to control maximal cluster size, minimal match P-value, statistics and graphics. By default, ClusterDraw filters overlapping binding sites by finding local maxima; however, options are available to control this function and even to extract overlapping sites/composite elements (Makeev et al., 2003; Waleev et al., 2006). A ClusterDraw output plot is shown in Figure 1A.
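Filtering overlapping sites by local maxima can be illustrated with a short Python sketch. The exact rule used by ClusterDraw is not specified here, so the sketch makes assumptions: two sites overlap when their start positions differ by less than the motif length, a site is kept only if no overlapping site scores higher, and ties are resolved in favor of the leftmost site. The name filter_local_maxima is hypothetical.

```python
from typing import List, Tuple

Site = Tuple[int, float]  # (start position, match score); starts assumed distinct


def filter_local_maxima(sites: List[Site], motif_len: int) -> List[Site]:
    """Keep only sites that are not outscored by any overlapping site."""
    ordered = sorted(sites)
    kept: List[Site] = []
    for start, score in ordered:
        beaten = any(
            other != start
            and abs(other - start) < motif_len
            # A higher score wins; on a tie the leftmost site wins.
            and (other_score > score or (other_score == score and other < start))
            for other, other_score in ordered
        )
        if not beaten:
            kept.append((start, score))
    return kept


if __name__ == "__main__":
    example = [(10, 5.0), (12, 7.0), (14, 6.0), (40, 3.0)]
    print(filter_local_maxima(example, motif_len=8))
```

The quadratic scan is adequate for an illustration; for long sequences, a sweep over position-sorted sites restricted to a window of motif_len bases would avoid comparing distant sites.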


Fig. 1. Program output and performance. (A) ClusterDraw output: 2D plot of cluster E-values (Z-axis, color bar) in the parameter space defined by the position in the sequence (X-axis) and the match P-value cutoff (Y-axis). (B–E) AHAB and ClusterDraw profiles for the eve (B), h (C) and sim (D) loci from D. melanogaster and for the sim locus from the mosquito A. gambiae (E), using Drosophila binding motifs. Red bars below each panel show the positions of known enhancers. Numbers in the upper right corners show the correlation between the AHAB and ClusterDraw profiles. Both programs were run with default parameters, using identical binding motifs. The cluster significance profiles produced by both programs were re-sampled and normalized to the 0–1 range.

 
To validate the performance of ClusterDraw, its cluster significance profiles were compared to profiles generated by AHAB (Rajewsky et al., 2002). AHAB was selected because it is one of the algorithms less sensitive to the window size and is optimized to analyze the same type of data (i.e. fly enhancers); a comparison of AHAB with other programs is available (Pierstorff et al., 2006). The results were striking: in most cases, the cluster significance profiles produced by AHAB and ClusterDraw were highly correlated (see Fig. 1), and the predictive power of the two algorithms was similar as well. Differences were found in the ranks of the highest scores (see arrows in Fig. 1B). The agreement between the two programs can be explained by the fact that both perform exhaustive local searches. AHAB identifies the best of all possible partitions for a given set of binding motifs in a window; ClusterDraw finds the best of all possible overlapping clusters. These tests demonstrate the efficiency of an exhaustive search strategy in the detection of regulatory regions.
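A comparison of two significance profiles of the kind summarized in Figure 1 can be sketched in Python. The procedure actually used for the figure is not detailed, so the choices below are assumptions: linear interpolation for re-sampling, min-max scaling for normalization to the 0–1 range, and the Pearson coefficient as the correlation measure. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def resample(profile: Sequence[float], n_points: int) -> List[float]:
    """Re-sample a profile onto n_points evenly spaced points (linear interpolation)."""
    if n_points < 2 or len(profile) < 2:
        raise ValueError("need at least two input and two output points")
    last = len(profile) - 1
    out = []
    for k in range(n_points):
        x = k * last / (n_points - 1)
        lo = min(int(x), last - 1)  # left neighbor, clamped at the final interval
        frac = x - lo
        out.append(profile[lo] * (1 - frac) + profile[lo + 1] * frac)
    return out


def normalize01(profile: Sequence[float]) -> List[float]:
    """Scale a profile linearly so that its minimum is 0 and its maximum is 1."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0.0 for _ in profile]
    return [(v - lo) / (hi - lo) for v in profile]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


def profile_correlation(p1: Sequence[float], p2: Sequence[float], n_points: int = 100) -> float:
    """Bring two profiles to a common grid and scale, then correlate them."""
    return pearson(normalize01(resample(p1, n_points)), normalize01(resample(p2, n_points)))
```

Because the Pearson coefficient is invariant to linear rescaling, the normalization step affects only how the profiles are displayed together, not the reported correlation.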

Performance tests for ClusterDraw and AHAB were also run on genomic sequences from the mosquito Anopheles gambiae and the honeybee Apis mellifera. These sequences contained the Anopheles and Apis sim enhancers recently identified in the M. Levine lab (Zinzen et al., 2006). Given the binding motifs present in the Drosophila sim enhancer, both programs were able to correctly predict sim enhancers in mosquito (see Fig. 1E) and honeybee (data not shown).

The absence of window and match score cutoff parameters in ClusterDraw, together with the correlation of its predictions with those of other programs and with experimental data, provides new opportunities (Clyde et al., 2003; Ochoa-Espinosa et al., 2005) for the exploration of transcription regulatory regions.


    ACKNOWLEDGEMENTS
The author thanks Mike Levine, who participated in algorithm improvement and provided data for testing. The work was supported by a grant from the Moore Foundation to the Center for Integrative Genomics, University of California, Berkeley. Funding to pay the Open Access publication charges was provided by the Center for Integrative Genomics, University of California, Berkeley. The Center is supported by a grant from the Moore Foundation.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alex Bateman

Received on October 25, 2006; revised on January 12, 2007; accepted on February 6, 2007

    REFERENCES

    Alkema WB, et al. MSCAN: identification of functional clusters of transcription factor binding sites. Nucleic Acids Res. (2004) 32:W195–W198.

    Berman BP, et al. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. (2004) 5:R61.

    Clyde DE, et al. A self-organizing system of repressor gradients establishes segmental complexity in Drosophila. Nature (2003) 426:849–853.

    Frith MC, et al. Cluster-Buster: finding dense clusters of motifs in DNA sequences. Nucleic Acids Res. (2003) 31:3666–3668.

    Karlin S, Brendel V. Chance and statistical significance in protein and DNA sequence analysis. Science (1992) 257:39–49.

    Lifanov AP, et al. Homotypic regulatory clusters in Drosophila. Genome Res. (2003) 13:579–588.

    Makeev VJ, et al. Distance preferences in the arrangement of binding motifs and hierarchical levels in organization of transcription regulatory information. Nucleic Acids Res. (2003) 31:6016–6026.

    Markstein M, et al. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. USA (2002) 99:763–768.

    Ochoa-Espinosa A, et al. The role of binding site cluster strength in Bicoid-dependent patterning in Drosophila. Proc. Natl. Acad. Sci. USA (2005) 102:4960–4965.

    Papatsenko DA, et al. Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers. Genome Res. (2002) 12:470–481.

    Philippakis AA, et al. Modulefinder: a tool for computational discovery of cis regulatory modules. In: Pac. Symp. Biocomput. (2005) 519–530.

    Pierstorff N, et al. Identifying cis-regulatory modules by combining comparative and compositional analysis of DNA. Bioinformatics (2006) 10:10.

    Prestridge DS, Stormo G. SIGNAL SCAN 3.0: new database and program features. Comput. Appl. Biosci. (1993) 9:113–115.

    Rajewsky N, et al. Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics (2002) 3:30.

    Sinha S, et al. Stubb: a program for discovery and analysis of cis-regulatory modules. Nucleic Acids Res. (2006) 34:W555–W559.

    Sosinsky A, et al. Target explorer: an automated tool for the identification of new target genes for a specified set of transcription factors. Nucleic Acids Res. (2003) 31:3589–3592.

    Wagner A. A computational genomics approach to the identification of gene networks. Nucleic Acids Res. (1997) 25:3594–3604.

    Wagner A. A computational "genome walk" technique to identify regulatory interactions in gene networks. In: Pac. Symp. Biocomput. (1998) 264–278.

    Wagner A. Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics (1999) 15:776–784.

    Waleev T, et al. Composite module analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res. (2006) 34:W541–W545.

    Zinzen RP, Cande J, Ronshaugen M, Papatsenko D, Levine M. Evolution of the ventral midline in insect embryos. Dev. Cell (2006) 11:895–902.

