Skip Navigation


Bioinformatics Advance Access originally published online on April 6, 2005
Bioinformatics 2005 21(12):2909-2911; doi:10.1093/bioinformatics/bti425
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/12/2909    most recent
bti425v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (1)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Che, D.
Right arrow Articles by Liu, J. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Che, D.
Right arrow Articles by Liu, J. S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

BEST: Binding-site Estimation Suite of Tools

Dongsheng Che 1,*, Shane Jensen 2, Liming Cai 1 and Jun S. Liu 3,*

1Department of Computer Science, University of Georgia Athens, GA 30602, USA
2Department of Statistics, The Wharton School, University of Pennsylvania Philadelphia, PA 19102, USA
3Department of Statistics, Harvard University Cambridge, MA 02138-2931, USA

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 INTRODUCTION
 THE BEST SYSTEM
 IMPLEMENTATION
 APPLICATIONS OF BEST
 REFERENCES
 

Summary: The purpose of our Binding-site Estimation Suite of Tools (BEST) is two-fold: to provide a platform for using and comparing different motif-finding programs for transcription factor binding site prediction, and to improve the accuracy of these predictions by further optimization. Our software package BEST includes four commonly used motif-finding programs: AlignACE, BioProspector, CONSENSUS and MEME, as well as the optimization program BioOptimizer. BEST allows the user to run programs either separately or sequentially and manages all programs by automating the common inputs and the optimization procedure. The BEST system was implemented in Qt, a C++ application development framework, and was compiled and executed on Linux operating systems.

Availability: BEST is available for download at http://www.cs.uga.edu/~che/BEST and http://www.fas.harvard.edu/~junliu/BEST

Contact: dsche{at}uga.edu, jliu{at}stat.harvard.edu


    INTRODUCTION
 TOP
 Abstract
 INTRODUCTION
 THE BEST SYSTEM
 IMPLEMENTATION
 APPLICATIONS OF BEST
 REFERENCES
 
A transcription factor binding site (TFBS) is a short DNA segment typically located upstream of a gene and is recognized by a specific transcription factor (TF). Accurate identification of these binding sites contributes to our understanding of gene regulatory networks and aids in the design of biological experiments. Experimental procedures for TFBS determination such as DNase footprinting (Galas and Schmitz, 1978) and gel-shift assays (Garner and Revzin, 1981) are reliable, but are time-consuming and expensive, making them impractical for genome-wide transcription regulation studies. In the past decade, many computational tools have been developed for TFBS prediction. The premise of these computational methods is that the upstream sequences of co-regulated genes tend to be ‘enriched’ with binding sites recognizable by one or a few TFs. Thus, the computational problem is to find over-represented sequence patterns, called motifs, in these upstream sequences.

Existing motif-finding algorithms can be categorized by their underlying computational procedure. AlignACE (Roth et al., 1998), BioProspector (Liu et al., 2001) and Gibbs motif sampler (Liu et al., 1995) are based on a Bayesian statistical model and Gibbs sampling methods (Lawrence et al., 1993; Jensen et al., 2004). MEME (Bailey and Elkan, 1994) uses the EM-algorithm to find the maximum likelihood estimates for a similar statistical model. CONSENSUS (Hertz and Stormo, 1999) uses a sequential search strategy to find motifs that optimize an information-based scoring function. CUBIC (Olman et al., 2003) formulates the motif prediction problem as a cluster identification problem. MDscan (Liu et al., 2002) combines word enumeration with the Bayesian modeling approach.

However, for a particular dataset, these different algorithms often provide different motif predictions, and no algorithm has been shown to dominate others in all cases. Jensen and Liu (2004) developed BioOptimizer to compare and further refine the output of any motif-finding algorithm. BioOptimizer can handle unknown site abundance and unknown motif width, and can also optimize two-block motifs with variable-length gaps. Using BioOptimizer in conjunction with AlignACE, BioProspector, CONSENSUS or MEME was shown in simulation and real data studies to be superior to using any one of these programs alone.


    THE BEST SYSTEM
 TOP
 Abstract
 INTRODUCTION
 THE BEST SYSTEM
 IMPLEMENTATION
 APPLICATIONS OF BEST
 REFERENCES
 
Our software package BEST includes a graphic user interface to integrate BioOptimizer with several motif-finding programs, thereby allowing users to compare their results on a single platform. The BEST system contains two core components: the motif-finding programs supported by BEST (currently AlignACE, BioProspector, CONSENSUS and MEME) and the corresponding BioOptimizer algorithm for each program (Fig. 1). If the user is only interested in a particular program, they can set up the fields and click the ‘run’ button for that program. Alternatively, the user can use several programs and compare their results by setting up the GUIs for each one. The ‘run’ button in the main window is used to trigger the workflow engine, which runs the selected programs sequentially.



View larger version (14K):
[in this window]
[in a new window]
 
Fig. 1 Architectural outline of the BEST system. AlignACE, BioProspector, CONSENSUS and MEME are the motif-finding programs, while BioOptimizerA, BioOptimizerB, BioOptimizerC and BioOptimizerM are their corresponding optimization programs. Each program accepts inputs via their GUI, and the output of each motif-finding program is used as input to Biooptimizer.

 
BEST treats input variables such as the sequence file and the motif width as global parameters, so that inputs into a single GUI are automatically fed to other programs. Outputs of each motif-finding program are automatically filled into the corresponding BioOptimizer GUI. BEST accepts sequence files in the FASTA format, which is the default format for AlignACE, BioOptimizer, BioProspector and MEME. CONSENSUS uses a specialized sequence format that is generated by BEST when needed. The ‘Help’ menu in the main window contains detailed information about the plugged-in programs. The color of the button for each program indicates the status of that program—with green indicating running and pink indicating finish.

A limitation of most motif-finding programs is that the user must provide a fixed motif width, which is not usually known a priori. BEST addresses this issue by accepting a range of possible widths and lets BioOptimizer to rank the predictions. In the final output, BEST produces a table for all reported motifs sorted according to their BioOptimizer scores. For each of the top 10 motifs, the table provides the motif score, width, number of predicted sites and consensus sequence from both the original motif-finding program and BioOptimizer.


    IMPLEMENTATION
 TOP
 Abstract
 INTRODUCTION
 THE BEST SYSTEM
 IMPLEMENTATION
 APPLICATIONS OF BEST
 REFERENCES
 
BEST was implemented in Qt, a complete C++ application development framework, for several reasons. Qt is easy to use and provides a class library and many tools. It supports multiple platforms, including Unix/Linux and Windows platforms. The current version of BEST was compiled and executed on Linux operating systems since many of the plugged-in executable motif-finding programs operate in such systems. Programs included in BEST are AlignACE3.0, BioProspector, CONSENSUS v6c and MEME.3.0.8, as well as BioOptimizer versions for each.


    APPLICATIONS OF BEST
 TOP
 Abstract
 INTRODUCTION
 THE BEST SYSTEM
 IMPLEMENTATION
 APPLICATIONS OF BEST
 REFERENCES
 
RDS1 binding sites prediction
This dataset contains upstream sequences of 49 genes selected by a ChIP-chip experiment (Harbison et al., 2004) as targets of transcription factor RDS1. The main window of BEST (Fig. 2A) contains a button for each of the four motif-finding programs, as well as a button for BioOptimizer and a global ‘run’ button. Figure 2B shows the GUI of the BioProspector program. For this application, we used a one-block model with a width range of 8–12 bps, while all other parameters were left as defaults. The same parameters were used in the GUIs of the other three programs. All entered information can be saved by clicking the ‘Save’ button. The buttons for all plugged-in programs in the main window are green, indicating that the program is ready to run. After the ‘run’ button in the main window is clicked, the workflow engine triggers all programs to run sequentially, and is completed when the green buttons change color to pink. A portion of the program output is displayed in Figure 2D. This table shows that the best scored motif is ‘gactCGGCCGag’ [pattern ‘kCGGCCGa’ was reported in Harbison et al. (2004)].



View larger version (60K):
[in this window]
[in a new window]
 
Fig. 2 Sample inputs and outputs for BEST. (A) Main window of the BEST program. (B) BioProspector GUI with RDS1 sequence file. (C) BioOptimizer GUI with parameters automatically filled based on the BioProspector GUI. (D) Snapshot of the summary output for RDS1 motifs. (E) Snapshot of the summary output for STB4 motifs.

 
STB4 binding sites
Our second dataset consists of 28 regulatory sequences bound by STB4. The motif pattern ‘TCGgnnCGA’ has been predicted and validated in Harbison et al. (2004). We ran all four motif-finding programs within BEST using a one-block model with a motif width ranging from 7 to 11 bps. The combined output (Fig. 2E) from the four motif-finding programs alone shows three distinct motifs. However, once optimized and ranked by BioOptimizer, the top two motifs are identical and matched the confirmed motif pattern in Harbison et al. (2004).

The foregoing two examples and other unreported ones demonstrate that BEST not only is easy and intuitive to use, but also improves TFBS prediction accuracy. BEST achieves these goals by enabling the biologist to simultaneously use, compare and combine the few most powerful motif-search programs available. An immediate future extension of our work is to include more motif-finding tools into the BEST framework.


    Acknowledgments
 
We thank the developers of each motif-finding program for allowing us to include their current versions in BEST. We also thank X Shirley Liu for helpful discussions. The work was partially supported by the NIH Grant R01-HG02518-01 and the NSF grant DMS-0244638.

Received on December 14, 2004; revised on March 24, 2005; accepted on March 31, 2005

    REFERENCES
 TOP
 Abstract
 INTRODUCTION
 THE BEST SYSTEM
 IMPLEMENTATION
 APPLICATIONS OF BEST
 REFERENCES
 

    Bailey, T.L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular BiologyStanford, CA , Bethesda, MD AAAI Press, pp. 28–36.

    Galas, D.J. and Schmitz, A. (1978) A DNA footprinting: a simple method for the detection of protein–DNA binding specificity. Nucleic Acids Res., 5, 3157–3170[Abstract/Free Full Text].

    Garner, M.M. and Revzin, A. (1981) A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res., 9, 3047–3060[Abstract/Free Full Text].

    Harbison, C.T., et al. (2004) Transcriptional regulatory code of a eukaryotic genome. Nature, 431, 99–104[CrossRef][Medline].

    Hertz, G.Z. and Stormo, G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15, 563–577[Abstract/Free Full Text].

    Jensen, S.T. and Liu, J.S. (2004) BioOptimizer: a Bayesian scoring function approach to motif discovery. Bioinformatics, 20, 1557–1564[Abstract/Free Full Text].

    Jensen, S.T., et al. (2004) Computational discovery of regulatory binding motifs: a Bayesian perspective. Statist. Sci., 19, 188–204[CrossRef].

    Lawrence, C.E., et al. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, 208–214[Abstract/Free Full Text].

    Liu, J.S., et al. (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Statist. Assoc., 90, 1156–1170[CrossRef][ISI].

    Liu, X., et al. (2001) BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput., 6, 127–138.

    Liu, X., et al. (2002) An algorithm for finding protein–DNA interaction sites with applications to chromatin immunoprecipitation microarray experiments. Nat. Biotechnol., 20, 835–839[ISI][Medline].

    Olman, V., et al. (2003) CUBIC: identification of regulatory binding sites through data clustering. J. Bioinform. Computat. Biol., 1, 21–40.

    Roth, F.R., et al. (1998) Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol., 16, 939–945[CrossRef][ISI][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
J. Bacteriol.Home page
S.-M. Jiang, N. Ishmael, J. D. Hotopp, M. Puliti, L. Tissi, N. Kumar, M. J. Cieslewicz, H. Tettelin, and M. R. Wessels
Variation in the Group B Streptococcus CsrRS Regulon and Effects on Pathogenicity
J. Bacteriol., March 15, 2008; 190(6): 1956 - 1965.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
S. Kumar and J. Dudley
Bioinformatics software for biologists in the genomics era
Bioinformatics, July 15, 2007; 23(14): 1713 - 1717.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
K. A. Romer, G.-R. Kayombya, and E. Fraenkel
WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W217 - W220.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/12/2909    most recent
bti425v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (1)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Che, D.
Right arrow Articles by Liu, J. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Che, D.
Right arrow Articles by Liu, J. S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?