Bioinformatics Advance Access published online on March 29, 2005

Bioinformatics, doi:10.1093/bioinformatics/bti410
© The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
Received September 22, 2004
Revised March 13, 2005
Accepted March 23, 2005

Article

Identification of transcription factor binding sites with variable-order Bayesian networks

I. Ben-Gal 1*, A. Shani 1, A. Gohr 2, J. Grau 2, S. Arviv 1, A. Shmilovici 3, S. Posch 4, and I. Grosse 5

1 Department of Industrial Engineering, Tel-Aviv University, Tel-Aviv, 69978, Israel
2 Institute of Computer Science, University Halle, 06099 Halle (Saale), Germany; Institute of Plant Genetics and Crop Plant Research, 06466 Gatersleben, Germany
3 Department of Information Systems Engineering, Ben-Gurion University, P.O. Box 653, Beer-Sheva, 84105, Israel
4 Institute of Computer Science, University Halle, 06099 Halle (Saale), Germany
5 Institute of Plant Genetics and Crop Plant Research, 06466 Gatersleben, Germany

* To whom correspondence should be addressed.
I. Ben-Gal, E-mail: bengal@eng.tau.ac.il


   Abstract

Motivation: We propose a new class of variable-order Bayesian network (VOBN) models for the identification of transcription factor binding sites (TFBSs). The proposed models generalize the widely used position weight matrix (PWM) models, Markov models and Bayesian network (BN) models. In those models, the dependencies of each position are modeled with a fixed subset of the remaining positions; in VOBN models, these subsets may vary with the specific nucleotides observed, which are called the context.

This flexibility proves advantageous for the classification and analysis of TFBSs, because statistical dependencies between nucleotides at different TFBS positions (not necessarily adjacent) can be taken into account efficiently, in a position-specific and context-specific manner.
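To make the idea of a context-dependent parent set concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy site of length 3 in which each position has an ordered list of parent positions and a table keyed by contexts of varying length; the lookup uses the longest context that matches the observed parent nucleotides. The names `MODEL`, `conditional` and `log_likelihood`, the prefix-based fallback rule, and all probabilities are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy model (all numbers invented for illustration).
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# One entry per site position. "parents" is an ordered list of parent
# positions; "table" maps a context (a tuple of parent nucleotides that may be
# shorter than the parent list) to a distribution over the four nucleotides.
MODEL = [
    {"parents": [], "table": {(): {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}}},
    {"parents": [0], "table": {
        (): UNIFORM,
        ("A",): {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    }},
    # Position 2 lists the non-adjacent position 0 as its first parent.
    {"parents": [0, 1], "table": {
        (): UNIFORM,
        ("A",): {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
        ("A", "T"): {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    }},
]


def conditional(model, seq, pos):
    """Probability of seq[pos] given the longest context stored for this position."""
    node = model[pos]
    values = tuple(seq[p] for p in node["parents"])
    # Try the full parent assignment first, then ever shorter prefixes of it.
    for k in range(len(values), -1, -1):
        context = values[:k]
        if context in node["table"]:
            return node["table"][context][seq[pos]]
    raise KeyError("no context matched, not even the empty context")


def log_likelihood(model, seq):
    """Sum of log conditional probabilities over all positions of the site."""
    return sum(math.log(conditional(model, seq, i)) for i in range(len(seq)))
```

In this toy model, position 2 uses both parents when the site starts with "AT", only the first parent when it starts with "A" followed by another nucleotide, and no parent otherwise. A PWM corresponds to the special case in which every table contains only the empty context.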

Results: We apply the VOBN model to a set of 238 experimentally verified sigma-70 binding sites in E. coli. The VOBN model distinguishes these 238 sites from a set of 472 intergenic ‘non-promoter’ sequences with higher accuracy than fixed-order Markov models or Bayesian trees (BT). We use replicated stratified-holdout experiments with a fixed true-negative rate of 99.9%. For a foreground inhomogeneous VOBN model of order 1 and a background homogeneous variable-order Markov (VOM) model of order 5, the mean true-positive (TP) rate is 47.56%. In comparison, the best TP rate for the conventional models is 44.39%, obtained with a foreground PWM model and a background 2nd-order Markov model. As the standard deviation of the estimated TP rate is ~0.01%, this improvement is highly significant.
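The evaluation at a fixed true-negative rate can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' evaluation code: it assumes each test sequence has already been given a score (for example, a log-likelihood ratio of a foreground model to a background model), sets the threshold from the negative scores so that at least the requested fraction of negatives is rejected, and then reports the fraction of positives above that threshold. The function names `threshold_at_tn_rate` and `tp_rate` are hypothetical.

```python
import math


def threshold_at_tn_rate(negative_scores, tn_rate):
    """Smallest observed negative score t such that at least tn_rate of the
    negatives score <= t (and are therefore rejected)."""
    ordered = sorted(negative_scores)
    # Number of negatives that must fall at or below the threshold.
    k = max(1, math.ceil(tn_rate * len(ordered)))
    return ordered[k - 1]


def tp_rate(positive_scores, threshold):
    """Fraction of positives scoring strictly above the threshold."""
    accepted = sum(1 for s in positive_scores if s > threshold)
    return accepted / len(positive_scores)


# Toy usage with invented scores (not data from the paper).
negatives = [0.1, -1.0, 0.5, 2.0]
positives = [0.4, 0.6, 3.0, 0.5]
t = threshold_at_tn_rate(negatives, 0.75)
rate = tp_rate(positives, t)
```

In a replicated holdout setting, this computation would be repeated on each test split and the resulting TP rates averaged; how the splits are drawn is outside the scope of this sketch.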

Availability: All datasets are available upon request from the authors at bengal@eng.tau.ac.il. A web server for utilizing VOBN and VOM models is available at http://www.eng.tau.ac.il/~bengal/.


This article has been cited by other articles:

E. Wingender
The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation
Brief Bioinform, July 1, 2008; 9(4): 326 - 332.

S. Sonnenburg, A. Zien, P. Philips, and G. Ratsch
POIMs: positional oligomer importance matrices--understanding support vector machine-based signal detectors
Bioinformatics, July 1, 2008; 24(13): i6 - i14.

A. Vandenbon, Y. Miyamoto, N. Takimoto, T. Kusakabe, and K. Nakai
Markov Chain-based Promoter Structure Modeling for Tissue-specific Expression Pattern Prediction
DNA Res, February 7, 2008; (2008) dsm034v1.

T. S. Rani, S. D. Bhavani, and R. S. Bapi
Analysis of E.coli promoter recognition problem in dinucleotide feature space
Bioinformatics, March 1, 2007; 23(5): 582 - 588.

J. Grau, I. Ben-Gal, S. Posch, and I. Grosse
VOMBAT: prediction of transcription factor binding sites using variable order Bayesian trees
Nucleic Acids Res, July 1, 2006; 34(Web Server issue): W529 - W533.
