Bioinformatics Advance Access published online on October 27, 2004

Bioinformatics, doi:10.1093/bioinformatics/bti083
Bioinformatics © Oxford University Press 2004; all rights reserved

Received April 14, 2004
Revised October 4, 2004
Accepted October 6, 2004

Article

Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites

Chiara Sabatti 1*, Lars Rohlin 2, Kenneth Lange 3, and James C. Liao 2

1 Department of Human Genetics, UCLA, Los Angeles CA 90095-7088; Department of Statistics, UCLA, Los Angeles CA 90095-7088
2 Department of Chemical Engineering, UCLA, Los Angeles CA 90095
3 Department of Human Genetics, UCLA, Los Angeles CA 90095-7088; Department of Statistics, UCLA, Los Angeles CA 90095-7088; Department of Biomathematics, UCLA, Los Angeles, CA 90095

* To whom correspondence should be addressed.
Chiara Sabatti, E-mail: csabatti{at}mednet.ucla.edu


Abstract

We describe the first application of dictionary-style models to the study of transcription factor binding sites in an entire genome. Vocabulon's distinguishing feature is that it can both reconstruct binding sites characterized by unknown motifs and impute the locations of known binding sites in long sequences through a simultaneous search. On the one hand, the dictionary model specifies a probability for the entire sequence, taking all possible binding sites into account simultaneously; this greatly reduces the number of false positives. On the other hand, the ability to refine motif descriptions as more binding sites are identified increases the sensitivity of the method. We illustrate these properties with examples in Escherichia coli. Gene expression array results are used both to guide the search and to corroborate it.
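The abstract does not give the model's formulas, so the following is only a minimal sketch of the generic dictionary-model idea, not Vocabulon's actual implementation. It assumes the sequence is a concatenation of independently drawn words, and it simplifies each word to a fixed string with a usage probability (in a motif setting, words would instead be position-specific letter distributions). The function names and the toy dictionary are hypothetical.

```python
def forward_table(seq, word_probs):
    """forward[i] = total probability of all segmentations of seq[:i] into words."""
    forward = [0.0] * (len(seq) + 1)
    forward[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(seq) + 1):
        for word, prob in word_probs.items():
            start = end - len(word)
            if start >= 0 and seq[start:end] == word:
                forward[end] += prob * forward[start]
    return forward


def backward_table(seq, word_probs):
    """backward[i] = total probability of all segmentations of seq[i:] into words."""
    n = len(seq)
    backward = [0.0] * (n + 1)
    backward[n] = 1.0
    for start in range(n - 1, -1, -1):
        for word, prob in word_probs.items():
            end = start + len(word)
            if end <= n and seq[start:end] == word:
                backward[start] += prob * backward[end]
    return backward


def site_posterior(seq, word_probs, word, start):
    """Posterior probability that `word` occupies seq[start:start+len(word)]."""
    end = start + len(word)
    if end > len(seq) or seq[start:end] != word:
        return 0.0
    forward = forward_table(seq, word_probs)
    backward = backward_table(seq, word_probs)
    return forward[start] * word_probs[word] * backward[end] / forward[-1]


# Hypothetical toy dictionary: four background letters plus one two-letter "site".
toy_dictionary = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.1, "AT": 0.3}
sequence_likelihood = forward_table("AT", toy_dictionary)[-1]
at_site_posterior = site_posterior("AT", toy_dictionary, "AT", 0)
```

Because every candidate site competes with every other explanation of the same letters (including background), a site receives a high posterior only when the whole sequence is better explained with it than without it, which is the sense in which scoring the entire sequence can limit false positives.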






