Bioinformatics Advance Access originally published online on October 27, 2004
Bioinformatics 2005 21(7):922-931; doi:10.1093/bioinformatics/bti083
Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites
1Departments of Human Genetics and Statistics, UCLA Los Angeles, CA 90095-7088, USA
2Department of Chemical Engineering, UCLA Los Angeles, CA 90095, USA
3Department of Biomathematics, UCLA Los Angeles, CA 90095, USA
*To whom correspondence should be addressed at Department of Human Genetics, UCLA School of Medicine, 695 Charles E. Young Drive South, Los Angeles, CA 90095-7088, USA.
Motivation: Gene expression arrays enable measurements of transcription values for a large number of, or all, genes in the genome. To better interpret these results and to use them to reconstruct transcription networks, information on the location of binding sites for regulatory proteins in the entire genome is needed. In particular, this represents an open problem in Escherichia coli.
Results: We describe the first implementation of dictionary-style models for the study of transcription factor binding sites in an entire genome. Vocabulon's unique feature is that it can both reconstruct binding sites characterized by unknown motifs and impute locations of known binding sites in long sequences by simultaneous search. On the one hand, the dictionary model specifies a probability for the entire sequence, simultaneously taking into account all the possible binding sites. This greatly reduces the number of false positives. On the other hand, the possibility of refining the motif description as an increasing number of binding sites are identified augments the sensitivity of the method. We illustrate these properties with examples in E. coli. The results of gene expression arrays are used both to guide the search and to corroborate it.
Availability: For a copy of the Vocabulon program and other details, please contact csabatti{at}mednet.ucla.edu
Contact: csabatti{at}mednet.ucla.edu
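To make concrete the idea of a dictionary model assigning one probability to the entire sequence, here is a minimal sketch of the generic forward-backward recursion for such models. It is not the Vocabulon code: the word representation (a usage probability paired with a position weight matrix), the uniform single-letter background, the toy motif and all function names are assumptions chosen for illustration. The sequence is treated as a concatenation of words, and the likelihood sums over every possible segmentation, so competing binding sites are weighed against each other rather than scored in isolation.

```python
# Illustrative sketch only: a generic dictionary-model recursion, not Vocabulon itself.
# A "word" is (usage_probability, pwm); a pwm is a list of {base: probability} columns.

def emit(pwm, segment):
    """Probability that the word described by pwm spells out segment."""
    p = 1.0
    for column, base in zip(pwm, segment):
        p *= column.get(base, 0.0)
    return p

def forward(seq, words):
    """f[i] = probability of seq[:i] as a complete concatenation of words."""
    n = len(seq)
    f = [1.0] + [0.0] * n
    for i in range(1, n + 1):
        for q, pwm in words:
            k = len(pwm)
            if k <= i:
                f[i] += q * emit(pwm, seq[i - k:i]) * f[i - k]
    return f

def backward(seq, words):
    """b[i] = probability of seq[i:] as a complete concatenation of words."""
    n = len(seq)
    b = [0.0] * n + [1.0]
    for i in range(n - 1, -1, -1):
        for q, pwm in words:
            k = len(pwm)
            if i + k <= n:
                b[i] += q * emit(pwm, seq[i:i + k]) * b[i + k]
    return b

def site_posterior(seq, words, word_index, start):
    """Posterior probability that words[word_index] occupies seq[start:start+k]."""
    f = forward(seq, words)
    b = backward(seq, words)
    q, pwm = words[word_index]
    k = len(pwm)
    if start + k > len(seq) or f[-1] == 0.0:
        return 0.0
    return f[start] * q * emit(pwm, seq[start:start + k]) * b[start + k] / f[-1]

# Hypothetical toy dictionary: a uniform one-letter background word and a 2-letter motif.
BACKGROUND = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}]
TOY_MOTIF = [{"A": 1.0}, {"C": 1.0}]
WORDS = [(0.9, BACKGROUND), (0.1, TOY_MOTIF)]

if __name__ == "__main__":
    seq = "GACT"
    print("sequence likelihood:", forward(seq, WORDS)[-1])
    print("posterior of motif at position 1:", site_posterior(seq, WORDS, 1, 1))
```

For genome-length sequences the raw probabilities would underflow, so a practical version would rescale each step or work in log space; the sketch omits this for brevity.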