Bioinformatics Vol. 19 no. 15 2003
Pages 1952-1963
© 2003 Oxford University Press
Background rareness-based iterative multiple sequence alignment algorithm for regulatory element detection
Life Sciences Division, Oak Ridge National Laboratory, PO Box 3480, Oak Ridge, TN 37830, USA
Received on November 8, 2002; revised on March 13, 2003; accepted on April 25, 2003
Motivation: Experimental methods capable of generating sets of co-regulated genes have become commonplace; however, recognizing the regulatory motifs responsible for this regulation remains difficult. As a result, computational detection of transcription factor binding sites in such data sets has been an active area of research. Most approaches have utilized either Gibbs sampling or greedy strategies to identify such elements in sets of sequences. These existing methods have varying degrees of success depending on the strength and length of the signals and the number of available sequences. We present a new deterministic iterative algorithm for regulatory element detection based on a Markov chain background. As in other methods, sequences in the entire genome and the training set are taken into account in order to discriminate against commonly occurring signals and produce patterns that are significant in the training set.
Results: The results of the algorithm compare favorably with existing tools on previously known and newly compiled data sets. The iteration-based search appears rather rigorous, not only finding the binding sites but also showing how the binding site stands out from the genomic background. The approach used to score the results is critical, and a discussion of various scoring schemes and options is also presented. Benchmarking of several methods shows that while most tools are good at detecting strong signals, Gibbs sampling algorithms give inconsistent results when the regulatory element signal becomes weak. A Markov chain-based background model alleviates the drawbacks of MAP (maximum a posteriori log likelihood) scores.
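As a rough illustration of what a Markov chain background can provide, the sketch below estimates a first-order chain from background sequences and uses it to compare how probable two short words are under that background. It is not the algorithm of this paper: the chain order, the pseudocount, the toy sequences, and the function names `train_background` and `log_background_prob` are all assumptions made for the example.

```python
import math

ALPHABET = "ACGT"


def train_background(sequences, pseudocount=1.0):
    """Estimate a first-order Markov chain from background sequences.

    Hypothetical helper: the order (1) and the pseudocount are illustrative
    choices, not values taken from the paper.
    """
    init = {b: pseudocount for b in ALPHABET}
    trans = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        prev = None
        for base in seq.upper():
            if base not in init:
                prev = None  # break the chain at ambiguous symbols such as N
                continue
            init[base] += 1  # single-base counts, used for the first position
            if prev is not None:
                trans[prev][base] += 1  # count the transition prev -> base
            prev = base
    init_total = sum(init.values())
    init_p = {b: c / init_total for b, c in init.items()}
    trans_p = {}
    for a, row in trans.items():
        row_total = sum(row.values())
        trans_p[a] = {b: c / row_total for b, c in row.items()}
    return init_p, trans_p


def log_background_prob(model, word):
    """Log-probability of a non-empty word over ACGT under the background chain."""
    init_p, trans_p = model
    word = word.upper()
    logp = math.log(init_p[word[0]])
    for prev, base in zip(word, word[1:]):
        logp += math.log(trans_p[prev][base])
    return logp


# Toy A-rich background (made-up data, for illustration only).
background = ["AAAAAAAATAAAAAAA", "AAATAAAAAAAAAAAA"]
model = train_background(background)
common = log_background_prob(model, "AAAA")
rare = log_background_prob(model, "CGCG")
print("AAAA:", common, "CGCG:", rare)
```

Under this toy background, a word that resembles the background composition receives a higher log-probability than one that does not, so a low background probability can serve as one simple notion of rareness when deciding whether a candidate pattern stands out.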
Availability: Available on request from the authors.
Supplementary information: Data and the results presented in this paper are available on the web at http://compbio.ornl.gov/mira/index.html
Contact: uberbacherec{at}ornl.gov
* To whom correspondence should be addressed.