Bioinformatics Vol. 19 Suppl. 2 2003
pages ii206-ii214
© 2003 Oxford University Press
Finding optimal degenerate patterns in DNA sequences
1 Graduate School of Mathematics, Kyushu
University, Fukuoka 812-8581, Japan
2 Bioinformatics Center, Institute for
Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
3 Faculty of Mathematics, Kyushu University,
Fukuoka 812-8581, Japan
Received on March 17, 2003
; accepted on June 9, 2003
Motivation: The problem of finding transcription factor binding
sites in the upstream regions of given genes is algorithmically an
interesting and challenging problem in computational biology. A
degenerate pattern over a finite alphabet
is a
sequence of subsets of
. A string over IUPAC nucleic
acid codes is also a degenerate pattern over
= {A, C, G,
T}, and is used as one of the major patterns modeling
transcription factor binding sites in the upstream regions of genes.
However, it is known that the problem of finding a degenerate
pattern consistent with both positive and negative string sets is in
general NP-complete. Our aim is to devise a heuristic algorithm to
find a degenerate pattern which is optimal for positive and negative
string sets w.r.t. a given score function.
Results: We have proposed an enumerative algorithm called SUPERPOSITION for finding optimal degenerate patterns with a pruning technique, which works with most all reasonable score functions. The performance score of the algorithm has been compared with those of other popular motif-finding algorithms YMF, MEME and AlignACE on various sets of co-regulated genes of yeast. In the computational experiment, SUPERPOSITION has outperformed the others on several gene sets.
Availability: The python script SUPERPOSITION is available at http://www.math.kyushu-u.ac.jp/~om/softwares.html
Contact: om{at}math.kyushu-u.ac.jp