Bioinformatics Vol. 17 no. 90001 2001
Pages S207-S214
© 2001 Oxford University Press
An algorithm for finding signals of unknown length in DNA sequences
1 Department of Computer Science, Systems
and Communication, University of Milan-Bicocca, Via Bicocca
degli Arcimboldi 8, Milan, I-20126, Italy
2 Department of General Physiology and
Biochemistry, University of Milan, Via Celoria 20, Milan, I-20133,
Italy
Received on February 5, 2001; revised on April 3, 2001; accepted on April 3, 2001
Pattern discovery in unaligned DNA sequences is a
challenging problem in both computer science and molecular biology.
Several different methods and techniques have been proposed so far,
but in most cases signals in DNA sequences are very
complicated and elude detection. Exact exhaustive methods can solve
the problem only for short signals with a limited number of
mutations. In this work, we extend exhaustive enumeration to
longer patterns as well. In more detail, the basic version of the algorithm
presented in this paper, given as input a set of sequences and an
error ratio ε < 1, finds all patterns that occur in at
least q sequences of the set with at most
εm mutations, where m is the length of the pattern.
The only restriction is imposed on the location of mutations along
the signal. That is, a valid occurrence of a pattern can present at
most ⌊εi⌋ mismatches in the first
i nucleotides, and so on. However, we show how the algorithm
can also be used when no assumption can be made on the position of
mutations. In this case, it is also possible to estimate
the probability of finding a signal according to the signal length,
the error ratio, and the input parameters. Finally, we discuss some
significance measures that can be used to sort the patterns output by
the algorithm.
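
The restriction on mutation positions can be illustrated with a short Python sketch. It is not the algorithm of the paper, only a brute-force check of the occurrence condition, and it assumes the restriction is read as "at most ⌊εi⌋ mismatches in the first i nucleotides, for every prefix length i from 1 to m". The function names `is_valid_occurrence` and `count_sequences_with_occurrence` are hypothetical and chosen for illustration.

```python
import math


def is_valid_occurrence(pattern: str, window: str, eps: float) -> bool:
    """Check the prefix restriction (assumed reading of the abstract).

    The window is accepted if, for every prefix length i, it differs from
    the pattern in at most floor(eps * i) of the first i positions.
    Note: eps is a float here; a fractions.Fraction avoids rounding
    surprises when eps * i should be an exact integer.
    """
    if len(pattern) != len(window):
        return False
    mismatches = 0
    for i, (p, w) in enumerate(zip(pattern, window), start=1):
        if p != w:
            mismatches += 1
        if mismatches > math.floor(eps * i):
            return False
    return True


def count_sequences_with_occurrence(pattern: str, sequences: list[str], eps: float) -> int:
    """Count the sequences that contain at least one valid occurrence."""
    m = len(pattern)
    return sum(
        any(
            is_valid_occurrence(pattern, s[j:j + m], eps)
            for j in range(len(s) - m + 1)
        )
        for s in sequences
    )
```

Under this reading, a pattern would be reported when the count returned by `count_sequences_with_occurrence` is at least q. With ε = 0.25, for example, no mismatch is allowed in the first three nucleotides, one is allowed within the first four, and two within the first eight.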
Contact: pavesi{at}disco.unimib.it