Bioinformatics Advance Access published online on May 5, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm118
Detection of generic spaced motifs using submotif pattern mining
(a) Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613.
(b) School of Computing, National University of Singapore, Singapore 119260.
(c) Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672.
(d) Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong.
*To whom correspondence should be addressed. Wing-Kin Sung, E-mail: ksung{at}comp.nus.edu.sg
Abstract
Motivation: Identifying motifs is a critical stage in studying the regulatory interactions of genes. Motifs can have complicated patterns. In particular, spaced motifs, an important class of motifs, consist of several short segments separated by spacers of different lengths. Locating spaced motifs is not trivial. Existing motif-finding algorithms are designed for monad motifs (short contiguous patterns with some mismatches), make assumptions about the spacer lengths, or can handle at most two segments. An effective motif finder for generic spaced motifs is therefore highly desirable.
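To make the terminology concrete, here is a minimal Python sketch, not taken from the paper, of a spaced motif modelled as contiguous segments separated by fixed-length spacers, together with a scan for its occurrences in a DNA sequence. The `SpacedMotif` class, the `occurrences` function and the per-segment mismatch budget `max_mm` are hypothetical illustrations; the paper's exact motif model may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpacedMotif:
    """Hypothetical model: contiguous segments separated by fixed-length spacers."""
    segments: Tuple[str, ...]
    spacers: Tuple[int, ...]  # spacers[i] = bases between segments[i] and segments[i + 1]

    def span(self) -> int:
        # Total length covered by one occurrence (segments plus spacers).
        return sum(len(s) for s in self.segments) + sum(self.spacers)


def hamming(a: str, b: str) -> int:
    # Number of mismatching positions between two equal-length strings.
    return sum(x != y for x, y in zip(a, b))


def occurrences(motif: SpacedMotif, seq: str, max_mm: int) -> List[int]:
    """Start positions where every segment matches with at most max_mm mismatches.

    Spacer positions are wildcards: any base is accepted there.
    """
    hits = []
    for start in range(len(seq) - motif.span() + 1):
        pos = start
        matched = True
        for i, seg in enumerate(motif.segments):
            if hamming(seg, seq[pos:pos + len(seg)]) > max_mm:
                matched = False
                break
            pos += len(seg)
            if i < len(motif.spacers):
                pos += motif.spacers[i]  # skip over the spacer
        if matched:
            hits.append(start)
    return hits
```

For example, `SpacedMotif(("CGG", "CCG"), (11,))` describes a two-segment pattern of the form CGG-N11-CCG; allowing one mismatch per segment (`max_mm=1`) would also report a site such as CGT-N11-CCG, whereas an exact scan (`max_mm=0`) would not.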
Results: This paper proposes a novel approach for identifying spaced motifs with any number of spacers of different lengths. We introduce the notion of submotifs to capture the segments of a spaced motif and formulate motif finding as a frequent submotif mining problem. We provide an algorithm called SPACE to solve this problem. Experiments on real biological datasets, synthetic datasets and the motif assessment benchmarks by Tompa et al. show that our algorithm outperforms existing tools on spaced motifs, improving both sensitivity and specificity, and that on monads SPACE performs as well as other tools.
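The idea of mining frequent submotifs can be sketched as follows. This is a simplified illustration under stated assumptions, not the SPACE algorithm: candidate submotifs are the k-mers that appear in the input, a submotif's support is the number of input sequences containing it with at most `max_mm` mismatches, and two frequent submotifs are then paired at a fixed spacer length. All function names and parameters (`support`, `frequent_submotifs`, `pair_support`, `min_support`) are hypothetical.

```python
from typing import Dict, List, Set


def hamming(a: str, b: str) -> int:
    # Number of mismatching positions between two equal-length strings.
    return sum(x != y for x, y in zip(a, b))


def support(kmer: str, seqs: List[str], max_mm: int) -> int:
    """Number of sequences containing kmer with at most max_mm mismatches."""
    k = len(kmer)
    return sum(
        any(hamming(kmer, s[i:i + k]) <= max_mm for i in range(len(s) - k + 1))
        for s in seqs
    )


def frequent_submotifs(seqs: List[str], k: int, max_mm: int,
                       min_support: int) -> Dict[str, int]:
    """Keep candidate k-mers (taken from the input) whose support >= min_support."""
    candidates: Set[str] = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
    found = {}
    for c in sorted(candidates):
        sup = support(c, seqs, max_mm)
        if sup >= min_support:
            found[c] = sup
    return found


def pair_support(a: str, b: str, gap: int, seqs: List[str], max_mm: int) -> int:
    """Sequences where submotif b starts exactly `gap` bases after submotif a ends."""
    width = len(a) + gap + len(b)
    count = 0
    for s in seqs:
        for i in range(len(s) - width + 1):
            j = i + len(a) + gap
            if (hamming(a, s[i:i + len(a)]) <= max_mm
                    and hamming(b, s[j:j + len(b)]) <= max_mm):
                count += 1
                break  # count each sequence at most once
    return count
```

Drawing candidates from the input rather than enumerating all 4^k k-mers keeps the sketch short, but it can miss a consensus that never occurs exactly in any sequence; a full motif finder would need a more careful candidate generation and a statistical score rather than a raw support threshold.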
Availability: The source code is available upon request from the authors.
Associate Editor: Prof. John Quackenbush
Received on December 5, 2006; revised on March 16, 2007; accepted on March 16, 2007