Bioinformatics Advance Access published online on October 27, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti745
1 Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
* To whom correspondence should be addressed.
Motivation: Motif discovery in sequential data is a problem of great interest with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations, and each is typically applicable only to a certain class of problems.

Results: Here we present a GEneric MOtif DIscovery Algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. In addition, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices, or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences, and the discovery of conserved protein sub-structures.

Availability: Gemoda is freely available at http://web.mit.edu/bamel/gemoda.

Supplementary Information: Available at http://web.mit.edu/bamel/gemoda.
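For readers unfamiliar with the (l,d)-motif problem named in the Results, the sketch below illustrates the standard problem statement only: find every pattern of length l that occurs in each input sequence with at most d mismatches. It is a brute-force enumeration written for illustration, not Gemoda's algorithm, and the function names (`hamming`, `occurs_within`, `ld_motifs`) and the toy sequences are assumptions introduced here rather than anything taken from the article.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(pattern, sequence, d):
    """True if some window of the sequence is within d mismatches of the pattern."""
    l = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def ld_motifs(sequences, l, d, alphabet="ACGT"):
    """Brute-force (l,d)-motif search (illustrative; cost grows as |alphabet|**l).

    Returns every length-l pattern that occurs in all sequences
    with at most d mismatches.
    """
    hits = []
    for candidate in product(alphabet, repeat=l):
        pattern = "".join(candidate)
        if all(occurs_within(pattern, s, d) for s in sequences):
            hits.append(pattern)
    return hits


if __name__ == "__main__":
    # Hypothetical toy input: each sequence holds a variant of "ACGT".
    toy = ["TTACGTAA", "GGACCTCC", "ACGAGGGG"]
    print(ld_motifs(toy, l=4, d=1))
```

Because the enumeration visits every possible pattern, it is practical only for small l; it is meant to make the problem definition concrete, not to suggest a competitive method.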
Received August 4, 2005
Revised October 14, 2005
Accepted October 24, 2005
A generic motif discovery algorithm for sequential data
2 Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
Gregory N. Stephanopoulos, E-mail: gregstep{at}mit.edu
This article has been cited by other articles:
E. Wijaya, K. Rajaraman, S.-M. Yiu, and W.-K. Sung. Detection of generic spaced motifs using submotif pattern mining. Bioinformatics, June 15, 2007; 23(12): 1476-1485.
