


Bioinformatics Advance Access published online on October 27, 2005

Bioinformatics, doi:10.1093/bioinformatics/bti745

© The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
Received August 4, 2005
Revised October 14, 2005
Accepted October 24, 2005

Article

A generic motif discovery algorithm for sequential data

Kyle L. Jensen 1, Mark P. Styczynski 1, Isidore Rigoutsos 2, and Gregory N. Stephanopoulos 1*

1 Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2 Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA

* To whom correspondence should be addressed.
Gregory N. Stephanopoulos, E-mail: gregstep{at}mit.edu


Abstract

Motivation: Motif discovery in sequential data is a problem of great interest with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations, and each is typically applicable only to a certain class of problems.

Results: Here we present a GEneric MOtif DIscovery Algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. The algorithm also allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices, or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences, and the discovery of conserved protein sub-structures.

Availability: Gemoda is freely available at http://web.mit.edu/bamel/gemoda.

Supplementary Information: Available at http://web.mit.edu/bamel/gemoda.
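
Illustrative note: the (l,d)-motif problem mentioned in the Results is commonly stated as finding a pattern of length l that occurs in every input sequence with at most d mismatches. The Python sketch below is a naive brute-force check of that definition, intended only to make the problem statement concrete. It is not Gemoda and does not reflect how Gemoda works; the function names (`hamming`, `occurs_within`, `ld_motifs`), the DNA alphabet default, and the toy sequences are all assumptions introduced for this example.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(pattern, sequence, d):
    """True if some window of `sequence` is within d mismatches of `pattern`."""
    l = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def ld_motifs(sequences, l, d, alphabet="ACGT"):
    """Brute force: test every length-l string over `alphabet` (hypothetical helper).

    Cost grows as len(alphabet) ** l, so this is only usable for small l.
    """
    hits = []
    for letters in product(alphabet, repeat=l):
        pattern = "".join(letters)
        if all(occurs_within(pattern, s, d) for s in sequences):
            hits.append(pattern)
    return hits


# Toy input (made up for illustration): "ACGT" appears exactly in the first
# sequence and with one mismatch in each of the other two.
toy_sequences = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
found = ld_motifs(toy_sequences, l=4, d=1)
print("ACGT" in found)
```

Because the enumeration is exponential in l, a sketch like this is useful only for checking small cases by hand; it says nothing about the efficiency or search strategy of the published algorithm.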






