Bioinformatics Advance Access originally published online on October 27, 2005
Bioinformatics 2006 22(1):21-28; doi:10.1093/bioinformatics/bti745
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

A generic motif discovery algorithm for sequential data

Kyle L. Jensen 1, Mark P. Styczynski 1, Isidore Rigoutsos 1,2 and Gregory N. Stephanopoulos 1,*

1 Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2 IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA

*To whom correspondence should be addressed.

Motivation: Motif discovery in sequential data is a problem of great interest with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations, and each is typically applicable only to a certain class of problems.

Results: Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. The algorithm also allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures.
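
The abstract does not describe Gemoda's internals, so the following is only an illustrative sketch of what a metric-agnostic comparison over sequential data could look like, not the authors' implementation. It slides fixed-length windows over each sequence and reports pairs of windows from different sequences whose similarity meets a threshold. All names (similar_windows, match_count, neg_euclidean) and the threshold convention are assumptions made for this example.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# A window is identified by (sequence index, start offset). Hypothetical type.
Window = Tuple[int, int]


def similar_windows(
    seqs: Sequence[Sequence],
    length: int,
    score: Callable[[Sequence, Sequence], float],
    threshold: float,
) -> List[Tuple[Window, Window]]:
    """Return pairs of windows from different sequences with score >= threshold."""
    # Enumerate every window of the requested length in every sequence.
    windows = [
        (i, j)
        for i, s in enumerate(seqs)
        for j in range(len(s) - length + 1)
    ]
    pairs = []
    for (i, a), (k, b) in combinations(windows, 2):
        if i == k:
            continue  # assumption: only compare windows across sequences
        u = seqs[i][a:a + length]
        v = seqs[k][b:b + length]
        if score(u, v) >= threshold:
            pairs.append(((i, a), (k, b)))
    return pairs


def match_count(u: Sequence, v: Sequence) -> float:
    """Similarity for categorical data: number of matching positions."""
    return sum(x == y for x, y in zip(u, v))


def neg_euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity for real-valued data: negated Euclidean distance."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Categorical example: DNA strings, at least 3 of 4 positions must match.
dna = ["ACGTAC", "TTCGTA"]
dna_pairs = similar_windows(dna, 4, match_count, 3)

# Real-valued example: windows within Euclidean distance 0.2 of each other.
signals = [[0.0, 1.0, 2.0, 1.0], [5.0, 0.1, 1.1, 2.1]]
signal_pairs = similar_windows(signals, 3, neg_euclidean, -0.2)
```

The same function handles both inputs because the only data-specific part is the score callable, which is the sense in which the choice of similarity metric is left open here. Grouping such pairwise hits into maximal motifs is a separate step that this sketch does not attempt.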

Availability: Gemoda is freely available at http://web.mit.edu/bamel/gemoda

Contact: gregstep{at}mit.edu

Supplementary Information: Available at http://web.mit.edu/bamel/gemoda


Received on August 4, 2005; revised on October 14, 2005; accepted on October 24, 2005


