Bioinformatics Advance Access published online on January 19, 2007

Bioinformatics, doi:10.1093/bioinformatics/btl665

© The Author (2007). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences

Kai Ye 1,*, Walter A. Kosters 2 and Adriaan P. IJzerman 1

1Division of Medicinal Chemistry LACDR, Leiden University, Leiden, The Netherlands
2Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands.

*To whom correspondence should be addressed. Kai Ye, E-mail: k.ye{at}lacdr.leidenuniv.nl, yekai_19770619{at}hotmail.com


Abstract

Motivation: Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). This procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/), identify frequent patterns directly from unaligned biological sequences without attempting to align them. Here we propose a new algorithm that is more efficient and offers more functionality than both PRATT2 and TEIRESIAS, and we discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets.

Results: In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of so-called type I patterns but has additional functionality such as mining so-called type II and type III patterns and finding discriminating patterns between two datasets.
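
To make the general idea of pattern growth concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not define the type I, II and III patterns, so this example assumes the simplest possible case: contiguous substrings, where a pattern's support is the number of sequences containing it. A frequent pattern is extended by one residue at a time, and only the positions where the shorter pattern already matched are examined. The function name `mine_frequent_substrings` and the parameters `min_support` and `max_length` are hypothetical.

```python
from collections import defaultdict


def mine_frequent_substrings(sequences, min_support, max_length=10):
    """Return {pattern: support} for contiguous patterns that occur in
    at least `min_support` of the unaligned input sequences.

    Assumption: patterns are plain substrings (no gaps or wildcards).
    """
    frequent = {}

    def grow(pattern, occurrences):
        # occurrences: sequence index -> positions just past each match
        if len(pattern) >= max_length:
            return
        # Group every possible one-residue extension by residue.
        extensions = defaultdict(lambda: defaultdict(list))
        for seq_id, ends in occurrences.items():
            seq = sequences[seq_id]
            for end in ends:
                if end < len(seq):
                    extensions[seq[end]][seq_id].append(end + 1)
        for residue in sorted(extensions):
            occ = extensions[residue]
            # Support counts sequences, not individual matches.
            if len(occ) >= min_support:
                new_pattern = pattern + residue
                frequent[new_pattern] = len(occ)
                # Only frequent patterns are grown further.
                grow(new_pattern, occ)

    # The empty pattern "matches" before every position of every sequence.
    start = {i: list(range(len(s))) for i, s in enumerate(sequences)}
    grow("", start)
    return frequent


if __name__ == "__main__":
    toy = ["ACDEF", "XCDEY", "CDQ"]
    for pattern, support in sorted(mine_frequent_substrings(toy, 2).items()):
        print(pattern, support)
```

Because an extension is only attempted where its prefix already matched in enough sequences, infrequent candidates are never enumerated. No alignment of the input sequences is needed at any point.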

Availability: The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/

Associate Editor: Limsoon Wong


Received on October 24, 2006; revised on December 12, 2006; accepted on December 27, 2006
