Bioinformatics Advance Access originally published online on January 19, 2007
Bioinformatics 2007 23(6):687-693; doi:10.1093/bioinformatics/btl665
An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences
¹Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research and ²Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
*To whom correspondence should be addressed.
Abstract
Motivation: Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). This procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of divergent sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/), identify frequent patterns directly in unaligned biological sequences without attempting to align them. Here we propose a new algorithm that is more efficient and offers more functionality than both PRATT2 and TEIRESIAS, and we discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets.
Results: In this study, we designed and implemented six algorithms that use a pattern growth approach to mine three different pattern types from either one or two datasets. We compared our approach with PRATT2 and TEIRESIAS in terms of efficiency, completeness and diversity of pattern types. Compared with PRATT2, our approach is faster, can process large datasets and can identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in discovering the so-called type I patterns, but offers additional functionality, such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets.
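The abstract does not spell out the pattern growth procedure, and the definitions of type I, II and III patterns are given in the full paper rather than here. As a rough illustration only, the following is a minimal sketch of the general pattern growth idea under simplifying assumptions: a pattern is a contiguous string of residues and single-residue wildcards (`.`), its support is the number of sequences that contain it at least once, and a pattern counts as "discriminating" if it is frequent in one dataset and occurs in at most `max_neg_support` sequences of the other. The function names, the parameters `min_support`, `max_len` and `max_gap`, and the discrimination criterion are hypothetical; this is not the authors' released code.

```python
import re
from collections import defaultdict

WILDCARD = "."  # hypothetical wildcard symbol: matches any single residue


def mine_frequent_patterns(sequences, min_support, max_len=6, max_gap=1):
    """Depth-first pattern growth over occurrence lists (illustrative sketch).

    A pattern is a string of residues and WILDCARD symbols matched against a
    contiguous window of a sequence. Support = number of sequences with at
    least one match. Only patterns ending in a residue are reported.
    Returns {pattern: support}.
    """
    results = {}

    def support(occurrences):
        return len({seq_id for seq_id, _ in occurrences})

    def grow(pattern, occurrences, trailing_gap):
        # Project the data: extend every occurrence (seq_id, end) by one
        # position and group the extended occurrences by the appended symbol.
        extensions = defaultdict(list)
        for seq_id, end in occurrences:
            seq = sequences[seq_id]
            if end < len(seq):
                extensions[seq[end]].append((seq_id, end + 1))
                if trailing_gap < max_gap:  # limit consecutive wildcards
                    extensions[WILDCARD].append((seq_id, end + 1))
        for symbol, new_occ in extensions.items():
            # Anti-monotonicity: an extension is never more frequent than its
            # prefix, so an infrequent branch is pruned with all its children.
            if support(new_occ) < min_support:
                continue
            new_pattern = pattern + symbol
            is_gap = symbol == WILDCARD
            if not is_gap:
                results[new_pattern] = support(new_occ)
            if len(new_pattern) < max_len:
                grow(new_pattern, new_occ, trailing_gap + 1 if is_gap else 0)

    # Seed the growth with every single residue and its occurrence list.
    seeds = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for i, residue in enumerate(seq):
            seeds[residue].append((seq_id, i + 1))
    for residue, occ in seeds.items():
        if support(occ) >= min_support:
            results[residue] = support(occ)
            if max_len > 1:
                grow(residue, occ, 0)
    return results


def count_support(pattern, sequences):
    """Number of sequences containing at least one match of `pattern`."""
    regex = re.compile("".join(
        "." if ch == WILDCARD else re.escape(ch) for ch in pattern))
    return sum(1 for seq in sequences if regex.search(seq))


def discriminating_patterns(positive, negative, min_pos_support,
                            max_neg_support, **mining_options):
    """Patterns frequent in `positive` but rare in `negative` (hypothetical rule)."""
    frequent = mine_frequent_patterns(positive, min_pos_support, **mining_options)
    result = {}
    for pattern, pos_support in frequent.items():
        neg_support = count_support(pattern, negative)
        if neg_support <= max_neg_support:
            result[pattern] = (pos_support, neg_support)
    return result


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    positives = ["MKTAYDRYLAI", "GGDRYQAIV", "PLDRYVALKT"]
    negatives = ["MKTAYLLQ", "GGSSDRQ"]
    disc = discriminating_patterns(positives, negatives,
                                   min_pos_support=3, max_neg_support=0)
    print(disc.get("DRY"))  # prints (3, 0)
```

Each pattern is grown from the occurrence list of its prefix, so the sketch never rescans the whole dataset for a candidate; and because appending a symbol can only shrink the set of sequences that contain a pattern, a branch that falls below `min_support` can be abandoned together with all of its extensions.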
Availability: The source code and pseudo-code of the pattern growth algorithms are available at http://www.liacs.nl/home/kosters/pg/
Contact: k.ye{at}lacdr.leidenuniv.nl
Associate Editor: Limsoon Wong
Received on October 24, 2006; revised on December 12, 2006; accepted on December 27, 2006