Bioinformatics Advance Access published online on February 15, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti323
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 Information Science and Technology Institute, University of Urbino 61029 Urbino, Italy
* To whom correspondence should be addressed.
Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P << N. Results: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.
Received October 11, 2004
Revised January 13, 2005
Accepted February 9, 2005
Article
Using sequence compression to speedup probabilistic profile matching
Alessandro Bogliolo, E-mail: bogliolo{at}sti.uniurb.it
![]()
Abstract ![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
A. Kocsor, A. Kertesz-Farkas, L. Kajan, and S. Pongor Application of compression-based distance measures to protein sequence classification: a methodological study Bioinformatics, February 15, 2006; 22(4): 407 - 412. [Abstract] [Full Text] [PDF] |
||||
