Bioinformatics Advance Access originally published online on February 15, 2005
Bioinformatics 2005 21(10):2225-2229; doi:10.1093/bioinformatics/bti323
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Using sequence compression to speedup probabilistic profile matching
Information Science and Technology Institute, University of Urbino 61029 Urbino, Italy
*To whom correspondence should be addressed.
Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P << N.
Results: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.
Contact: bogliolo{at}sti.uniurb.it
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
A. Kocsor, A. Kertesz-Farkas, L. Kajan, and S. Pongor Application of compression-based distance measures to protein sequence classification: a methodological study Bioinformatics, February 15, 2006; 22(4): 407 - 412. [Abstract] [Full Text] [PDF] |
||||
