Bioinformatics Advance Access originally published online on July 22, 2004
Bioinformatics 2004 20(18):3455-3461; doi:10.1093/bioinformatics/bth426
Bioinformatics vol. 20 issue 18 © Oxford University Press 2004; all rights reserved.
A probabilistic measure for alignment-free sequence comparison
1 School of Computing and Information Technology, Griffith University, Nathan Campus, QLD 4111, Australia and 2 Alchemia Ltd, PO Box 6242, Upper Mount Gravatt, QLD 4122, Australia
Received on March 1, 2004; revised on June 28, 2004; accepted on July 26, 2004
Advance Access Publication July 22, 2004
Motivation: Alignment-free sequence comparison methods are still at an early stage of development compared with alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences that does not require alignment. The method is based on quantifying the similarity or dissimilarity between two constructed Markov models.
Results: The method was tested on six DNA sequences, namely the thrA, thrB and thrC genes of the threonine operons from Escherichia coli K-12 and from Shigella flexneri, and on one random sequence with the same base composition as thrA from E.coli. The results were compared with those obtained from the CLUSTAL W algorithm (alignment-based) and from the chaos game representation (alignment-free). The method was further tested on a more complex set of 40 DNA sequences and compared with other existing alignment-free sequence similarity measures.
Availability: All datasets and the MATLAB code are available upon request from the first author.
Contact: t.pham{at}griffith.edu.au
* To whom correspondence should be addressed.
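Illustration: The abstract does not state the measure itself, so the following is only a minimal sketch of the general idea of comparing two sequences through Markov models, not the authors' formula. It assumes first-order Markov chains over the alphabet A, C, G, T, add-one smoothing of the transition counts, and a symmetrised difference of per-transition log-likelihoods as the dissimilarity. All function names and parameters here are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumption: uppercase DNA with no ambiguity codes


def fit_markov_chain(seq, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from one sequence.

    The pseudocount (an assumption of this sketch) keeps every probability
    non-zero so that log-likelihoods under another model stay finite.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model


def mean_log_likelihood(seq, model):
    """Average log-probability per transition of seq under the given model."""
    n_transitions = len(seq) - 1
    total = sum(math.log(model[a][b]) for a, b in zip(seq, seq[1:]))
    return total / n_transitions


def dissimilarity(x, y):
    """Symmetrised log-likelihood gap between the models fitted to x and y.

    Each term measures how much worse a sequence is explained by the other
    sequence's model than by its own model.
    """
    model_x, model_y = fit_markov_chain(x), fit_markov_chain(y)
    gap_x = mean_log_likelihood(x, model_x) - mean_log_likelihood(x, model_y)
    gap_y = mean_log_likelihood(y, model_y) - mean_log_likelihood(y, model_x)
    return 0.5 * (gap_x + gap_y)


if __name__ == "__main__":
    x = "ACGT" * 10
    near = "ACGT" * 9 + "ACGG"     # x with one substitution
    far = "A" * 20 + "C" * 20      # very different transition structure
    print(dissimilarity(x, near), dissimilarity(x, far))
```

In this sketch a sequence compared with itself scores exactly zero, and sequences with similar transition statistics score lower than sequences with different ones. Because of the smoothing, the value is not guaranteed to be non-negative for every pair, so it should be read as a dissimilarity score rather than a metric.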
