
Bioinformatics Vol. 19 no. 4 2003
Pages 513-523
© 2003 Oxford University Press

Alignment-free sequence comparison—a review

Susana Vinga 2 and Jonas Almeida 1,*

1 Department of Biometry & Epidemiology, Medical University of South Carolina, 135 Cannon Street, Suite 303, PO Box 250835, Charleston, SC 29425, USA
2 Biomathematics Group, ITQB, Universidade Nova de Lisboa, PO Box 127, 2780-156 Oeiras, Portugal

Received on July 15, 2002; revised on September 27, 2002; accepted on October 6, 2002

Motivation: Genetic recombination and, in particular, genetic shuffling are at odds with sequence comparison by alignment, which assumes conservation of contiguity between homologous segments. A variety of theoretical foundations are being used to derive alignment-free methods that overcome this limitation. The formulation of alternative metrics for dissimilarity between sequences and their algorithmic implementations are reviewed.

Results: The overwhelming majority of work on alignment-free sequence comparison has taken place in the past two decades, with most reports published in the past 5 years. Two main categories of methods have been proposed: methods based on word (oligomer) frequency, and methods that do not require resolving the sequence with fixed word length segments. The first category is based on the statistics of word frequency, on distances in a Cartesian space defined by the frequency vectors, and on the information content of the frequency distribution. The second category includes the use of Kolmogorov complexity and Chaos Theory. Despite their low visibility, alignment-free metrics are in fact already widely used as pre-selection filters for alignment-based querying of large applications. Recent work is furthering their usage as a scale-independent methodology that is capable of recognizing homology when loss of contiguity is beyond the possibility of alignment.
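
As an illustration of the first category, the following is a minimal sketch of a word-frequency distance: each sequence is mapped to a vector of relative word (k-mer) frequencies, and the Euclidean distance between the two vectors serves as the dissimilarity. It is not the authors' MATLAB implementation; the function names, the default word length k = 3, the DNA alphabet and the example sequences are all assumptions made for this sketch.

```python
from collections import Counter
from itertools import product
from math import sqrt


def word_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequencies of all overlapping k-letter words in seq.

    The vector has one entry per possible word over `alphabet`,
    in lexicographic order of the alphabet (hypothetical helper).
    """
    if len(seq) < k:
        raise ValueError("sequence is shorter than the word length k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    words = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts[w] / total for w in words]


def euclidean_distance(seq_a, seq_b, k=3):
    """Euclidean distance between the word-frequency vectors of two sequences."""
    fa = word_frequencies(seq_a, k)
    fb = word_frequencies(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


# Made-up example: two blocks, the same blocks in swapped order,
# and a sequence with an unrelated composition.
block1 = "ATGGCGTACGTTAGC"
block2 = "CCGATTAGGCTAACG"
original = block1 + block2
shuffled = block2 + block1
unrelated = "TTTTTTTTTTAAAAAAAAAAGGGGGGGGGG"

d_shuffled = euclidean_distance(original, shuffled)
d_unrelated = euclidean_distance(original, unrelated)
```

In this toy case the shuffled sequence stays much closer to the original than the unrelated one does, because swapping the two blocks changes only the words that span the junction. This is the property that lets word-frequency measures tolerate loss of contiguity.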

Availability: Most of the alignment-free algorithms reviewed were implemented in MATLAB code and are available at http://bioinformatics.musc.edu/resources.html

Contact: almeidaj{at}musc.edu; svinga{at}itqb.unl.pt

* To whom correspondence should be addressed.






