Bioinformatics, Vol 15, 194-202, Copyright © 1999 by Oxford University Press
JS Varre, JP Delahaye and E Rivals
MOTIVATION: Evolution acts in several ways on DNA: either by mutating a
base, or by inserting, deleting or copying a segment of the sequence
(Ruddle, 1997; Russell, 1994; Li and Grauer, 1991). Classical alignment
methods deal with point mutations (Waterman, 1995), whereas genome-level mutations
are studied using genome rearrangement distances (Bafna and Pevzner, 1993,
1995; Kececioglu and Sankoff, 1994; Kececioglu and Ravi, 1995). The latter
distances generally operate, not on the sequences, but on an ordered list
of genes. To our knowledge, no measure of distance attempts to compare
sequences using a general set of segment-based operations.

RESULTS: Here
we define a new family of distances, called transformation distances, which
quantify the dissimilarity between two sequences in terms of segment-based
events. We focus on the case where segment-copy, -reverse-copy and
-insertion are allowed in our set of operations. Those events are weighted
by their description length, but other sets of weights are possible when
biological information is available. The transformation distance from
sequence S to sequence T is then the Minimum Description Length among all
possible scripts that build T knowing S with segment-based operations. The
underlying idea is related to Kolmogorov complexity theory. We present an
algorithm which, given two sequences S and T, computes exactly and
efficiently the transformation distance from S to T. Unlike alignment
methods, the method we propose does not necessarily respect the order of
the residues within the compared sequences and is therefore able to account
for duplications and translocations that cannot be properly described by
sequence alignment. A biological application to the Tnt1 tobacco
retrotransposon is presented.

AVAILABILITY: The algorithm and the graphical interface can be downloaded
at http://www.lifl.fr/~varre/TD
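
To make the definition in the abstract concrete, the following is a minimal sketch of a transformation-style distance, not the authors' algorithm. It builds T left to right from segments that are either copied from S, reverse-copied from S, or inserted literally, and uses a naive dynamic program to find the cheapest script. The cost model is an assumption made for illustration only: 2 bits to name the operation, fixed-width fields for a position in S and a segment length, and 2 bits per inserted base. "Reverse copy" is taken here to mean the plain reversed string; the paper's definition may differ (for example, a reverse complement). All function names are hypothetical.

```python
from __future__ import annotations

from math import inf

OP_BITS = 2  # assumed: bits needed to name one of the three operations


def _bits(n: int) -> int:
    """Bits needed to select one of n alternatives (at least 1)."""
    return max(1, (n - 1).bit_length())


def toy_transformation_distance(s: str, t: str) -> tuple[int, list[tuple]]:
    """Return (cost in bits, script) for the cheapest script building t from s.

    Toy cost model (an assumption, not the weights used in the paper).
    """
    n = len(t)
    pos_bits = _bits(max(len(s), 1))  # field holding a start position in s
    len_bits = _bits(n + 1)           # field holding a segment length
    copy_cost = OP_BITS + pos_bits + len_bits

    best = [0] + [inf] * n            # best[j]: cheapest cost to build t[:j]
    back: list = [None] * (n + 1)     # back[j]: (i, operation, argument)
    for j in range(1, n + 1):
        for i in range(j):
            seg = t[i:j]
            # Literal insertion is always possible: 2 bits per base.
            options = [("insert", seg, OP_BITS + len_bits + 2 * len(seg))]
            p = s.find(seg)
            if p >= 0:
                options.append(("copy", (p, len(seg)), copy_cost))
            p = s.find(seg[::-1])
            if p >= 0:
                options.append(("reverse_copy", (p, len(seg)), copy_cost))
            for name, arg, cost in options:
                if best[i] + cost < best[j]:
                    best[j] = best[i] + cost
                    back[j] = (i, name, arg)

    # Walk the back-pointers to recover the script in left-to-right order.
    script = []
    j = n
    while j > 0:
        i, name, arg = back[j]
        script.append((name, arg))
        j = i
    script.reverse()
    return best[n], script


def replay(s: str, script: list[tuple]) -> str:
    """Rebuild the target by applying a script to s (sanity check)."""
    out = []
    for name, arg in script:
        if name == "insert":
            out.append(arg)
        else:
            p, k = arg
            seg = s[p:p + k]
            out.append(seg if name == "copy" else seg[::-1])
    return "".join(out)


if __name__ == "__main__":
    s, t = "AACGTTTGCC", "TTGCCAACGT"  # t swaps the two halves of s
    cost, script = toy_transformation_distance(s, t)
    print(cost, script)
    assert replay(s, script) == t
```

Because a copied segment may come from anywhere in S, the example with the two halves swapped is described by two copy operations, which an order-preserving alignment cannot express. Note also that the measure is directed: the cost of building T from S need not equal the cost of building S from T.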
Transformation distances: a family of dissimilarity measures based on movements of segments
Laboratoire d'Informatique Fondamentale de Lille (LIFL), UFR IEEA - Bat M3, 59655 Villeneuve d'Ascq Cedex, France.