Detection of significant patterns by compression algorithms: the case of approximate tandem repeats in DNA sequences
1 Laboratoire d'Informatique Fondamentale de Lille CNRS URA 369, Villeneuve d'Ascq 59655, France
2 Université de Mons-Hainaut Mons, B7000, Belgium
3 Université de Versailles Saint-Quentin Versailles 78035, France
4 To whom correspondence should be addressed
MOTIVATION: Compression algorithms can be used to analyse genetic sequences. A compression algorithm tests a given property on the sequence and uses it to encode the sequence. If the property holds, it reveals some structure of the sequence that can be described briefly, which yields a description of the sequence shorter than the sequence of nucleotides given in extenso. The more a sequence is compressed by the algorithm, the more significant the property is for that sequence.
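One standard way to make "more compression means more significance" quantitative is sketched below; it is offered as an illustration under stated assumptions, not as the authors' exact statistic. Assuming a uniform random model in which each nucleotide costs 2 bits, the Kraft inequality implies that a prefix-free code fixed in advance can compress at most a fraction 2^-k of all sequences of a given length by k bits or more. The function names are hypothetical.

```python
def literal_bits(seq: str) -> int:
    """Cost of the sequence given in extenso: 2 bits per nucleotide (A, C, G, T)."""
    return 2 * len(seq)


def compression_gain(seq: str, encoded_bits: float) -> float:
    """Bits saved relative to the literal encoding.

    `encoded_bits` must count everything a decoder needs (hypothetical encoder),
    otherwise the gain is overstated.
    """
    return literal_bits(seq) - encoded_bits


def significance_bound(gain_bits: float) -> float:
    """Upper bound, via the Kraft inequality, on the fraction of uniformly random
    sequences of the same length that a fixed prefix-free code compresses by at
    least `gain_bits` bits."""
    return min(1.0, 2.0 ** (-gain_bits))


seq = "ACGT" * 25                      # a 100-nt example sequence
gain = compression_gain(seq, 170)      # suppose some encoder used 170 bits
print(gain, significance_bound(gain))  # 30 and about 9.3e-10
```

A gain of 30 bits thus corresponds to a probability of at most 2^-30 that a random sequence would compress as well; a zero or negative gain gives no evidence of structure.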
RESULTS: We present a compression algorithm that tests for the presence of a particular type of dosDNA (defined ordered sequence-DNA): approximate tandem repeats of small motifs (i.e. of length <4). This algorithm has been tested on four yeast chromosomes. The presence of approximate tandem repeats appears to be a uniform structural property of yeast chromosomes.
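The toy encoder below illustrates how tandem repeats of motifs of length 1 to 3 can shorten a description. It is a simplified sketch, not the authors' algorithm: it finds only exact repeats (no substitutions or indels, so not approximate repeats), and the token layout (1-bit flag, 2-bit motif length, 2 bits per motif nucleotide, Elias gamma copy count), the greedy parsing and the `min_copies` threshold are all hypothetical choices.

```python
def elias_gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * (n.bit_length() - 1) + 1


def tandem_run(seq: str, i: int, k: int) -> int:
    """Number of consecutive exact copies of seq[i:i+k] starting at position i."""
    motif = seq[i:i + k]
    if len(motif) < k:
        return 0
    copies = 1
    while seq[i + copies * k:i + (copies + 1) * k] == motif:
        copies += 1
    return copies


def encode_bits(seq: str, max_motif: int = 3, min_copies: int = 3) -> int:
    """Bits used by a hypothetical greedy encoder that writes either a literal
    token (flag + 2-bit nucleotide) or a repeat token (flag + motif length +
    motif + Elias gamma copy count) for exact tandem repeats."""
    bits, i, n = 0, 0, len(seq)
    while i < n:
        best_saving, best_span, best_cost = 0, 0, 0
        for k in range(1, max_motif + 1):
            c = tandem_run(seq, i, k)
            if c < min_copies:
                continue
            span = k * c                                 # nucleotides covered
            cost = 1 + 2 + 2 * k + elias_gamma_bits(c)   # repeat-token size
            saving = 3 * span - cost                     # vs. literal tokens
            if saving > best_saving:
                best_saving, best_span, best_cost = saving, span, cost
        if best_span:
            bits += best_cost
            i += best_span
        else:
            bits += 3  # flag bit + 2-bit nucleotide
            i += 1
    return bits


seq = "CA" * 50
bits = encode_bits(seq)
print(bits, 2 * len(seq) - bits)  # 18 182
```

For `"CA" * 50` (100 nt) the encoder emits a single 18-bit repeat token, a gain of 182 bits over the 200-bit literal encoding, whereas a repeat-free sequence such as `"ACGT"` costs 12 bits, more than its 8 literal bits, because of the flag bits. A real detector of approximate repeats would also have to encode the positions and identities of mismatches.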
AVAILABILITY: The algorithms in C are available on the World Wide Web (URL: http://www.lifl.fr/~rivals/Doc/RTA/).
CONTACT: E-mail: rivals{at}lifl.fr
Received on March 27, 1996; revised on July 29, 1996; accepted on August 22, 1996