Discovering simple DNA sequences by the algorithmic significance method
Linus Pauling Institute of Science and Medicine, 440 Page Mill Rd, Palo Alto, CA 94306, USA
To whom reprint requests should be sent. Present address: Biological and Medical Research Division, Bldg 202, Argonne National Laboratory, Argonne, IL 60439-4833, USA
A new method, algorithmic significance, is proposed as a tool for discovering patterns in DNA sequences. The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. In this sense, the method can be viewed as a formal version of Occam's razor. In this paper the method is applied to discover significantly simple DNA sequences. We define a DNA sequence to be simple if it contains repeated occurrences of certain words and can therefore be encoded in a small number of bits. This definition includes minisatellites and microsatellites. A standard dynamic programming algorithm for data compression is applied to compute the minimal encoding lengths of sequences in linear time. An electronic mail server for identifying simple sequences with the proposed method has been installed at the Internet address pythia@anl.gov.
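The idea of scoring a sequence by how many bits a repeat-aware encoding saves can be illustrated with a small sketch. The encoding scheme below is a hypothetical one chosen for illustration, not the paper's exact code. The null hypothesis spends 2 bits per base. The alternative code emits either a literal base (1 flag bit + 2 bits) or a copy of at least `min_copy` bases from an earlier start position (1 flag bit + log2(i) bits for the start + log2(n) bits for the length). Given the sequence length n, this code satisfies the Kraft inequality, so the usual argument bounds the probability, under the null, of saving d or more bits by 2^-d. The function names (`null_length`, `min_encoding_length`, `significance_bound`) are hypothetical. The dynamic program is a naive O(n^3) version, not the linear-time algorithm mentioned in the abstract.

```python
import math


def null_length(seq: str) -> float:
    """Null-hypothesis code length: 2 bits per base (uniform, independent bases)."""
    return 2.0 * len(seq)


def min_encoding_length(seq: str, min_copy: int = 4) -> float:
    """Minimal length in bits of seq under a hypothetical literal/copy code.

    At position i the encoder emits either
      * a literal base: 1 flag bit + 2 bits, or
      * a copy of length >= min_copy from an earlier start j < i:
        1 flag bit + log2(i) bits for j + log2(n) bits for the length.
    A copy may overlap the text it produces (seq[j+k] == seq[i+k] with
    j+k >= i), so a tandem repeat can be covered by a single copy.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    best = [math.inf] * (n + 1)  # best[i] = cheapest encoding of seq[:i]
    best[0] = 0.0
    len_bits = math.log2(n)
    for i in range(n):
        # Option 1: encode seq[i] as a literal.
        best[i + 1] = min(best[i + 1], best[i] + 3.0)
        if i == 0:
            continue  # nothing earlier to copy from
        copy_cost = 1.0 + math.log2(i) + len_bits
        # Option 2: copy from every earlier start j, for every usable length.
        for j in range(i):
            k = 0
            while i + k < n and seq[j + k] == seq[i + k]:
                k += 1
                if k >= min_copy:
                    best[i + k] = min(best[i + k], best[i] + copy_cost)
    return best[n]


def significance_bound(seq: str, min_copy: int = 4) -> float:
    """Bound 2**-d on the null probability of saving d bits, capped at 1."""
    d = null_length(seq) - min_encoding_length(seq, min_copy)
    return min(1.0, 2.0 ** (-d))


if __name__ == "__main__":
    s = "ACGT" * 10
    print(null_length(s), round(min_encoding_length(s), 2))  # 80.0 20.32
    print(significance_bound(s))
```

For the tandem repeat `"ACGT" * 10`, the first four bases must be literals (12 bits), and one overlapping copy then covers the remaining 36 bases. This saves roughly 60 bits relative to the null code. A short non-repetitive sequence such as `"ACGT"` saves nothing, so its bound is capped at 1.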
Received on July 20, 1992; accepted on January 5, 1993