A generic algorithm for finding restriction sites within DNA sequences
Division of Biomedical Engineering and Computing, Vanderbilt University Medical School, Nashville, TN 37232-2155
1Hewlett-Packard Co., Cupertino, CA 95014, USA
This paper describes a generic algorithm for finding restriction sites within DNA sequences. The algorithm owes its generality to set theory: the basic elements of DNA sequences, i.e. nucleotides (bases), are represented as sets, and DNA sequences, whether specific, ambiguous or even protein-coding, are represented as sequences of those sets. The set intersection operation is shown to perform pattern matching correctly on these various DNA sequences. Performance analysis shows that the complexity of the pattern matching is reduced from exponential to linear. An example presents the actual and potential restriction sites that the generic algorithm finds in the DNA sequence template coding for a synthetic calmodulin.
Received on October 2, 1990; accepted on December 18, 1990
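The set-based matching summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard IUPAC nucleotide codes as the set representation, and the names `IUPAC`, `to_sets` and `find_sites` are hypothetical. The rule used here to separate "actual" from "potential" sites (subset versus merely non-empty intersection) is likewise an assumption made for illustration.

```python
# Assumption: bases and ambiguity codes follow the standard IUPAC alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_sets(seq):
    """Represent a DNA sequence as a list of sets of possible bases."""
    return [frozenset(IUPAC[ch]) for ch in seq.upper()]


def find_sites(sequence, site):
    """Return (start, kind) pairs for each window of `sequence` matching `site`.

    A window matches when every position has a non-empty intersection with
    the corresponding site position. Illustrative assumption: the match is
    "actual" when every sequence set is a subset of the site set, and
    "potential" otherwise.
    """
    seq_sets = to_sets(sequence)
    site_sets = to_sets(site)
    hits = []
    for start in range(len(seq_sets) - len(site_sets) + 1):
        pairs = list(zip(seq_sets[start:start + len(site_sets)], site_sets))
        if all(s & p for s, p in pairs):
            kind = "actual" if all(s <= p for s, p in pairs) else "potential"
            hits.append((start, kind))
    return hits


print(find_sites("AAGAATTCGG", "GAATTC"))  # [(2, 'actual')]
print(find_sites("GARTTC", "GAATTC"))      # [(0, 'potential')]
print(find_sites("GTTAAC", "GTYRAC"))      # [(0, 'actual')]
```

Each window is checked with one intersection per position, so for a recognition site of fixed length the work grows linearly with the length of the sequence. Ambiguous codes are never expanded into the specific sequences they stand for, an expansion whose size would grow exponentially with the number of ambiguous positions.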