Data bank homology search algorithm with linear computation complexity
Supercomputer Computations Research Institute, Florida State University, B-186, Tallahassee, FL 32306-4052, USA
1Institute of Cytology and Genetics, Russian Academy of Science, pr. akad. Lavrentyeva 10, Novosibirsk, 630090, Russia
2Istituto di Tecnologie Biomediche Avanzate, Consiglio Nazionale delle Ricerche, via Ampere 56, 21031 Milano MI, Italy
3To whom reprint requests should be sent
A new algorithm for data bank homology search is proposed. Its principal advantages are: (i) linear computational complexity; (ii) low memory requirements; and (iii) high sensitivity to local regions of homology. The algorithm first calculates indicative matrices of k-tuple realization in the query sequence, which record the k-tuples that occur in the query, and then searches each database sequence for an appropriate number of matching k-tuples within a narrow range. It does not require the k-tuple coordinates of database sequences to be tabulated or held in memory. The algorithm is implemented in a program that runs on PC-compatible computers and has been tested on the PIR and GenBank databases with good results. A few modifications designed to improve selectivity are also discussed. As an application example, a homology search for the mouse homeotic protein HOX 3.1 is given.
Accepted on January 20, 1994
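The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the "indicative matrix" can be approximated by a plain set of the k-tuples present in the query, and that the "narrow range" is a sliding window of consecutive k-tuple start positions in the database sequence. The function names and the parameters `k`, `window`, and `min_hits` are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_indicator(query, k):
    """Return the set of k-tuples that occur in the query.

    Assumption: this set stands in for the paper's indicative matrix.
    """
    return {query[i:i + k] for i in range(len(query) - k + 1)}


def best_window_hits(indicator, subject, k, window):
    """Scan the subject once and return the largest number of query
    k-tuples found among any `window` consecutive start positions."""
    flags = deque()  # hit/miss flags for the positions in the current window
    hits = 0         # number of hits inside the current window
    best = 0
    for i in range(len(subject) - k + 1):
        flag = subject[i:i + k] in indicator
        flags.append(flag)
        hits += flag
        if len(flags) > window:
            hits -= flags.popleft()  # drop the position that left the window
        best = max(best, hits)
    return best


def search(query, database, k=2, window=20, min_hits=8):
    """Return (name, score) for every database entry whose best window
    reaches `min_hits`. `database` is an iterable of (name, sequence)."""
    indicator = build_indicator(query, k)
    results = []
    for name, seq in database:
        score = best_window_hits(indicator, seq, k, window)
        if score >= min_hits:
            results.append((name, score))
    return results


if __name__ == "__main__":
    db = [("a", "TTTTMKVLAAGIVGTTTT"), ("b", "C" * 18)]
    print(search("MKVLAAGIVG", db))
```

In this sketch each database sequence is read once, and only the query's k-tuple set and one window of flags are kept in memory, which mirrors the linear-time, low-memory properties the abstract claims. No coordinates of database k-tuples are stored.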