Fast, statistically based alignment of amino acid sequences on the basis of diagonal fragments of DOT-matrices
Institute of Cytology and Genetics of the Siberian Department of the Russian Academy of Sciences, Lavrentyeva avenue 10, 630090 Novosibirsk, Russia
¹Istituto di Tecnologie Biomediche Avanzate, Consiglio Nazionale Delle Ricerche, via Ampere 56, 20131 Milano, Italy
We present a new pairwise alignment algorithm based on iterative statistical analysis of homologous subsequences. In contrast to the classical transformation of the DOT-matrix that characterizes the Needleman-Wunsch (NW) algorithm, we use only those matrix elements that correspond to the most non-random subsequence homologies. The most reliable elements of the DOT-matrix are written to compact competition matrices, and the algorithm then searches for an alignment using only these elements. The algorithm has low storage and memory requirements, yet provides a reliable alignment for sequences of weak homology (or at least for their homologous regions). In such cases classical NW algorithms often produce unreliable results at the level of statistical noise, because random matches accumulate throughout the aligned sequences.
Received on August 22, 1991; accepted on May 14, 1992
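The idea of keeping only the most non-random diagonal fragments of the DOT-matrix can be illustrated with a minimal sketch. It is not the published algorithm: it assumes identity matches only, treats a fragment as a maximal run of consecutive matches on one diagonal, and ranks fragments by a simple chance probability p^length, where p is estimated from residue frequencies. The names `match_probability`, `diagonal_fragments`, `most_reliable`, and the parameters `min_len` and `top` are hypothetical. The competition matrices and the iterative statistical analysis described in the abstract are not reproduced here.

```python
from collections import Counter


def match_probability(a, b):
    """Chance that two randomly chosen residues (one from each sequence)
    are identical, estimated from residue frequencies (an assumption)."""
    fa, fb = Counter(a), Counter(b)
    return sum(fa[r] / len(a) * fb[r] / len(b) for r in fa)


def diagonal_fragments(a, b, min_len=3):
    """Return (start_a, start_b, length) for every maximal run of identical
    residues along a diagonal. The full DOT-matrix is never stored; each
    diagonal is scanned on the fly."""
    fragments = []
    for d in range(-(len(a) - 1), len(b)):
        # Diagonal d contains the cells (i, j) with j - i == d.
        i, j = (0, d) if d >= 0 else (-d, 0)
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    fragments.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:  # run that reaches the edge of the matrix
            fragments.append((i - run, j - run, run))
    return fragments


def most_reliable(a, b, min_len=3, top=5):
    """Rank fragments by their chance probability p**length (smaller means
    less likely to be random) and keep the `top` most significant ones."""
    p = match_probability(a, b)
    scored = [(p ** length, sa, sb, length)
              for sa, sb, length in diagonal_fragments(a, b, min_len)]
    scored.sort()
    return scored[:top]


if __name__ == "__main__":
    for prob, sa, sb, length in most_reliable("MKVLAAGT", "QQKVLAWW"):
        print(f"a[{sa}:{sa + length}] ~ b[{sb}:{sb + length}]  p = {prob:.2e}")
```

An alignment step would then chain compatible fragments from this short list rather than from the whole matrix, which is where the reduction in memory comes from in this simplified picture.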