A probabilistic algorithm for interactive huge genome comparison
1Computer Science Laboratory (CSL), Conservatoire National des Arts et Métiers (CNAM) 292 rue Saint-Martin, 75003 Paris
2Laboratory of Cellular and Molecular Biology, University of La Rochelle Avenue Marillac, 17042 La Rochelle Cedex 1, France
3To whom correspondence should be addressed
We designed a new probabilistic algorithm, named PAGEC (probabilistic algorithm for genome comparison), which allowed a highly interactive study of long genomic strings. The comparison between two nucleic acid sequences is based on the creation of multiple index tables, which drastically reduces processing time for huge genomes, e.g. 13 min for a 4 Mb/4 Mb comparison. PAGEC required less memory than other types of algorithm and took into account the low resolution of the final representation (paper or computer screen). Considering that standard printers permit a 300 d.p.i. resolution, the loss of computed information due to the probabilistic conception of the algorithm was not usually noticeable in the present study, mainly because of the increased genomic sizes. Refinement was possible through an interactive zooming system, which enabled visualization of the lexical base sequences of a considered part of both of the studied genomes. Biological examples of computation based on yeast and animal nucleic acid sequences presented in this paper reveal the flexibility of the PAGEC program, which is a valuable tool for genetic studies, as it addresses a problem that will grow in importance as sequenced genomes become larger.
Received on June 5, 1995; accepted on September 7, 1995
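The general idea of an index-based comparison rendered at limited resolution can be illustrated with a minimal sketch. It is not the PAGEC algorithm itself, whose details are not given in the abstract: the function names `build_index` and `dot_plot_pixels`, the word length `k` and the plot width `pixels` are all assumptions chosen for illustration, and the sketch is deterministic rather than probabilistic. It indexes every word of length `k` in one sequence, scans the other sequence for shared words, and records only which pixels of the final plot contain at least one match. At an assumed plot width of 2400 pixels (8 inches at 300 d.p.i.), a 4 Mb sequence would map roughly 1667 bases onto each pixel, which is why information finer than one pixel need not be stored.

```python
from collections import defaultdict


def build_index(seq, k):
    """Map each word of length k in seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dot_plot_pixels(seq_a, seq_b, k, pixels):
    """Return the set of (x, y) pixels that contain at least one shared word.

    x is derived from the position in seq_a, y from the position in seq_b.
    Positions are scaled down to a pixels-by-pixels grid (hypothetical plot
    size), so many base-level matches may collapse onto the same pixel.
    """
    index = build_index(seq_a, k)
    lit = set()
    for j in range(len(seq_b) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for i in index.get(seq_b[j:j + k], ()):
            x = i * pixels // len(seq_a)
            y = j * pixels // len(seq_b)
            lit.add((x, y))
    return lit


# Toy sequences (illustrative only, not biological data)
seq_a = "ACGTACGTTTGACCA"
seq_b = "TTGACCAACGTACGT"
hits = dot_plot_pixels(seq_a, seq_b, k=4, pixels=5)
for x, y in sorted(hits):
    print(x, y)
```

Storing only the lit pixels keeps memory proportional to the size of the display rather than to the number of base-level matches; zooming into a region would amount to re-running the comparison on the corresponding subsequences with the same grid size.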