An algorithm based on graph theory for the assembly of contigs in physical mapping of DNA
1Department of Genetics and Development, Columbia University College of Physicians and Surgeons, 630 West 168th Street, New York, NY 10032
2Department of Biochemistry and Molecular Biophysics, Columbia University College of Physicians and Surgeons, 630 West 168th Street, New York, NY 10032
3Department of Neurology, Columbia University College of Physicians and Surgeons, 630 West 168th Street, New York, NY 10032
4Cancer Center, Columbia University College of Physicians and Surgeons, 630 West 168th Street, New York, NY 10032
5Howard Hughes Medical Institute, Columbia University College of Physicians and Surgeons, 630 West 168th Street, New York, NY 10032
*To whom correspondence should be addressed
An algorithm is described for mapping DNA contigs based on an interval graph (IG) representation. In general terms, the input to the algorithm is a set of binary overlap relations among finite intervals distributed along the real line, from which the algorithm generates sets of ordered overlapping fragments spanning that line. The implications of a more general case of the IG, called a probe interval graph (PIG), in which only a subset of the cosmids is used as probes, are also discussed. In the specific case of cosmids hybridizing to regions of a YAC, the algorithm takes cross-hybridization information obtained using the cosmids as probes and orders the cosmids along the YAC; if gaps exist because cosmid contigs do not cover the full length of the YAC, repeated application of the algorithm generates sets of ordered overlapping fragments. Both the IG and the PIG can expose problems caused by false overlaps, such as hybridizations due to repetitive elements. The algorithm has been coded in C; CPU time is essentially linear with respect to the number of cosmids analyzed. Results are presented for the application of a PIG to cosmid contig assembly along a human chromosome 13-specific YAC. An alignment of 67 cosmids spanning a YAC took 0.28 seconds of CPU time on a Convex 220 computer.
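To make the input concrete, the following is a minimal sketch in C of how pairwise overlap (cross-hybridization) data might be stored as a symmetric matrix and partitioned into groups of mutually connected clones, each group corresponding to one candidate contig. It is not the algorithm described here: it only finds connected components by depth-first search and does not order clones within a contig or test the interval graph property. The names `MAX_CLONES`, `overlap`, `label_component`, and `assign_contigs` are hypothetical and chosen for illustration only.

```c
#include <assert.h>

/* Hypothetical upper bound on the number of clones (cosmids). */
#define MAX_CLONES 128

/* Depth-first search: give every clone reachable from v the same label.
 * overlap[v][w] != 0 means clones v and w were observed to overlap. */
static void label_component(int n, unsigned char overlap[][MAX_CLONES],
                            int v, int label, int *contig_of)
{
    contig_of[v] = label;
    for (int w = 0; w < n; w++) {
        if (overlap[v][w] && contig_of[w] < 0)
            label_component(n, overlap, w, label, contig_of);
    }
}

/* Partition n clones into candidate contigs (connected components of the
 * overlap graph). On return, contig_of[i] holds the contig index of clone i.
 * Returns the number of contigs found. */
int assign_contigs(int n, unsigned char overlap[][MAX_CLONES], int *contig_of)
{
    assert(n >= 0 && n <= MAX_CLONES);

    int count = 0;
    for (int i = 0; i < n; i++)
        contig_of[i] = -1;              /* -1 marks "not yet visited" */

    for (int i = 0; i < n; i++) {
        if (contig_of[i] < 0)
            label_component(n, overlap, i, count++, contig_of);
    }
    return count;
}
```

Under these assumptions, five clones with observed overlaps 0-1, 1-2, and 3-4 would fall into two groups, with the break between clones 2 and 3 corresponding to a coverage gap. This sketch scans a dense matrix and therefore takes time quadratic in the number of clones; an adjacency-list representation would be needed to approach the essentially linear behaviour reported for the actual implementation.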