Bioinformatics Vol. 19 no. 1 2003
Pages 22-29
© 2003 Oxford University Press
A divide-and-conquer approach to fragment assembly
University of Nebraska-Lincoln, Department of Electrical Engineering, 209N WSEC, Lincoln, NE 68503, USA
Received on February 4, 2002; revised on June 10, 2002 and July 6, 2002; accepted on July 9, 2002
Motivation: One of the major problems in DNA sequencing is assembling the fragments obtained by shotgun sequencing. Most existing fragment assembly techniques follow the overlap-layout-consensus approach. This framework requires extensive computation in each phase and becomes inefficient as the number of fragments increases.
Results: We propose a new algorithm which solves the overlap, layout, and consensus phases simultaneously. The fragments are clustered with respect to their Average Mutual Information (AMI) profiles using the k-means algorithm. This removes the unnecessary burden of considering the collection of fragments as a whole. Instead, orientation and overlap detection are solved efficiently within each cluster. The algorithm has successfully reconstructed both artificial and real data.
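To illustrate the clustering step, the following minimal sketch computes a mutual-information profile for each fragment and groups the profiles with k-means. It is not the authors' implementation. The abstract does not define the AMI profile, so the sketch assumes a common definition: the mutual information between bases k positions apart, for lags k = 1 to max_lag. The names ami_profile, kmeans, and cluster_fragments, and the parameters max_lag, iters, and seed, are hypothetical. Orientation and overlap detection within clusters are not shown.

```python
import math
import random
from collections import Counter


def ami_profile(seq, max_lag=8):
    """Assumed AMI profile: mutual information (in bits) between bases
    k positions apart, for k = 1..max_lag."""
    profile = []
    for k in range(1, max_lag + 1):
        # Empirical joint counts of (base at i, base at i + k).
        pairs = Counter(zip(seq, seq[k:]))
        total = sum(pairs.values())
        if total == 0:  # fragment shorter than the lag
            profile.append(0.0)
            continue
        # Marginal counts derived from the same pair counts.
        left, right = Counter(), Counter()
        for (a, b), c in pairs.items():
            left[a] += c
            right[b] += c
        ami = 0.0
        for (a, b), c in pairs.items():
            p_ab = c / total
            ami += p_ab * math.log2(p_ab * total * total / (left[a] * right[b]))
        profile.append(ami)
    return profile


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_fragments(fragments, k, max_lag=8):
    """Cluster fragments by their (assumed) AMI profiles."""
    profiles = [ami_profile(f, max_lag) for f in fragments]
    return kmeans(profiles, k)
```

Under this assumption, each fragment is reduced to a short numeric vector, so clustering cost depends on the profile length rather than on pairwise sequence alignment across the whole fragment set.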
Availability: Available on request from the authors.
Contact: otu{at}eecomm.unl.edu