Bioinformatics Advance Access originally published online on June 17, 2009
Bioinformatics 2009 25(17):2279-2280; doi:10.1093/bioinformatics/btp374
A fast hybrid short read fragment assembly algorithm
1 School of Computer Engineering, Nanyang Technological University, Singapore 639798, 2 Department of Computer Science and Software Engineering, University of Melbourne, 3 National ICT Australia, Victoria Research Lab, Victoria 3010 and 4 School of Computer Science and IT, RMIT University, Melbourne 3001, Australia
* To whom correspondence should be addressed.
Abstract
Summary: The shorter and vastly more numerous reads produced by second-generation sequencing technologies require new tools that can assemble massive numbers of reads in reasonable time. Existing short-read assembly tools fall into two categories: greedy extension-based and graph-based. While the graph-based approaches generally produce better assemblies, building and storing a huge graph demands very large computing resources. In this article, we present Taipan, an assembly algorithm that can be viewed as a hybrid of these two approaches. Taipan uses greedy extensions for contig construction, but at each step it realizes enough of the corresponding read graph to make better decisions about how the assembly should continue. We show that this approach achieves an assembly quality at least as good as that of the graph-based approaches used in the popular Edena and Velvet assembly tools, while requiring only a moderate amount of computing resources.
Availability and Implementation: Source code in C running on Linux is freely available at http://taipan.sourceforge.net
Contact: asbschmidt{at}ntu.edu.sg
Associate Editor: Alfonso Valencia
Received on February 8, 2009; revised on June 7, 2009; accepted on June 12, 2009
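The hybrid idea described in the Summary can be illustrated with a small sketch. This is not Taipan's implementation (which is written in C and is not detailed in the abstract); it is a simplified illustration under stated assumptions. The function names (`right_overhangs`, `consistent`, `greedy_contig`), the `min_overlap` parameter, and the stopping rule are all hypothetical. The sketch extends a contig greedily to the right, but before each extension it inspects the local neighbourhood of candidate reads and stops if they disagree, which stands in for "realizing enough of the read graph" to avoid a bad greedy choice.

```python
def right_overhangs(contig, reads, used, min_overlap):
    """Collect unused reads whose prefix overlaps the contig's suffix.

    Returns (overlap_length, read_index, overhang) tuples, where the
    overhang is the part of the read that sticks out past the contig.
    """
    found = []
    for i, read in enumerate(reads):
        if i in used:
            continue
        # Try the longest proper overlap first so the overhang is never empty.
        for k in range(min(len(read) - 1, len(contig)), min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                found.append((k, i, read[k:]))
                break
    return found


def consistent(overhangs):
    """Local check (an assumption of this sketch): all overhangs must be
    prefixes of the longest one, i.e. the candidates spell a single path."""
    longest = max(overhangs, key=len)
    return all(longest.startswith(o) for o in overhangs)


def greedy_contig(reads, seed_index=0, min_overlap=3):
    """Extend a seed read to the right until no read overlaps or the
    candidate extensions branch."""
    contig = reads[seed_index]
    used = {seed_index}
    while True:
        found = right_overhangs(contig, reads, used, min_overlap)
        if not found:
            break  # no read overlaps the contig end
        if not consistent([overhang for _, _, overhang in found]):
            break  # branching neighbourhood: stop rather than guess
        k, i, overhang = max(found)  # greedy choice: largest overlap
        contig += overhang
        used.add(i)
    return contig
```

For example, with the reads `ACGTAC`, `GTACGG` and `ACGGTT` and `min_overlap=3`, the sketch chains all three into one contig, whereas with `AAACCC`, `CCCGGG` and `CCCTTT` it stops at the seed because the two candidate extensions conflict. A purely greedy assembler would pick one of the conflicting reads at that point; a full graph-based assembler would have built the whole graph up front.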